Hi,

I've prepared a test setup (see https://github.com/marcindulak/vagrant-lustre-tutorial)
in order to test a possible migration of the 1.8.9-wc1 servers to Lustre 2.X.
The requirement is that the new 2.X servers must work with the old 1.8.9-wc1
clients during a (possibly long) transition period. Which X to choose?

As a side point, I suggest that the maintainers responsible for the Lustre
documentation make use of this setup. Lustre really lacks good documentation,
and having a runnable Vagrant example is helpful.

Back to the interoperability question: it does not seem to work for me.

[root@centos6_lustre18 ~]# mount /lustre
mount.lustre: mount mds01@tcp0,mds02@tcp0:/testfs at /lustre failed: Cannot send after transport endpoint shutdown

and centos6_lustre18:/var/log/messages says:

Jun  2 14:41:47 centos6_lustre18 kernel: Lustre: 1125:0:(client.c:1529:ptlrpc_expire_one_request()) @@@ Request x1502877822484484 sent from MGC10.0.4.6@tcp to NID 10.0.4.7@tcp 6s ago has timed out (5s prior to deadline).
Jun  2 14:41:47 centos6_lustre18 kernel:  req@ffff88000c8f9800 x1502877822484484/t0 o250->MGS@MGC10.0.4.6@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1433256106 ref 1 fl Rpc:N/0/0 rc 0/0
Jun  2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(client.c:859:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88000c8f9400 x1502877822484486/t0 o501->MGS@MGC10.0.4.6@tcp_0:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun  2 14:41:47 centos6_lustre18 kernel: LustreError: 15c-8: MGC10.0.4.6@tcp: The configuration from log 'testfs-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Jun  2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -108
Jun  2 14:41:47 centos6_lustre18 kernel: Lustre: client testfs-client(ffff88000db7f000) umount complete
Jun  2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-108)
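
As a side note, the -108 at the end of these lines looks like a negated errno value, which would match the message printed by mount.lustre. Below is a minimal sketch for pulling that code out of such log lines and naming it. The helper name extract_rc and the regular expression are my own illustration, not part of Lustre, and the errno name and message depend on the platform (on Linux, 108 is ESHUTDOWN).

```python
import errno
import os
import re

# Hypothetical pattern: matches a trailing "(-NNN)" or ": -NNN" return code.
RC_PATTERN = re.compile(r"(?:\(|:\s)-(\d+)\)?\s*$")


def extract_rc(line):
    """Return the positive errno value at the end of a log line, or None."""
    match = RC_PATTERN.search(line.strip())
    return int(match.group(1)) if match else None


# Two of the log lines from above, shortened to the kernel message part.
lines = [
    "LustreError: 2648:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -108",
    "LustreError: 2648:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-108)",
]

for line in lines:
    rc = extract_rc(line)
    # errno names and messages are platform-specific.
    print(rc, errno.errorcode.get(rc, "unknown"), os.strerror(rc))
```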

and https://jira.hpdd.intel.com/browse/LU-5974 . The latter one is more recent and suggests
some kind of interoperability is still there. I can mount the filesystem on centos6_lustre18 with mds01@tcp0:/testfs, but not with mds01@tcp0,mds02@tcp0:/testfs.
mds02 is not active.
I get a similar problem with the ubuntu12 client (Ubuntu's official 1.8.5): while I can mount mds01@tcp0,mds02@tcp0:/testfs, accessing the filesystem from the client hangs.
Remounting with mds01@tcp0:/testfs works.

Is there something wrong in my setup? What's the interoperability status?

Best regards,

Marcin