Hi,
I've prepared a test setup (see
https://github.com/marcindulak/vagrant-lustre-tutorial)
in order to test a possible migration of the 1.8.9-wc1 servers to Lustre
2.X.
The requirement is that the new 2.X servers must work with the old
1.8.9-wc1 clients during a (possibly long) transition period. Which X
should I choose?
As a side note, I suggest that the maintainers responsible for the Lustre
documentation make use of this setup. Lustre really lacks good
documentation, and having a runnable Vagrant example is helpful.
Back to the interoperability question: it does not seem to work for me.
[root@centos6_lustre18 ~]# mount /lustre
mount.lustre: mount mds01@tcp0,mds02@tcp0:/testfs at /lustre failed: Cannot send after transport endpoint shutdown
and centos6_lustre18:/var/log/messages says:
Jun 2 14:41:47 centos6_lustre18 kernel: Lustre: 1125:0:(client.c:1529:ptlrpc_expire_one_request()) @@@ Request x1502877822484484 sent from MGC10.0.4.6@tcp to NID 10.0.4.7@tcp 6s ago has timed out (5s prior to deadline).
Jun 2 14:41:47 centos6_lustre18 kernel: req@ffff88000c8f9800 x1502877822484484/t0 o250->MGS@MGC10.0.4.6@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1433256106 ref 1 fl Rpc:N/0/0 rc 0/0
Jun 2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(client.c:859:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff88000c8f9400 x1502877822484486/t0 o501->MGS@MGC10.0.4.6@tcp_0:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 2 14:41:47 centos6_lustre18 kernel: LustreError: 15c-8: MGC10.0.4.6@tcp: The configuration from log 'testfs-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Jun 2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -108
Jun 2 14:41:47 centos6_lustre18 kernel: Lustre: client testfs-client(ffff88000db7f000) umount complete
Jun 2 14:41:47 centos6_lustre18 kernel: LustreError: 2648:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount (-108)
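For reference, the -108 in the log lines above appears to be a negated Linux errno value, and it seems to correspond to the "Cannot send after transport endpoint shutdown" text printed by mount.lustre. Below is a minimal sketch for pulling such codes out of the log and decoding them. The helper names extract_codes and describe are made up for illustration, and the decoding assumes the script runs on Linux, since errno numbering differs between platforms.

```python
import errno
import os
import re

# Two lines copied from the log above.
LOG_LINES = [
    "LustreError: 2648:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -108",
    "LustreError: 2648:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount (-108)",
]


def extract_codes(lines):
    """Return the negative return codes written as ': -N' or '(-N)'."""
    codes = []
    for line in lines:
        for match in re.finditer(r"(?:: |\()-(\d+)", line):
            codes.append(-int(match.group(1)))
    return codes


def describe(code):
    """Map a negated errno to its symbolic name and message (platform-dependent)."""
    number = -code
    name = errno.errorcode.get(number, "unknown")
    return f"{code}: {name} ({os.strerror(number)})"


if __name__ == "__main__":
    # Print each distinct code once, decoded with the local errno table.
    for code in sorted(set(extract_codes(LOG_LINES))):
        print(describe(code))
```

On a Linux machine this should report ESHUTDOWN for -108, which matches the mount error message, so the log and the mount failure are most likely the same error seen at two levels.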
I found two posts:
https://lists.01.org/pipermail/hpdd-discuss/2013-November/000614.html
and
https://jira.hpdd.intel.com/browse/LU-5974. The latter is more
recent and suggests that
some kind of interoperability is still there. I can mount the filesystem on
centos6_lustre18 with mds01@tcp0:/testfs, but not with
mds01@tcp0,mds02@tcp0:/testfs.
mds02 is not active.
I get a similar problem with the ubuntu12 client (Ubuntu's official 1.8.5):
while I can mount mds01@tcp0,mds02@tcp0:/testfs, accessing the filesystem
from the client hangs.
Remounting with mds01@tcp0:/testfs works.
Is there something wrong with my setup? What is the interoperability status?
Best regards,
Marcin