Hi,
Sorry to send almost the same mail again. The original thread, titled
"Help for Lustre Server Move", contains lengthy reports, so I think it is
better to start a new one.
I have restored Lustre from 2.4.2 back to 2.5.0. The error is basically the
same as in the first mail I sent. This time I have arranged the commands
and their outputs in order, so that anyone who can help has a clearer
picture.
The task is to move the data from the old MDS to a new MDS.
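For orientation, the order of the commands in this report can be summarized as a small dry-run sketch. It only prints each step and runs nothing. The hosts, devices and MGS NID are the ones used in this mail; the helper names `step` and `print_sequence` are made up for illustration. It mirrors what was run here and is not meant as a recommended procedure.

```shell
#!/bin/bash
# Dry-run summary of the command order used in this report.
# Nothing is executed; each step is only printed.

MGS_NID="192.168.1.32@tcp"   # NID of new_mds on tcp0, as used in this mail

step() {
    # $1 = host, remaining arguments = command as typed on that host
    local host="$1"; shift
    printf '[%s] %s\n' "$host" "$*"
}

print_sequence() {
    step new_mds tunefs.lustre --erase-params /dev/sda6
    step new_mds tunefs.lustre --writeconf --mgs --mdt /dev/sda6
    step new_mds mount -t lustre /dev/sda6 /MDT
    step myoss   tunefs.lustre --erase-params /dev/sda5
    step myoss   tunefs.lustre --writeconf "--mgsnode=${MGS_NID}" --ost /dev/sda5
    step myoss   tunefs.lustre --writeconf /dev/sda5
    step myoss   mount -t lustre /dev/sda5 /OST
    step cola1   mount -t lustre "${MGS_NID}:/lustre" /lustre
}

print_sequence
```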
*******************************************************************
Some basic info again
*** Client (cola1)
eth0: 10.242.116.6
eth1: 192.168.1.6
modprobe.conf: options lnet networks=tcp0(eth1),tcp1(eth0)
*** Old MDS (old_mds)
eth0: 10.242.116.7
eth1: 192.168.1.7
modprobe.conf: options lnet networks=tcp0(eth1),tcp1(eth0)
MGS/MDS mount point: /MDT
device: /dev/mapper/VolGroup00-LogVol03
*** New MDS (new_mds)
eth0: 10.242.116.32
eth1: 192.168.1.32
modprobe.conf: options lnet networks=tcp0(eth1),tcp1(eth0)
MGS/MDS mount point: /MDT
device: /dev/sda6
*** OSS (myoss)
eth0: 192.168.1.34
eth1: Disabled
modprobe.conf: options lnet ip2nets="tcp0 192.168.1.*"
OST mount point: /OST
device: /dev/sda5
*******************************************************************
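To make the network layout above easier to read, here is a minimal sketch that splits the `networks=` value from modprobe.conf into network/interface pairs. The function name `parse_lnet_networks` is hypothetical, and the sketch does not handle the `ip2nets` form used on the OSS.

```shell
#!/bin/bash
# Split an "options lnet networks=..." value into one
# "network interface" pair per line.
parse_lnet_networks() {
    local spec="$1" entry net ifc
    local IFS=','                 # entries are comma-separated
    for entry in $spec; do
        net=${entry%%\(*}         # text before "("
        ifc=${entry#*\(}          # text after "("
        ifc=${ifc%\)}             # drop the trailing ")"
        printf '%s %s\n' "$net" "$ifc"
    done
}

parse_lnet_networks 'tcp0(eth1),tcp1(eth0)'
```

With the value used on the client and both MDS nodes, tcp0 sits on eth1 (the 192.168.1.x addresses) and tcp1 on eth0 (the 10.242.116.x addresses). The OSS only has a 192.168.1.x address, so it can only be reached over tcp0.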
[root@new_mds]# tunefs.lustre --erase-params /dev/sda6
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre-MDT0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: lustre-MDT0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x45
(MDT MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
Dec 30 09:38:23 new_mds kernel: LDISKFS-fs (sda6): mounted filesystem with
ordered data mode. quota=on. Opts:
[root@new_mds home]# tunefs.lustre --writeconf --mgs --mdt /dev/sda6
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre-MDT0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x45
(MDT MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: lustre=MDT0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x145
(MDT MGS update writeconf )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
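The "Flags:" values in these tunefs.lustre outputs are bitmasks. The sketch below decodes them into the names printed in parentheses. The bit values are inferred only from the outputs in this mail (0x5, 0x45, 0x145, 0x1002, 0x1042, 0x1142), so treat the table as an assumption and not a complete list; `decode_flags` is a hypothetical helper.

```shell
#!/bin/bash
# Decode a tunefs.lustre "Flags:" value into flag names.
# Bit values are inferred from the outputs in this mail (assumption).
decode_flags() {
    local value=$(( $1 )) out=""
    (( value & 0x0001 )) && out+="MDT "
    (( value & 0x0002 )) && out+="OST "
    (( value & 0x0004 )) && out+="MGS "
    (( value & 0x0040 )) && out+="update "
    (( value & 0x0100 )) && out+="writeconf "
    (( value & 0x1000 )) && out+="no_primnode "
    printf '%s\n' "${out% }"      # strip the trailing space
}

decode_flags 0x145
decode_flags 0x1142
```

Read this way, 0x145 confirms that the writeconf flag was stored on the MDT before it was mounted.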
[root@new_mds home]# mount -t lustre /dev/sda6 /MDT
Dec 30 09:40:10 new_mds kernel: LDISKFS-fs (sda6): mounted filesystem with
ordered data mode. quota=on. Opts:
Dec 30 09:40:35 new_mds kernel: LDISKFS-fs (sda6): mounted filesystem with
ordered data mode. quota=on. Opts:
Dec 30 09:40:35 new_mds kernel: LNet: HW CPU cores: 8, npartitions: 2
Dec 30 09:40:35 new_mds modprobe: FATAL: Error inserting crc32c_intel
(/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
No such device
Dec 30 09:40:35 new_mds kernel: alg: No test for crc32 (crc32-table)
Dec 30 09:40:35 new_mds kernel: alg: No test for adler32 (adler32-zlib)
Dec 30 09:40:39 new_mds kernel: padlock: VIA PadLock Hash Engine not
detected.
Dec 30 09:40:39 new_mds modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Dec 30 09:40:43 new_mds kernel: Lustre: Lustre: Build Version:
2.5.0-RC1--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64
Dec 30 09:40:43 new_mds kernel: LNet: Added LNI 192.168.1.32@tcp[8/256/0/180]
Dec 30 09:40:43 new_mds kernel: LNet: Added LNI 10.242.116.32@tcp1[8/256/0/180]
Dec 30 09:40:43 new_mds kernel: LNet: Accept secure, port 988
Dec 30 09:40:44 new_mds kernel: LDISKFS-fs (sda6): mounted filesystem with
ordered data mode. quota=on. Opts:
Dec 30 09:40:44 new_mds kernel: Lustre: MGS: Logs for fs lustre were
removed by user request. All servers must be restarted in order to
regenerate the logs.
Dec 30 09:40:45 new_mds kernel: Lustre: lustre-MDT0000: used disk, loading
Dec 30 09:40:45 new_mds kernel: LustreError:
3461:0:(osd_io.c:950:osd_ldiskfs_read()) lustre=MDT0000: can't read
128@8192on ino 33: rc = 0
Dec 30 09:40:45 new_mds kernel: LustreError:
3461:0:(mdt_recovery.c:112:mdt_clients_data_init()) error reading MDS
last_rcvd idx 0, off 8192: rc -14
Dec 30 09:40:45 new_mds kernel: LustreError: 11-0:
lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect
failed with -11.
Dec 30 09:40:45 new_mds kernel: Lustre: lustre-MDD0000: changelog on
[root@myoss ~]# tunefs.lustre --erase-params /dev/sda5
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre-OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.1.32@tcp
Permanent disk data:
Target: lustre-OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1042
(OST update no_primnode )
Persistent mount opts: errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
Dec 30 09:42:08 myoss kernel: LDISKFS-fs (sda5): mounted filesystem with
ordered data mode. quota=on. Opts:
[root@myoss ~]# tunefs.lustre --writeconf --mgsnode=192.168.1.32@tcp --ost /dev/sda5
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre-OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1042
(OST update no_primnode )
Persistent mount opts: errors=remount-ro
Parameters:
Permanent disk data:
Target: lustre=OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1142
(OST update writeconf no_primnode )
Dec 30 09:42:51 myoss kernel: LDISKFS-fs (sda5): mounted filesystem with
ordered data mode. quota=on. Opts:
[root@myoss ~]# tunefs.lustre --writeconf /dev/sda5
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lustre-OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1142
(OST update writeconf no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.1.32@tcp
Permanent disk data:
Target: lustre=OST0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1142
(OST update writeconf no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.1.32@tcp
Writing CONFIGS/mountdata
Dec 30 09:44:14 myoss kernel: LDISKFS-fs (sda5): mounted filesystem with
ordered data mode. quota=on. Opts:
[root@myoss ~]# mount -t lustre /dev/sda5 /OST
Dec 30 09:44:55 myoss kernel: LDISKFS-fs (sda5): mounted filesystem with
ordered data mode. quota=on. Opts:
Dec 30 09:44:55 myoss kernel: LNet: HW CPU cores: 12, npartitions: 4
Dec 30 09:44:55 myoss kernel: alg: No test for crc32 (crc32-table)
Dec 30 09:44:55 myoss kernel: alg: No test for adler32 (adler32-zlib)
Dec 30 09:44:55 myoss kernel: alg: No test for crc32 (crc32-pclmul)
Dec 30 09:44:59 myoss kernel: padlock: VIA PadLock Hash Engine not detected.
Dec 30 09:44:59 myoss modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Dec 30 09:45:03 myoss kernel: Lustre: Lustre: Build Version:
2.5.0-RC1--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64
Dec 30 09:45:03 myoss kernel: LNet: Added LNI 192.168.1.34@tcp [8/256/0/180]
Dec 30 09:45:03 myoss kernel: LNet: Accept secure, port 988
Dec 30 09:45:04 myoss kernel: LDISKFS-fs (sda5): mounted filesystem with
ordered data mode. quota=on. Opts:
Dec 30 09:45:04 myoss kernel: LustreError: 13a-8: Failed to get MGS log
params and no local copy.
Dec 30 09:45:04 myoss kernel: LustreError:
3457:0:(fld_handler.c:150:fld_server_lookup()) srv-lustre-OST0000: lookup
0x860002, but not connects to MDT0yet: rc = -5.
Dec 30 09:45:04 myoss kernel: LustreError:
3457:0:(osd_handler.c:2134:osd_fld_lookup()) lustre-OST0000-osd: cannot
find FLD range for 0x860002: rc = -5
Dec 30 09:45:04 myoss kernel: LustreError:
3457:0:(osd_handler.c:3364:osd_mdt_seq_exists()) lustre-OST0000-osd: Can
not lookup fld for 0x860002
Dec 30 09:45:05 myoss kernel: LustreError: 13a-8: Failed to get MGS log
params and no local copy.
Dec 30 09:46:24 new_mds kernel: Lustre: MGS: Regenerating lustre-OST0000
log by user request.
Dec 30 09:46:34 new_mds kernel: Lustre:
3432:0:(mgc_request.c:1645:mgc_process_recover_log()) Process recover log
lustre-mdtir error -22
Dec 30 09:46:34 new_mds kernel: LustreError:
3505:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Dec 30 09:46:34 new_mds kernel: LustreError:
3505:0:(osp_dev.c:684:osp_init0()) lustre-OST0000-osc-MDT0000: can't setup
obd: -2
Dec 30 09:46:34 new_mds kernel: LustreError:
3505:0:(obd_config.c:572:class_setup()) setup lustre-OST0000-osc-MDT0000
failed (-2)
Dec 30 09:46:34 new_mds kernel: LustreError:
3505:0:(obd_config.c:1591:class_config_llog_handler()) MGC192.168.1.32@tcp:
cfg command failed: rc = -2
Dec 30 09:46:34 new_mds kernel: Lustre: cmd=cf003
0:lustre-OST0000-osc-MDT0000 1:lustre-OST0000_UUID 2:0@<0:0>
[root@cola1 ~]# mount -t lustre 192.168.1.32@tcp:/lustre /lustre
mount.lustre: mount 192.168.1.32@tcp:/lustre at /lustre failed: No such
file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
Dec 30 09:48:52 cola1 kernel: LNet: HW CPU cores: 8, npartitions: 2
Dec 30 09:48:52 cola1 modprobe: FATAL: Error inserting crc32c_intel
(/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
No such device
Dec 30 09:48:52 cola1 kernel: alg: No test for crc32 (crc32-table)
Dec 30 09:48:52 cola1 kernel: alg: No test for adler32 (adler32-zlib)
Dec 30 09:48:56 cola1 kernel: padlock: VIA PadLock Hash Engine not detected.
Dec 30 09:48:56 cola1 modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-358.18.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Dec 30 09:49:00 cola1 kernel: Lustre: Lustre: Build Version:
2.5.0-RC1--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64
Dec 30 09:49:00 cola1 kernel: LNet: Added LNI 192.168.1.7@tcp [8/256/0/180]
Dec 30 09:49:00 cola1 kernel: LNet: Added LNI 10.242.116.7@tcp1[8/256/0/180]
Dec 30 09:49:00 cola1 kernel: LNet: Accept secure, port 988
Dec 30 09:49:00 cola1 kernel: LustreError:
2562:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Dec 30 09:49:00 cola1 kernel: LustreError:
2562:0:(obd_config.c:572:class_setup()) setup
lustre-OST0000-osc-ffff88021996ac00 failed (-2)
Dec 30 09:49:00 cola1 kernel: LustreError:
2562:0:(obd_config.c:1591:class_config_llog_handler()) MGC192.168.1.32@tcp:
cfg command failed: rc = -2
Dec 30 09:49:00 cola1 kernel: Lustre: cmd=cf003 0:lustre-OST0000-osc
1:lustre-OST0000_UUID 2:0@<0:0>
Dec 30 09:49:00 cola1 kernel: LustreError: 15c-8: MGC192.168.1.32@tcp: The
configuration from log 'lustre-client' failed (-2). This may be the result
of communication errors between this node and the MGS, a bad configuration,
or other errors. See the syslog for more information.
Dec 30 09:49:00 cola1 kernel: LustreError:
2481:0:(llite_lib.c:1044:ll_fill_super()) Unable to process log: -2
Dec 30 09:49:00 cola1 kernel: LustreError:
2481:0:(obd_config.c:619:class_cleanup()) Device 4 not setup
Dec 30 09:49:00 cola1 kernel: Lustre: Unmounted lustre-client
Dec 30 09:49:00 cola1 kernel: LustreError:
2481:0:(obd_mount.c:1311:lustre_fill_super()) Unable to mount (-2)
Dec 30 09:47:24 myoss kernel: INFO: task tgt_recov:3645 blocked for more
than 120 seconds.
Dec 30 09:47:24 myoss kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 30 09:47:24 myoss kernel: tgt_recov D 000000000000000b 0
3645 2 0x00000080
Dec 30 09:47:24 myoss kernel: ffff88044d985da0 0000000000000046
0000000000000000 0000000000000003
Dec 30 09:47:24 myoss kernel: ffff88044d985d30 ffffffff81055f96
ffff88044d985d40 ffff8804733beae0
Dec 30 09:47:24 myoss kernel: ffff88044d983af8 ffff88044d985fd8
000000000000fb88 ffff88044d983af8
Dec 30 09:47:24 myoss kernel: Call Trace:
Dec 30 09:47:24 myoss kernel: [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
Dec 30 09:47:24 myoss kernel: [<ffffffffa07b0210>] ?
check_for_clients+0x0/0x70 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffffa07b187d>]
target_recovery_overseer+0x9d/0x230 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffffa07aff00>] ?
exp_connect_healthy+0x0/0x20 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffff81096da0>] ?
autoremove_wake_function+0x0/0x40
Dec 30 09:47:24 myoss kernel: [<ffffffffa07b8140>] ?
target_recovery_thread+0x0/0x1920 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffffa07b8680>]
target_recovery_thread+0x540/0x1920 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffff81063422>] ?
default_wake_function+0x12/0x20
Dec 30 09:47:24 myoss kernel: [<ffffffffa07b8140>] ?
target_recovery_thread+0x0/0x1920 [ptlrpc]
Dec 30 09:47:24 myoss kernel: [<ffffffff81096a36>] kthread+0x96/0xa0
Dec 30 09:47:24 myoss kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Dec 30 09:47:24 myoss kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
Dec 30 09:47:24 myoss kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
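The return codes in the logs above are negative Linux errno values. As a reading aid, the following sketch maps the ones that appear in this mail to their standard names. The helper `errno_name` is hypothetical and covers only these five codes.

```shell
#!/bin/bash
# Map the negative return codes seen in the logs to Linux errno names.
errno_name() {
    case "${1#-}" in              # strip the leading minus sign
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        11) echo "EAGAIN (Resource temporarily unavailable)" ;;
        14) echo "EFAULT (Bad address)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown" ;;
    esac
}

for rc in -14 -11 -5 -22 -2; do
    printf 'rc %s: %s\n' "$rc" "$(errno_name "$rc")"
done
```

The -2 reported on the MDS and on the client matches the "No such file or directory" message printed by mount.lustre.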
Thanks,
Frank