Dear list,
after a writeconf we can no longer mount the MDT. This happened with
Lustre 2.1.3. Any hints for fixing this problem would be greatly
appreciated. For details and log messages see below.
Details:
The file system is already more than 5 years old and was created
with Lustre 1.6. Later it was running with Lustre 1.8 and we upgraded
to version 2.1.3 a year ago. Since then we have had very few problems.
However, we frequently got LustreError messages on clients because
some applications wanted to use ACLs and ACLs were not enabled.
To change the ACL configuration we ran a writeconf, which was probably
a bad idea, because afterwards the MDT would not start.
Removing pfs1work-MDT0000 on MGS/MDS or pfs1work-client on the MGS
did not help. Upgrading to version 2.1.6 on MDS and MDT did not fix
this problem. We made a backup of the MDT device and downgraded MDS
and MDT to version 1.8, since the writeconf had worked with that version,
and we were indeed able to start the MDT. However, after upgrading to
version 2.1.3 again, the MDT once more fails to mount. A read-only e2fsck
on the MDT did not find any problems. We are wondering whether an
upgrade to version 2.4 would fix the problem.
Here are the messages from the MDS:
Dec 11 19:28:18 pfs1n2 kernel: [19046.838713] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
Dec 11 19:28:18 pfs1n2 kernel: [19046.855112] Lustre: MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:18 pfs1n2 kernel: [19046.922722] Lustre: Enabling ACL
Dec 11 19:28:18 pfs1n2 kernel: [19047.259229] LustreError: 28547:0:(mdd_device.c:1164:mdd_prepare()) Error(-2) initializing .lustre objects
Dec 11 19:28:18 pfs1n2 kernel: [19047.337228] LustreError: 28547:0:(mdt_handler.c:4606:mdt_init0()) Can't init device stack, rc -2
Dec 11 19:28:18 pfs1n2 kernel: [19047.417024] LustreError: 28547:0:(obd_config.c:565:class_setup()) setup pfs1work-MDT0000 failed (-2)
Dec 11 19:28:18 pfs1n2 kernel: [19047.426650] LustreError: 28547:0:(obd_config.c:1491:class_config_llog_handler()) Err -2 on cfg command:
Dec 11 19:28:19 pfs1n2 kernel: [19047.436520] Lustre: cmd=cf003 0:pfs1work-MDT0000 1:pfs1work-MDT0000_UUID 2:0 3:pfs1work-MDT0000-mdtlov 4:f
Dec 11 19:28:19 pfs1n2 kernel: [19047.447504] LustreError: 15c-8: MGC172.26.1.1@o2ib: The configuration from log 'pfs1work-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Dec 11 19:28:19 pfs1n2 kernel: [19047.471946] LustreError: 28516:0:(obd_mount.c:1192:server_start_targets()) failed to start server pfs1work-MDT0000: -2
Dec 11 19:28:19 pfs1n2 kernel: [19047.483194] LustreError: 28516:0:(obd_mount.c:1738:server_fill_super()) Unable to start targets: -2
Dec 11 19:28:19 pfs1n2 kernel: [19047.492704] LustreError: 28516:0:(obd_config.c:610:class_cleanup()) Device 2 not setup
Dec 11 19:28:19 pfs1n2 kernel: [19047.501082] LustreError: 28516:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Dec 11 19:28:19 pfs1n2 kernel: [19047.512672] LustreError: 28516:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Dec 11 19:28:19 pfs1n2 kernel: [19047.554700] Lustre: server umount pfs1work-MDT0000 complete
Dec 11 19:28:19 pfs1n2 kernel: [19047.560542] LustreError: 28516:0:(obd_mount.c:2203:lustre_fill_super()) Unable to mount (-2)
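For reference when reading the messages above: the return codes are negative Linux errno values, so -2 is ENOENT and -108 is ESHUTDOWN. The following is only a minimal sketch for pulling those codes out of such lines; the table, the regular expression and the helper name decode_return_codes are illustrative assumptions and not part of Lustre.

```python
import re

# Hand-written subset of Linux errno values; only the two codes that
# appear in the MDS messages are listed (assumption: Linux numbering).
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# A negative integer preceded by a space or "(" as in "rc -2",
# "Error(-2)" or "failed (-2)". Names such as "dm-6" or
# "pfs1work-MDT0000" are not matched.
RC_PATTERN = re.compile(r"(?<=[ (])-(\d+)\b")


def decode_return_codes(line):
    """Return a list of (code, name, description) for one log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        number = int(match.group(1))
        name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
        results.append((-number, name, text))
    return results


if __name__ == "__main__":
    sample = [
        "(mdd_device.c:1164:mdd_prepare()) Error(-2) initializing .lustre objects",
        "(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway",
    ]
    for line in sample:
        for code, name, text in decode_return_codes(line):
            print(f"{code}: {name} ({text})")
```

Read this way, the first failure is mdd_prepare() returning ENOENT while initializing the .lustre objects; the later -108 messages appear only during the cleanup after the failed mount.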
At the same time on the MGS:
Dec 11 19:27:47 pfs1n1 kernel: [18002.010225] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
Dec 11 19:27:47 pfs1n1 kernel: [18002.029854] Lustre: MGS MGS started
Dec 11 19:27:47 pfs1n1 kernel: [18002.034066] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 628d6315-3333-d644-d0b0-314bb162402d@0@lo t0 exp (null) cur 1386786467 last 0
Dec 11 19:27:47 pfs1n1 kernel: [18002.049753] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message
Dec 11 19:27:47 pfs1n1 kernel: [18002.060070] Lustre: MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:03 pfs1n1 kernel: [18018.052321] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 2abc20b4-6abb-0604-5760-8f458c1ef6c6@172.26.23.231@o2ib t0 exp (null) cur 1386786483 last 0
Dec 11 19:28:18 pfs1n1 kernel: [18032.725732] Lustre: MGS: Logs for fs pfs1work were removed by user request. All servers must be restarted in order to regenerate the logs.
Dec 11 19:28:18 pfs1n1 kernel: [18032.740584] Lustre: Setting parameter pfs1work-MDT0000.mdd.quota_type in log pfs1work-MDT0000
Thanks,
Roland
--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Roland Laifer
Scientific Computing und Simulation (SCS)
Zirkel 2, Building 20.21, Room 209
76131 Karlsruhe, Germany
Phone: +49 721 608 44861
Fax: +49 721 32550
Email: roland.laifer(a)kit.edu
Web: http://www.scc.kit.edu
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association