I came across this post where a user reported pretty much the same errors as you did:
https://lists.01.org/pipermail/hpdd-discuss/2013-April/000173.html
Is there any chance that the .lustre directory was removed/renamed?
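In case it helps when reading the log: as far as I know, the rc values Lustre prints are negated Linux errno numbers, so the -2 from mdd_prepare() would be ENOENT, which fits a missing .lustre entry. Below is a minimal Python sketch for decoding them. The lookup table, the regex and the decode() helper are my own illustration (not a Lustre tool), and the errno numbers assume a Linux MDS.

```python
import re

# Linux errno values for the two codes that appear in the log below.
# Assumption: the servers run Linux, where 2 is ENOENT and 108 is ESHUTDOWN.
ERRNO_NAMES = {
    2: "ENOENT (No such file or directory)",
    108: "ESHUTDOWN (Cannot send after transport endpoint shutdown)",
}

# Matches "(file.c:123:function())" followed later by a negative number.
PATTERN = re.compile(r"\((\w+\.c):\d+:(\w+)\(\)\).*?(-\d+)")


def decode(line):
    """Return (function, errno description) for a LustreError line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    rc = -int(m.group(3))  # Lustre logs the negated errno
    return m.group(2), ERRNO_NAMES.get(rc, "errno %d" % rc)


sample = [
    "LustreError: 28547:0:(mdd_device.c:1164:mdd_prepare()) "
    "Error(-2) initializing .lustre objects",
    "LustreError: 28516:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) "
    "Got rc -108 from cancel RPC: canceling anyway",
]

for line in sample:
    print(decode(line))
```

Codes not in the table are reported as a bare errno number rather than guessed.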
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On Dec 12, 2013, at 8:27 AM, "Laifer, Roland (SCC)" <roland.laifer(a)kit.edu> wrote:
Dear list,
After a writeconf we can no longer mount the MDT. This happened with
Lustre 2.1.3. Any hints for fixing this problem would be greatly
appreciated. For details and log messages, see below.
Details:
The file system is more than 5 years old and was created
with Lustre 1.6. Later it ran with Lustre 1.8, and we upgraded
to version 2.1.3 a year ago. Since then we have had very few problems.
However, we frequently got LustreError messages on clients because
some applications wanted to use ACLs and ACLs were not enabled.
In order to change the ACL configuration we did a writeconf, which
was probably a bad idea, since afterwards the MDT did not start.
Removing pfs1work-MDT0000 on MGS/MDS or pfs1work-client on the MGS
did not help. Upgrading to version 2.1.6 on MDS and MDT did not fix
the problem either. We made a backup of the MDT device and downgraded MDS
and MDT to version 1.8, since the writeconf had worked with that version,
and indeed we were able to start the MDT. However, after upgrading to
version 2.1.3 the MDT again fails to mount. We ran a read-only e2fsck
on the MDT and it did not find any problems. We are wondering whether an
upgrade to version 2.4 would fix the problem.
Here are the messages from the MDS:
Dec 11 19:28:18 pfs1n2 kernel: [19046.838713] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
Dec 11 19:28:18 pfs1n2 kernel: [19046.855112] Lustre: MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:18 pfs1n2 kernel: [19046.922722] Lustre: Enabling ACL
Dec 11 19:28:18 pfs1n2 kernel: [19047.259229] LustreError: 28547:0:(mdd_device.c:1164:mdd_prepare()) Error(-2) initializing .lustre objects
Dec 11 19:28:18 pfs1n2 kernel: [19047.337228] LustreError: 28547:0:(mdt_handler.c:4606:mdt_init0()) Can't init device stack, rc -2
Dec 11 19:28:18 pfs1n2 kernel: [19047.417024] LustreError: 28547:0:(obd_config.c:565:class_setup()) setup pfs1work-MDT0000 failed (-2)
Dec 11 19:28:18 pfs1n2 kernel: [19047.426650] LustreError: 28547:0:(obd_config.c:1491:class_config_llog_handler()) Err -2 on cfg command:
Dec 11 19:28:19 pfs1n2 kernel: [19047.436520] Lustre: cmd=cf003 0:pfs1work-MDT0000 1:pfs1work-MDT0000_UUID 2:0 3:pfs1work-MDT0000-mdtlov 4:f
Dec 11 19:28:19 pfs1n2 kernel: [19047.447504] LustreError: 15c-8: MGC172.26.1.1@o2ib: The configuration from log 'pfs1work-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Dec 11 19:28:19 pfs1n2 kernel: [19047.471946] LustreError: 28516:0:(obd_mount.c:1192:server_start_targets()) failed to start server pfs1work-MDT0000: -2
Dec 11 19:28:19 pfs1n2 kernel: [19047.483194] LustreError: 28516:0:(obd_mount.c:1738:server_fill_super()) Unable to start targets: -2
Dec 11 19:28:19 pfs1n2 kernel: [19047.492704] LustreError: 28516:0:(obd_config.c:610:class_cleanup()) Device 2 not setup
Dec 11 19:28:19 pfs1n2 kernel: [19047.501082] LustreError: 28516:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Dec 11 19:28:19 pfs1n2 kernel: [19047.512672] LustreError: 28516:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Dec 11 19:28:19 pfs1n2 kernel: [19047.554700] Lustre: server umount pfs1work-MDT0000 complete
Dec 11 19:28:19 pfs1n2 kernel: [19047.560542] LustreError: 28516:0:(obd_mount.c:2203:lustre_fill_super()) Unable to mount (-2)
At the same time on the MGS:
Dec 11 19:27:47 pfs1n1 kernel: [18002.010225] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
Dec 11 19:27:47 pfs1n1 kernel: [18002.029854] Lustre: MGS MGS started
Dec 11 19:27:47 pfs1n1 kernel: [18002.034066] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 628d6315-3333-d644-d0b0-314bb162402d@0@lo t0 exp (null) cur 1386786467 last 0
Dec 11 19:27:47 pfs1n1 kernel: [18002.049753] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message
Dec 11 19:27:47 pfs1n1 kernel: [18002.060070] Lustre: MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:03 pfs1n1 kernel: [18018.052321] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 2abc20b4-6abb-0604-5760-8f458c1ef6c6@172.26.23.231@o2ib t0 exp (null) cur 1386786483 last 0
Dec 11 19:28:18 pfs1n1 kernel: [18032.725732] Lustre: MGS: Logs for fs pfs1work were removed by user request. All servers must be restarted in order to regenerate the logs.
Dec 11 19:28:18 pfs1n1 kernel: [18032.740584] Lustre: Setting parameter pfs1work-MDT0000.mdd.quota_type in log pfs1work-MDT0000
Thanks,
Roland
--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Roland Laifer
Scientific Computing und Simulation (SCS)
Zirkel 2, Building 20.21, Room 209
76131 Karlsruhe, Germany
Phone: +49 721 608 44861
Fax: +49 721 32550
Email: roland.laifer(a)kit.edu
Web: http://www.scc.kit.edu
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association