Hello Rick,
On 12.12.2013 20:36, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> I came across this post where a user reported pretty much the same
> errors as you did:
> https://lists.01.org/pipermail/hpdd-discuss/2013-April/000173.html
That was exactly the right hint, and it fixed our problem!
Actually, we had found that post ourselves before we got your response.
The following steps fixed the problem:
$ mount -t ldiskfs /dev/mapper/pfs1work-mdt_work_lv /mnt/lustre/mdt_work
$ cd /mnt/lustre/mdt_work/ROOT
$ mkdir .lustre
$ umount /mnt/lustre/mdt_work
$ mount -t lustre /dev/pfs1work/mdt_work_lv /mnt/lustre/mdt_work
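For reference, the steps above can be wrapped in a small shell sketch. The device and mount-point paths are the ones from this thread; treat them as placeholders for your own system:

```shell
#!/bin/sh
# Sketch of the recovery above: mount the MDT's ldiskfs backend,
# recreate the missing ROOT/.lustre directory, then remount the
# target as a normal Lustre MDT. Paths are examples from this thread.
MDT_DEV="${MDT_DEV:-/dev/mapper/pfs1work-mdt_work_lv}"
MNT="${MNT:-/mnt/lustre/mdt_work}"

restore_dot_lustre() {
    mount -t ldiskfs "$MDT_DEV" "$MNT" || return 1  # raw backend mount
    mkdir -p "$MNT/ROOT/.lustre"                    # recreate the missing directory
    umount "$MNT"
    mount -t lustre "$MDT_DEV" "$MNT"               # normal Lustre mount
}
```

Note that this must run as root on the MDS while the target is unmounted.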
> Is there any chance that the .lustre directory was removed/renamed?
This differs from that post: we have no idea why the .lustre
directory was removed.
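Should the symptom recur, a read-only check with debugfs (from e2fsprogs) can confirm whether ROOT/.lustre is present without mounting the target. This is a hypothetical helper, not something from the thread; the default device path is the example used above:

```shell
# Read-only sanity check: list ROOT on the ldiskfs backend and look
# for the .lustre entry. "debugfs -c" opens the device read-only.
check_dot_lustre() {
    dev="${1:-/dev/mapper/pfs1work-mdt_work_lv}"   # example path from this thread
    if debugfs -c -R 'ls /ROOT' "$dev" 2>/dev/null | grep -q '\.lustre'; then
        echo ".lustre present"
    else
        echo ".lustre missing"
    fi
}
```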
Thanks again,
Roland
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>
> On Dec 12, 2013, at 8:27 AM, "Laifer, Roland (SCC)" <roland.laifer(a)kit.edu> wrote:
> >Dear list,
> >
> >after a writeconf we cannot mount the MDT. This happened with
> >Lustre 2.1.3. Any hints for fixing this problem would be greatly
> >appreciated. For details and log messages see below.
> >
> >Details:
> >The file system is already more than 5 years old and was created
> >with Lustre 1.6. Later it was running with Lustre 1.8 and we upgraded
> >to version 2.1.3 a year ago. Since that time we had very few problems.
> >However, we frequently got LustreError messages on clients because
> >some applications wanted to use ACLs and ACLs were not enabled.
> >In order to change the ACL configuration we did a writeconf which
> >probably was a bad idea since afterwards the MDT did not start.
> >Removing pfs1work-MDT0000 on MGS/MDS or pfs1work-client on the MGS
> >did not help. Upgrading to version 2.1.6 on MDS and MDT did not fix
> >this problem. We made a backup of the MDT device and downgraded MDS
> >and MDT to version 1.8 since the writeconf had worked with that version
> >and indeed we were able to start the MDT. However, after upgrading to
> >version 2.1.3 the MDT does not mount again. We ran a read-only e2fsck
> >on the MDT and this did not find any problems. We are wondering if an
> >upgrade to version 2.4 would fix the problem.
> >
> >Here are the messages from the MDS:
> >Dec 11 19:28:18 pfs1n2 kernel: [19046.838713] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
> >Dec 11 19:28:18 pfs1n2 kernel: [19046.855112] Lustre: MGC172.26.1.1@o2ib: Reactivating import
> >Dec 11 19:28:18 pfs1n2 kernel: [19046.922722] Lustre: Enabling ACL
> >Dec 11 19:28:18 pfs1n2 kernel: [19047.259229] LustreError: 28547:0:(mdd_device.c:1164:mdd_prepare()) Error(-2) initializing .lustre objects
> >Dec 11 19:28:18 pfs1n2 kernel: [19047.337228] LustreError: 28547:0:(mdt_handler.c:4606:mdt_init0()) Can't init device stack, rc -2
> >Dec 11 19:28:18 pfs1n2 kernel: [19047.417024] LustreError: 28547:0:(obd_config.c:565:class_setup()) setup pfs1work-MDT0000 failed (-2)
> >Dec 11 19:28:18 pfs1n2 kernel: [19047.426650] LustreError: 28547:0:(obd_config.c:1491:class_config_llog_handler()) Err -2 on cfg command:
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.436520] Lustre: cmd=cf003 0:pfs1work-MDT0000 1:pfs1work-MDT0000_UUID 2:0 3:pfs1work-MDT0000-mdtlov 4:f
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.447504] LustreError: 15c-8: MGC172.26.1.1@o2ib: The configuration from log 'pfs1work-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.471946] LustreError: 28516:0:(obd_mount.c:1192:server_start_targets()) failed to start server pfs1work-MDT0000: -2
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.483194] LustreError: 28516:0:(obd_mount.c:1738:server_fill_super()) Unable to start targets: -2
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.492704] LustreError: 28516:0:(obd_config.c:610:class_cleanup()) Device 2 not setup
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.501082] LustreError: 28516:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.512672] LustreError: 28516:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.554700] Lustre: server umount pfs1work-MDT0000 complete
> >Dec 11 19:28:19 pfs1n2 kernel: [19047.560542] LustreError: 28516:0:(obd_mount.c:2203:lustre_fill_super()) Unable to mount (-2)
> >
> >At the same time on the MGS:
> >Dec 11 19:27:47 pfs1n1 kernel: [18002.010225] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode
> >Dec 11 19:27:47 pfs1n1 kernel: [18002.029854] Lustre: MGS MGS started
> >Dec 11 19:27:47 pfs1n1 kernel: [18002.034066] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 628d6315-3333-d644-d0b0-314bb162402d@0@lo t0 exp (null) cur 1386786467 last 0
> >Dec 11 19:27:47 pfs1n1 kernel: [18002.049753] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous similar message
> >Dec 11 19:27:47 pfs1n1 kernel: [18002.060070] Lustre: MGC172.26.1.1@o2ib: Reactivating import
> >Dec 11 19:28:03 pfs1n1 kernel: [18018.052321] Lustre: 23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 2abc20b4-6abb-0604-5760-8f458c1ef6c6@172.26.23.231@o2ib t0 exp (null) cur 1386786483 last 0
> >Dec 11 19:28:18 pfs1n1 kernel: [18032.725732] Lustre: MGS: Logs for fs pfs1work were removed by user request. All servers must be restarted in order to regenerate the logs.
> >Dec 11 19:28:18 pfs1n1 kernel: [18032.740584] Lustre: Setting parameter pfs1work-MDT0000.mdd.quota_type in log pfs1work-MDT0000
> >
> >Thanks,
> > Roland
> >
> >--
> >Karlsruhe Institute of Technology (KIT)
> >Steinbuch Centre for Computing (SCC)
> >
> >Roland Laifer
> >Scientific Computing und Simulation (SCS)
> >
> >Zirkel 2, Building 20.21, Room 209
> >76131 Karlsruhe, Germany
> >Phone: +49 721 608 44861
> >Fax: +49 721 32550
> >Email: roland.laifer@kit.edu
> >Web: http://www.scc.kit.edu
> >
> >KIT – University of the State of Baden-Wuerttemberg and
> >National Laboratory of the Helmholtz Association
> >_______________________________________________
> >HPDD-discuss mailing list
> >HPDD-discuss(a)lists.01.org
> >https://lists.01.org/mailman/listinfo/hpdd-discuss