Hmm. When you say "with just CONFIGS/testfs-* stuff", what do you mean?
I was suggesting you just create a new MGT with the method in the manual and
then try to start the file system with that - No attempts to save any
settings or move anything to the new MGT.
Without digging in, the next thing I'd suggest is a writeconf on all
the volumes of the FS and try starting again with the new MGT.
- Patrick
On 02/05/2014 09:30 AM, Anthony Alba wrote:
Patrick,
Thanks for the suggestion but no luck.
I recreated a clean MGT with just CONFIGS/testfs-* stuff.
When I bring up the MDS, I encounter the same issue. I get 8 OBDs UP
and I'm missing the 'UP lwp ' dev 9.
(On a newly created filesystem with only MDS and MGS I see I should
have 9 OBDs in lctl dl).
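
A minimal sketch of how the device list could be checked for the missing lwp entry, assuming `lctl dl` prints whitespace-separated columns in the order index, status, type, name, UUID, refcount. The function names and the sample text are hypothetical and made up for illustration; they are not output from this system.

```python
def parse_lctl_dl(text):
    """Parse `lctl dl`-style text into a list of device dicts.

    Assumes columns: index, status, type, name, uuid, refcount.
    Lines that do not start with a numeric index are skipped.
    """
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        devices.append({
            "index": int(fields[0]),
            "status": fields[1],
            "type": fields[2],
            "name": fields[3],
        })
    return devices


def has_up_lwp(devices):
    """Return True if any device of type 'lwp' is in state 'UP'."""
    return any(d["type"] == "lwp" and d["status"] == "UP" for d in devices)


# Hypothetical sample text, not captured from a real system.
SAMPLE = """\
  0 UP osd-ldiskfs testfs-MDT0000-osd testfs-MDT0000-osd_UUID 8
  1 UP mgc MGC-example example-uuid 5
"""

if __name__ == "__main__":
    devs = parse_lctl_dl(SAMPLE)
    print(len(devs), "devices; lwp up:", has_up_lwp(devs))
```

Feeding it the saved output of `lctl dl` from the good and the bad MDS would show at a glance which one lacks the lwp device.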
My mgc_process_recover_log() returning -EINVAL may be a red herring;
I see that it is just a warning.
Ultimately mgc_process_log() returns 0. At this stage, I expect
lustre_start_simple() or something equivalent to bring up the lwp, but
in this case nothing further happens after mgc_copy_llog() returns 0.
- Richard
On Wed, Feb 5, 2014 at 10:52 PM, Patrick Farrell <paf(a)cray.com> wrote:
Anthony,
If you have another volume available and don't mind losing any
settings stored on the MGS/MGT (No data -- just anything you set
as a conf_param), you could try formatting a different, new,
volume as the MGS/MGT. Use the usual process from the manual,
then try to start as normal.
(Note -- The manual doesn't really distinguish between the
management server (MGS) and management target (MGT).)
I suggest trying a new MGT because it seems likely your copy of
the MGS/MGT didn't get some things, and often, replacing the MGT
is fairly painless. And if it's missing something important, as
long as you used a new volume, you're just back where you were before.
About the file referenced -- I'm not really familiar with the
on-disk contents of the MGT, but you should be able to see that if
you mount the MGT as ldiskfs. The thing that's worrying is that
if you're missing that, what else are you missing/didn't get
copied, which is why I suggested trying a new MGS/MGT if that's an
option.
- Patrick
From: Anthony Alba <ascanio.alba7(a)gmail.com>
Date: Wednesday, February 5, 2014 at 3:03 AM
To: "Dilger, Andreas" <andreas.dilger(a)intel.com>
Cc: "hpdd-discuss(a)lists.01.org" <hpdd-discuss(a)lists.01.org>
Subject: Re: [HPDD-discuss] Split MDS/MGS - process recover log
testfs-mtdir error -22
Further debugging through the logs
Comparing the startup of this MDS/MGS pair with a known good system:
On the known good system:
mgc_copy_llog()
lustre_start_simple() starting obd goodfs-MDT0000-lwp-MDT0000
(type=lwp)
..etc etc
On the bad system:
mgc_copy_llog()
/* no lwp stuff at all */
For some reason, my MDS/MGS pair are not invoking the lwp OBD.