I misunderstood; I think I see what you mean now: create an empty MGS and
let the MDS/OSSes reconnect from scratch rather than try to reproduce my
existing MGS.
Ok, gotta try that and see.
On Thu, Feb 6, 2014 at 12:25 AM, Patrick Farrell <paf(a)cray.com> wrote:
Hmm. When you say with just "CONFIGS/testfs-* stuff",
what do you mean?
I was suggesting you just create a new MGT with the method in the manual and then
try to start the file system with that - No attempts to save any settings
or move anything to the new MGT.
Without digging in, the next thing I'd suggest is a writeconf on all the
volumes of the FS and try starting again with the new MGT.
- Patrick
On 02/05/2014 09:30 AM, Anthony Alba wrote:
Patrick,
Thanks for the suggestion but no luck.
I recreated a clean MGT with just CONFIGS/testfs-* stuff.
When I bring up the MDS, I encounter the same issue. I get 8 OBDs UP and
I'm missing the 'UP lwp ' dev 9.
(On a newly created filesystem with only MDS and MGS, I see that I should
have 9 OBDs in lctl dl.)
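
For anyone repeating this check, here is a rough sketch of how the missing device could be spotted automatically. It assumes that each line of `lctl dl` output has the columns "index status type name uuid refcount". The helper names (parse_lctl_dl, has_up_device) and the sample text are made up for illustration and are not captured from a real system.

```python
from typing import List, NamedTuple


class ObdDevice(NamedTuple):
    index: int
    status: str
    obd_type: str
    name: str


def parse_lctl_dl(text: str) -> List[ObdDevice]:
    """Parse device lines, assuming 'index status type name uuid refcount' columns."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines rather than guess at their meaning.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        devices.append(ObdDevice(int(fields[0]), fields[1], fields[2], fields[3]))
    return devices


def has_up_device(devices: List[ObdDevice], obd_type: str) -> bool:
    """Return True if at least one device of the given type is in state UP."""
    return any(d.status == "UP" and d.obd_type == obd_type for d in devices)


# Illustrative sample only (hypothetical names and UUIDs).
SAMPLE = """\
  0 UP osd-ldiskfs testfs-MDT0000-osd testfs-MDT0000-osd_UUID 8
  1 UP mgc MGC10.0.0.1@tcp 00000000-0000-0000-0000-000000000000 5
  2 UP mdt testfs-MDT0000 testfs-MDT0000_UUID 3
"""

if __name__ == "__main__":
    devs = parse_lctl_dl(SAMPLE)
    print(len(devs), "devices; lwp up:", has_up_device(devs, "lwp"))
```

On a real MDS the text would come from running `lctl dl` and passing its output to parse_lctl_dl() in place of SAMPLE.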
My mgc_process_recover_log() returning -EINVAL may be a red herring; I
see that it is just a warning.
Ultimately mgc_process_log() returns 0. At this stage, I expect
lustre_start_simple() or something equivalent to bring up the lwp, but in
this case nothing further happens after mgc_copy_llog() returns 0.
- Richard
On Wed, Feb 5, 2014 at 10:52 PM, Patrick Farrell <paf(a)cray.com> wrote:
> Anthony,
>
> If you have another volume available and don't mind losing any settings
> stored on the MGS/MGT (No data - just anything you set as a conf_param),
> you could try formatting a different, new, volume as the MGS/MGT. Use the
> usual process from the manual, then try to start as normal.
>
> (Note - The manual doesn't really distinguish between the management
> server (MGS) and management target (MGT).)
>
> I suggest trying a new MGT because it seems likely your copy of the
> MGS/MGT didn't get some things, and often, replacing the MGT is fairly
> painless. And if it's missing something important, as long as you used a
> new volume, you're just back where you were before.
>
> About the file referenced - I'm not really familiar with the on-disk
> contents of the MGT, but you should be able to see that if you mount the
> MGT as ldiskfs. The thing that's worrying is that if you're missing that,
> what else are you missing/didn't get copied, which is why I suggested
> trying a new MGS/MGT if that's an option.
>
> - Patrick
>
> From: Anthony Alba <ascanio.alba7(a)gmail.com>
> Date: Wednesday, February 5, 2014 at 3:03 AM
> To: "Dilger, Andreas" <andreas.dilger(a)intel.com>
> Cc: "hpdd-discuss(a)lists.01.org" <hpdd-discuss(a)lists.01.org>
> Subject: Re: [HPDD-discuss] Split MDS/MGS - process recover log
> testfs-mtdir error -22
>
> Further debugging through the logs
>
> Comparing the startup of this MDS/MGS pair with a known good system,
>
> On the known good system:
>
> mgc_copy_llog()
> lustre_start_simple() starting obd goodfs-MDT0000-lwp-MDT0000 (type=lwp)
> ..etc etc
>
> On the bad system:
> mgc_copy_llog()
> /* no lwp stuff at all */
>
>
> For some reason, my MDS/MGS pair are not invoking the lwp OBD.
>