Patrick, 

Thanks for the suggestion, but no luck.

I recreated a clean MGT containing only the CONFIGS/testfs-* files.

When I bring up the MDS, I hit the same issue: I get 8 OBDs UP, and the 'UP lwp' device (dev 9) is missing.
(On a newly created filesystem with only an MDS and MGS, lctl dl shows 9 OBDs, which is what I should have here.)
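
The lctl dl check can be scripted. Below is a minimal Python sketch, assuming the usual lctl dl column layout (index, status, type, name, UUID, refcount). The sample device lines, the address, and the helper names parse_lctl_dl and missing_types are made up for illustration; they are not output from this system.

```python
def parse_lctl_dl(text):
    """Parse `lctl dl`-style lines into (index, status, type, name) tuples."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: index, status, type, name, uuid, refcount.
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank or unrecognised lines
        devices.append((int(fields[0]), fields[1], fields[2], fields[3]))
    return devices


def missing_types(devices, expected_types):
    """Return the expected device types that have no device in state UP."""
    present = {dev_type for _, status, dev_type, _ in devices if status == "UP"}
    return sorted(set(expected_types) - present)


# Hypothetical sample, NOT captured from a real system.
SAMPLE = """\
  0 UP osd-ldiskfs testfs-MDT0000-osd testfs-MDT0000-osd_UUID 8
  1 UP mgc MGC192.0.2.1@tcp example-uuid 5
  2 UP mds MDS MDS_uuid 3
"""

if __name__ == "__main__":
    devs = parse_lctl_dl(SAMPLE)
    print("missing types:", missing_types(devs, ["mgc", "mds", "lwp"]))
```

On a real node the text would come from the output of lctl dl rather than from SAMPLE, and the expected type list would be taken from a known good MDS.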

The -EINVAL returned by mgc_process_recover_log() may be a red herring; I see that it is just a warning.
Ultimately mgc_process_log() returns 0. At this stage, I expect lustre_start_simple() or something equivalent to bring up the lwp, but in this case nothing further happens after mgc_copy_llog() returns 0.



- Richard

On Wed, Feb 5, 2014 at 10:52 PM, Patrick Farrell <paf@cray.com> wrote:
Anthony,

If you have another volume available and don’t mind losing any settings stored on the MGS/MGT (No data – just anything you set as a conf_param), you could try formatting a different, new, volume as the MGS/MGT.  Use the usual process from the manual, then try to start as normal.

(Note – The manual doesn’t really distinguish between the management server (MGS) and management target (MGT).)

I suggest trying a new MGT because it seems likely your copy of the MGS/MGT didn’t get some things, and often, replacing the MGT is fairly painless.  And if it’s missing something important, as long as you used a new volume, you’re just back where you were before.

About the file referenced: I’m not really familiar with the on-disk contents of the MGT, but you should be able to see it if you mount the MGT as ldiskfs.  The worrying part is that if that file is missing, other things may also be missing or may not have been copied, which is why I suggested trying a new MGS/MGT if that’s an option.

- Patrick

From: Anthony Alba <ascanio.alba7@gmail.com>
Date: Wednesday, February 5, 2014 at 3:03 AM
To: "Dilger, Andreas" <andreas.dilger@intel.com>
Cc: "hpdd-discuss@lists.01.org" <hpdd-discuss@lists.01.org>
Subject: Re: [HPDD-discuss] Split MDS/MGS - process recover log testfs-mtdir error -22

Further debugging through the logs:

Comparing the startup of this MDS/MGS pair with a known good system:

On the known good system:

mgc_copy_llog()  
lustre_start_simple() starting obd goodfs-MDT0000-lwp-MDT0000 (type=lwp)
..etc etc

On the bad system:
mgc_copy_llog()
/* no lwp stuff at all */


For some reason, my MDS/MGS pair is not starting the lwp OBD.
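
The good/bad comparison above can be automated so that the first point of divergence is obvious. This is only a rough Python sketch: it assumes the traces have been reduced to the simplified "function() message" form shown in this message (real Lustre debug logs carry more fields per line), and the helper names called_functions and first_divergence are hypothetical.

```python
import re
from itertools import zip_longest

# Matches names written as "name()" in the simplified traces.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(\)")


def called_functions(trace):
    """Return the function names mentioned as name() in the trace, in order."""
    return CALL_RE.findall(trace)


def first_divergence(good, bad):
    """Return (position, good_call, bad_call) at the first mismatch, else None."""
    for i, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return i, g, b
    return None


# The two traces as quoted in this message.
GOOD_TRACE = """\
mgc_copy_llog()
lustre_start_simple() starting obd goodfs-MDT0000-lwp-MDT0000 (type=lwp)
"""
BAD_TRACE = """\
mgc_copy_llog()
"""

if __name__ == "__main__":
    print(first_divergence(called_functions(GOOD_TRACE),
                           called_functions(BAD_TRACE)))
```

Comparing function names only, rather than whole lines, avoids false mismatches caused by the different filesystem names (goodfs vs. testfs) on the two systems.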