Thank you very much for your reply -- that definitely pointed me in a
better direction.
Actually, I have my fingers crossed, but it seems to have passed my first round of stress
tests without crashing, which is exciting.
Good to hear. The more of us who use the in-kernel client, the more momentum it'll
get!
One last question -- what type of striping configuration do you normally use for your
data? I'm wondering if some of the problems I was hitting before were related to
moving away from the default striping configuration.
We use *no* striping; each file remains on a single OST (chosen by the system), whatever
its size.
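In Lustre terms, "no striping" means a stripe count of 1; assuming a mounted client, that can be checked or set with the `lfs` tool (the paths here are illustrative):

```shell
# Illustrative only -- requires a mounted Lustre filesystem; paths are examples.
lfs getstripe -c /mnt/lustre/somefile   # show a file's stripe count
lfs setstripe -c 1 /mnt/lustre/mydir    # new files in mydir stay on a single OST
lfs setstripe -c -1 /mnt/lustre/wide    # for contrast: stripe across all OSTs
```

(No `<test>` is practical here, since the commands need a live Lustre mount.)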
Best,
Cédric
Thanks,
Dan
On 02/03/2014 04:04 AM, Cédric Dufour - Idiap Research Institute wrote:
> Hello Dan,
>
> On 02/02/14 17:25, Dan Tascione wrote:
>>
>> Hi Cédric,
>>
>>
>>
>> I had a few hopefully easy questions about your Ubuntu setup, if you have the time to answer them.
>>
>>
>>
>> Our server side is Lustre 2.4.2 on CentOS 6.4 (installed with the Whamcloud RPMs). These nodes all seem to be operating fine.
>>
>
> We have 2.4.2 on Ubuntu 12.04 with a 2.6.32 kernel (which our partner, Q-Leap GmbH, set up and maintains)
>
>>
>>
>> Our client side is currently Ubuntu 12.04. I've tried:
>>
>> - Compiling Lustre client from the git tree (both 2.4.2 and master)
>>
>
> Haven't even tried it (being quite certain it would fail)
>
>> - Building the 3.13 kernel from Ubuntu, with the Lustre modules enabled
>>
>>
>>
>> Unfortunately, in all my tests, the Ubuntu nodes regularly panic or just outright freeze anywhere from 2 to 24 hours into operation.
>>
>
> In order for the in-kernel Lustre client to work (on kernel 3.12 for sure, and I think 3.13 as well), you *must* at least add the patches addressing:
> - https://jira.hpdd.intel.com/browse/LU-4127
> - https://jira.hpdd.intel.com/browse/LU-4157
>
>>
>>
>> For your Ubuntu clients, are you using the 3.12.8 that comes from Ubuntu, or from kernel.org?
>>
>
> We started with an "apt-get source" in an Ubuntu/Trusty VM; at the time, its kernel was 3.12.0-7.15 (corresponding to 3.12.4 upstream).
> We then added all incremental patches from https://www.kernel.org/ to "rebase" that kernel to 3.12.9.
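That incremental-patch workflow can be sketched on a dummy tree like this (the file names are stand-ins; the real case applies kernel.org's patch-3.12.x files at the top of the kernel source with `patch -p1`):

```shell
#!/bin/sh
# Sketch of the "rebase via incremental patches" workflow, on a dummy tree.
# File names are stand-ins for the real kernel source and kernel.org patches.
set -e
mkdir -p demo/a demo/b
printf 'VERSION = 3.12.4\n' > demo/a/Makefile   # the "old" source tree
printf 'VERSION = 3.12.5\n' > demo/b/Makefile   # the "new" upstream state
# Generate a stand-in for kernel.org's incremental patch:
(cd demo && diff -u a/Makefile b/Makefile > patch-3.12.5.diff) || true
# Apply it at the top of the "source tree", stripping one path component,
# just as one would with `xzcat patch-3.12.5.xz | patch -p1`:
(cd demo/a && patch -p1 < ../patch-3.12.5.diff)
grep 'VERSION' demo/a/Makefile   # now reads: VERSION = 3.12.5
```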
>
>>
>>
>> It looks like you are just using the Lustre version that comes with the 3.12.8 kernel, and not the version from the Lustre source tree, is that correct?
>>
>
> Yes, absolutely.
>
> The Lustre source tree still targets kernel 2.6.32 (or the like). As such, it is not suited for recent kernels :-(
>
> We started with the stock in-kernel Lustre client from Ubuntu/Trusty 3.12.0-7.15, with patches for:
> - https://jira.hpdd.intel.com/browse/LU-4127 (*required*)
> - https://jira.hpdd.intel.com/browse/LU-4157 (*required*)
> - https://jira.hpdd.intel.com/browse/LU-4231 (for NFS re-export)
> - https://jira.hpdd.intel.com/browse/LU-4400 (for NFS re-export)
>
> BUT, we then stumbled on other minor bugs:
> - https://jira.hpdd.intel.com/browse/LU-4209
> - https://jira.hpdd.intel.com/browse/LU-4520
> - https://jira.hpdd.intel.com/browse/LU-4530
>
> We decided to pull the in-kernel Lustre client from the most up-to-date kernel source; see https://jira.hpdd.intel.com/browse/LU-4530 for a discussion of what that might be.
> Thus, we pull the in-kernel Lustre client from:
> - https://github.com/verygreen/linux/tree/lustre-next
> (which incorporates a few of the patches mentioned above, plus many others)
> And added patches for the not-yet-integrated issues:
> - https://jira.hpdd.intel.com/browse/LU-4231
> - https://jira.hpdd.intel.com/browse/LU-4530
> - https://jira.hpdd.intel.com/browse/LU-4520 (<-> 4152 <-> 4398 <-> 4429); this one is still unresolved, as it requires server-side patches
> - others that I thought might help with our LU-4520
>
>>
>>
>> Are your clients all Infiniband, or are they Ethernet? We're using Ethernet here for the clients, and I am wondering if that's interacting badly somehow.
>>
>
> All clients are Ethernet
>
>>
>>
>> You mentioned "3.14rc1~patched" below, but I wasn't sure what that version number referred to.
>>
>
> At the time it was git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git, branch "staging-3.14rc1", but that is no longer valid. You're better off starting from https://github.com/verygreen/linux/tree/lustre-next
>
>>
>>
>> Thanks,
>>
>> Dan
>>
>
> Best,
>
> Cédric
>
>>
>>
>>
>>
>> *From:* HPDD-discuss [mailto:hpdd-discuss-bounces@ml01.01.org] *On Behalf Of* Cédric Dufour - Idiap Research Institute
>> *Sent:* Friday, January 24, 2014 7:17 AM
>> *To:* Lustre (HPDD-discuss)
>> *Subject:* [HPDD-discuss] 'lustre-dkms' (skeleton) package for Debian/Ubuntu available
>>
>>
>>
>> Hello all,
>>
>> Newly subscribed to the list, I've been going through the archives and seen some questions about Lustre client support on recent versions of Debian/Ubuntu distributions.
>>
>> We have addressed that issue by:
>> - building a custom kernel with the Lustre client *disabled*, based on Ubuntu's latest available kernel + latest stable patchsets, 3.12.8 for us so far (PS: './debian/rules editconfigs' to disable Lustre)
>> - having a separate (easily upgradeable) 'lustre-dkms' package based on the Lustre in-kernel client code + our patches, 3.14rc1~patched for us so far
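For the glue that makes DKMS rebuild the modules for each kernel, a minimal dkms.conf along these lines should work (the package name, version, and module list here are illustrative, not necessarily what the tarball below ships):

```shell
# Hypothetical dkms.conf sketch for a 'lustre-dkms' package.
# PACKAGE_* values and the module names are illustrative.
PACKAGE_NAME="lustre-client"
PACKAGE_VERSION="3.14rc1~patched"
# Build the out-of-tree modules against the target kernel's headers;
# ${kernel_source_dir}, ${dkms_tree}, ${module} and ${module_version}
# are variables provided by DKMS itself.
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${module}/${module_version}/build modules"
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${module}/${module_version}/build clean"
BUILT_MODULE_NAME[0]="lustre"
DEST_MODULE_LOCATION[0]="/updates"
BUILT_MODULE_NAME[1]="lnet"
DEST_MODULE_LOCATION[1]="/updates"
AUTOINSTALL="yes"
```

(This is a config fragment, not a runnable script; DKMS sources it when building.)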
>>
>> We use that 3.12.8 kernel + lustre-dkms (3.14rc1~patched) package without any problem on:
>> - Ubuntu/Quantal (~100 workstations and computation nodes)
>> - Debian/Wheezy with the few libc (>= 2.14) dependencies pulled from Debian/Testing (a few servers requiring Lustre access)
>> - (hopefully Ubuntu/Trusty 14.04 in a few weeks)
>> (against a Lustre 2.6.32/2.4.2 cluster)
>>
>> I have tarball-ed the required resources at http://www.idiap.ch/~cdufour/download/lustre-dkms.tar.bz2 . It contains the skeleton directory and a HOWTO.TXT file that should get those of you who are interested in following the same path up and running.
>>
>> Hope it helps.
>>
>> Best regards,
>>
>> Cédric
>>
>> --
>>
>> *Cédric Dufour @ Idiap Research Institute*
>>
>