Our question was whether we should track your branch and merge it every time
we compile a kernel.
better idea.
I assume your branch did not make the 3.19 merge window; what are the
prospects for 3.20?
Thanks,
Eli
On Tue, Feb 3, 2015 at 6:59 PM, Drokin, Oleg <oleg.drokin(a)intel.com> wrote:
It depends on what your end goal is, I guess.
My branch is probably the only upstream kernel branch that gets any
Lustre-specific testing (parts of our regression test suite, by no means
the full suite at the moment), so it means Lustre there actually works to a
significant degree, but this is about the only thing I test.
On the other hand, it is based on top of the staging tree, which might
contain patches that broke other stuff, for example (though hopefully not).
Once my current batch gets merged there, there will be no difference
between my branch and the staging tree itself, until such time as I
accumulate some more fixes.
Linus's tree will get all the same patches much later on, when the next
merge window opens and Greg KH pushes his stuff to Linus.
So in the end, if you want the most stable Linus tree, it might even make
sense to get his tree and just pick the fixes for bugs you hit; then,
hopefully, when it comes time to upgrade to the next release, all the
necessary fixes are in. Otherwise you can run on my tree or, at the times
when I do not have any extra fixes in my tree, on some snapshot of the
staging tree.
On Feb 3, 2015, at 10:48 AM, E.S. Rosenberg wrote:
> Hi Oleg,
> Is your branch "the" lustre kernel branch to follow, or should we be
> tracking a different git branch?
> Thanks,
> Eli
>
> On Mon, Feb 2, 2015 at 8:17 PM, Drokin, Oleg <oleg.drokin(a)intel.com> wrote:
> Hello!
>
> This is a bug in our code.
> I just pushed a bunch of patches into the upstream kernel, including a fix
> for this one:
> https://github.com/verygreen/linux/commit/eadee26a9862f9be67134bc662c750e...
>
> There are other useful patches in that tree that you might want to get
> while you are at it.
> https://github.com/verygreen/linux/commits/lustre-next
>
> Bye,
> Oleg
> On Feb 2, 2015, at 12:31 PM, E.S. Rosenberg wrote:
>
> > Hi all,
> >
> > Lately I have been seeing a lot of kernel oopses; they seem to happen
> > when users try to list files on Lustre.
> >
> > We recently updated our version of Debian; however, the problem seems
> > to occur only on the submit node of the cluster, to which the users have
> > CLI access. Other nodes aren't experiencing problems.
> >
> >
> > Further down is a trace of one such event. I am not really sure
> > how to troubleshoot from here.
> >
> > Technical details:
> > OS: Debian testing/sid mix, 64 bit
> > Kernel: 3.17.1 (kernel.org locally compiled)
> > lustre version: 2.5.3
> >
> > What we did so far:
> > - I switched between two nodes (changed DNS) to see if it was a hardware
> >   problem; the problem migrated with the submit host's location.
> >
> > I am considering recompiling the client, since it was compiled under
> > our previous Debian freeze (as far as I remember); however, the problem
> > seems to be in the kernel, which is independent of our Debian freezes.
> >
> > Thanks,
> > Eli
> >
> >
> >
> > Feb 2 15:43:04 hm-01 kernel: BUG: unable to handle kernel paging request at 0000003212ced00a
> > Feb 2 15:43:04 hm-01 kernel: IP: [<ffffffffa182467e>] ll_get_dir_page+0x7a7/0xf21 [lustre]
> > Feb 2 15:43:04 hm-01 kernel: PGD f1c419067 PUD 0
> > Feb 2 15:43:04 hm-01 kernel: Oops: 0000 [#1] SMP
> > Feb 2 15:43:04 hm-01 kernel: Modules linked in: lmv(C) fld(C) mgc(C) lustre(C) mdc(C) fid(C) lov(C) osc(C) ptlrpc(C) obdclass(C) lvfs(C) binfmt_misc ko2iblnd(C) evdev joydev lnet(C) sha512_generic intel_rapl x86_pkg_temp_thermal coretemp sha256_generic kvm sb_edac ipmi_si processor ipmi_msghandler dcdbas edac_core crc32_pclmul pcspkr sg wmi button sha1_ssse3 sha1_generic crc32 mei_me mei libcfs(C) fuse parport_pc lp parport dm_crypt dm_mod autofs4
> > Feb 2 15:43:04 hm-01 kernel: CPU: 2 PID: 2883 Comm: csh Tainted: G C 3.17.1-aufs-2 #1
> > Feb 2 15:43:04 hm-01 kernel: Hardware name: Dell Inc. PowerEdge C6220/0TTH1R, BIOS 1.2.1 05/27/2013
> > Feb 2 15:43:04 hm-01 kernel: task: ffff880857a00000 ti: ffff880f1feec000 task.ti: ffff880f1feec000
> > Feb 2 15:43:04 hm-01 kernel: RIP: 0010:[<ffffffffa182467e>] [<ffffffffa182467e>] ll_get_dir_page+0x7a7/0xf21 [lustre]
> > Feb 2 15:43:04 hm-01 kernel: RSP: 0018:ffff880f1feefcf8 EFLAGS: 00010002
> > Feb 2 15:43:04 hm-01 kernel: RAX: 0000000000000001 RBX: ffff88083e313bc8 RCX: 0000000000000000
> > Feb 2 15:43:04 hm-01 kernel: RDX: 0000003212ced00a RSI: ffff880f1feefcb0 RDI: ffff88087f0dd6c8
> > Feb 2 15:43:04 hm-01 kernel: RBP: ffff880f1feefdf0 R08: 0000000000000000 R09: 000000000000003c
> > Feb 2 15:43:04 hm-01 kernel: R10: 0000000000000000 R11: ffff8806a4733b48 R12: fffffffffffffffe
> > Feb 2 15:43:04 hm-01 kernel: R13: 0000000000000000 R14: 0000003212ced00a R15: ffff88083e313d10
> > Feb 2 15:43:04 hm-01 kernel: FS: 00007f3093d57700(0000) GS:ffff88087fa40000(0000) knlGS:0000000000000000
> > Feb 2 15:43:04 hm-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Feb 2 15:43:04 hm-01 kernel: CR2: 0000003212ced00a CR3: 000000104f23c000 CR4: 00000000000407e0
> > Feb 2 15:43:04 hm-01 kernel: Stack:
> > Feb 2 15:43:04 hm-01 kernel: 0000000000018800 ffff88083e313d10 fffffffffffffffe ffff88083e313e40
> > Feb 2 15:43:04 hm-01 kernel: 6dbf322dc805b051 0000000000000000 ffff88083e313bc8 0000000000000000
> > Feb 2 15:43:04 hm-01 kernel: 0000000000000000 ffff880f1feefe30 ffff880f1feeff20 0000000000000002
> > Feb 2 15:43:04 hm-01 kernel: Call Trace:
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffffa1824e80>] ll_dir_read+0x88/0x2a5 [lustre]
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffffa1825160>] ll_readdir+0xc3/0x1dd [lustre]
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffff81105cdd>] iterate_dir+0x86/0x10a
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffff811060ec>] SyS_getdents+0x86/0xdb
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffff81105e28>] ? fillonedir+0xc7/0xc7
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffff81816eed>] system_call_fastpath+0x1a/0x1f
> > Feb 2 15:43:04 hm-01 kernel: [<ffffffff81816eed>] ? system_call_fastpath+0x1a/0x1f
> > Feb 2 15:43:04 hm-01 kernel: Code: ff ff e8 5a 22 ff df 48 8b 95 18 ff ff ff 49 8d 7f 08 4c 89 f6 b9 01 00 00 00 e8 40 79 af df 85 c0 0f 8e 0d 01 00 00 4c 8b 75 90 <49> 8b 06 f6 c4 80 75 07 f0 41 ff 46 1c eb 0c 4c 89 f7 e8 9a 57
> > Feb 2 15:43:04 hm-01 kernel: RIP [<ffffffffa182467e>] ll_get_dir_page+0x7a7/0xf21 [lustre]
> > Feb 2 15:43:04 hm-01 kernel: RSP <ffff880f1feefcf8>
> > Feb 2 15:43:04 hm-01 kernel: CR2: 0000003212ced00a
> > Feb 2 15:43:04 hm-01 kernel: ---[ end trace 20a12192acce9089 ]---
> >
> > _______________________________________________
> > HPDD-discuss mailing list
> > HPDD-discuss(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/hpdd-discuss
>
>
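
When comparing several oopses like the one quoted above, it can help to pull the faulting symbol out of the "IP:" line mechanically. The following is only a minimal sketch: the function name `parse_oops_ip` and the regular expression are illustrative assumptions, not part of Lustre or any kernel tooling, and the pattern assumes the 3.x-era `IP: [<address>] symbol+0xoffset/0xsize [module]` layout seen in this trace.

```python
import re

# Hypothetical helper, not part of any Lustre or kernel tool.
# Matches "[<address>] symbol+0xoffset/0xsize [module]"; the module is optional.
OOPS_IP_RE = re.compile(
    r"\[<(?P<address>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_oops_ip(log_text):
    """Return a dict describing the first 'IP:' entry in log_text, or None."""
    # Collapse all whitespace so a mail-wrapped line is rejoined before matching.
    flat = " ".join(log_text.split())
    start = flat.find("IP: ")
    if start < 0:
        return None
    match = OOPS_IP_RE.search(flat, start)
    if match is None:
        return None
    return {
        "address": int(match.group("address"), 16),
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for core-kernel symbols
    }


if __name__ == "__main__":
    sample = (
        "Feb 2 15:43:04 hm-01 kernel: IP: [<ffffffffa182467e>]\n"
        "ll_get_dir_page+0x7a7/0xf21 [lustre]"
    )
    info = parse_oops_ip(sample)
    print(info["symbol"], hex(info["offset"]), info["module"])
```

Run on the "IP:" line from this thread, it reports `ll_get_dir_page` at offset `0x7a7` in the `lustre` module, which makes it easy to check whether a later oops hits the same spot.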