Hello!
On Sep 10, 2014, at 2:17 PM, Nico Budewitz wrote:
I recently had to set up a few post-processing machines based on
Ubuntu Trusty, kernel 3.13.0-32-generic #57-Ubuntu. The in-kernel Lustre module worked fine
until LNET complained:
LNetError: 2026:0:(lib-lnet.h:399:lnet_md_alloc()) LNET: out of memory at /build/buildd/linux-3.13.0/drivers/staging/lustre/include/linux/lnet/lib-lnet.h:399 (tried to alloc '(md)' = 4208)
Sep 10 10:41:37 vis-m2 kernel: [180229.112013] LNetError: 2026:0:(lib-lnet.h:399:lnet_md_alloc()) LNET: 274426476 total bytes allocated by lnet
Has anyone seen this error before, or does anyone know of a fix? A more detailed
kernel log is attached.
LU-3585 describes a similar problem, but that bug appears to have been resolved by now.
All machines running Scientific Linux 6 (kernel 2.6.32) work as expected. When the bug
is hit, throughput to the file system drops considerably and the log shows the
corresponding errors. The problem is reproducible.
Let me know if you have any ideas.
Well, LU-3585 was about a crash at that location. Since you no longer crash, that issue
is indeed fixed.
Can you please make sure that this patch is included in your source:
https://github.com/torvalds/linux/commit/0be19afa74b73a2132dc02b4fea0c6b5...
If not, please apply it.
Bye,
Oleg