Hi,
I recently had to set up a few post-processing machines running Ubuntu Trusty (kernel
3.13.0-32-generic #57-Ubuntu). The Lustre kernel module worked fine until LNET started
reporting:
LNetError: 2026:0:(lib-lnet.h:399:lnet_md_alloc()) LNET: out of memory at /build/buildd/linux-3.13.0/drivers/staging/lustre/include/linux/lnet/lib-lnet.h:399 (tried to alloc '(md)' = 4208)
Sep 10 10:41:37 vis-m2 kernel: [180229.112013] LNetError: 2026:0:(lib-lnet.h:399:lnet_md_alloc()) LNET: 274426476 total bytes allocated by lnet
Has anyone seen this error before, or does anyone know of a fix? A more detailed kernel
log is attached.
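To pull the relevant LNET lines out of a long kernel log, a small script like the one below may help. It is only a sketch: the regular expressions are modeled on the two messages quoted above and assume other Lustre/kernel versions word them the same way, and `summarize_lnet_log` is a hypothetical helper name, not part of any Lustre tool. It counts the "out of memory" reports, records the failed allocation sizes, and reports the largest "total bytes allocated by lnet" value in MiB.

```python
import re

# Patterns modeled on the LNetError lines quoted above (assumption: the
# message wording matches other Lustre versions too).
OOM_RE = re.compile(
    r"LNET: out of memory at (\S+) \(tried to alloc '([^']*)' = (\d+)\)"
)
TOTAL_RE = re.compile(r"LNET: (\d+) total bytes allocated by lnet")


def summarize_lnet_log(lines):
    """Count LNET out-of-memory reports and find the largest LNET allocation total."""
    failed_sizes = []  # size in bytes of each allocation LNET reported as failed
    totals = []        # every "total bytes allocated by lnet" value seen
    for line in lines:
        oom = OOM_RE.search(line)
        if oom:
            failed_sizes.append(int(oom.group(3)))
        total = TOTAL_RE.search(line)
        if total:
            totals.append(int(total.group(1)))
    return {
        "oom_events": len(failed_sizes),
        "failed_sizes": failed_sizes,
        # Convert bytes to MiB; None if no total was logged.
        "max_total_mib": max(totals) / 2**20 if totals else None,
    }


if __name__ == "__main__":
    import sys

    # Usage: python lnet_summary.py /var/log/kern.log  (path is an example)
    with open(sys.argv[1], errors="replace") as log:
        print(summarize_lnet_log(log))
```

Run on the two lines quoted above, it reports one out-of-memory event for a 4208-byte '(md)' allocation, with LNET holding roughly 262 MiB at that point.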
LU-3585 describes a similar problem, but that bug appears to have been resolved.
All machines running Scientific Linux 6 (kernel 2.6.32) work as expected. When the bug is
hit, throughput to the file system drops considerably and the log shows the corresponding
errors. The problem is reproducible.
Let me know if you have any ideas.
Best regards,
Nico