On Fri, May 01, 2015 at 08:18:56PM +0000, Simmons, James A. wrote:
>> >From: Julia Lawall <Julia.Lawall(a)lip6.fr>
>> >
>> >Replace OBD_ALLOC, OBD_ALLOC_WAIT, OBD_ALLOC_PTR, and OBD_ALLOC_PTR_WAIT by
>> >kzalloc/kcalloc, and OBD_FREE and OBD_FREE_PTR by kfree.
>>
>> Nak: James Simmons <jsimmons(a)infradead.org>
>>
>> A simple replace will not work. The OBD_ALLOC and OBD_FREE functions allocate
>> memory anywhere from one page to 4MB in size. You can't use kmalloc for the
>> 4MB allocations. Currently lustre uses a 4 page water mark to determine if we
>> allocate using vmalloc. Even using kmalloc for 4 pages has shown high failure
>> rates on some systems. It gets even more messy with 64K page systems like
>> ppc64 boxes. Now I'm not suggesting porting the larger allocations to vmalloc
>> either, since issues have been found with using vmalloc. For example, when
>> using large stripe count files, the MDS RPC generated crosses the 4 page line
>> and vmalloc is used. Using vmalloc caused a global spinlock to be taken, which
>> causes metadata operations to be serialized on the MDS servers.
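
To make the watermark concrete, here is a minimal userspace sketch of the
size-based switch described above. The names pick_allocator and
WATERMARK_PAGES, and the strict ">" comparison, are assumptions made for
illustration; this is not Lustre's actual macro. The page size is a parameter
so the 4K and 64K cases can be compared.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only; not Lustre's actual code. */
enum alloc_kind { ALLOC_KMALLOC, ALLOC_VMALLOC };

#define WATERMARK_PAGES 4 /* the "4 page water mark" from the text */

/*
 * Decide which allocator a request of `size` bytes would be routed to.
 * Assumption: only requests strictly above the watermark go to vmalloc.
 */
static enum alloc_kind pick_allocator(size_t size, size_t page_size)
{
    size_t watermark = WATERMARK_PAGES * page_size;

    return size > watermark ? ALLOC_VMALLOC : ALLOC_KMALLOC;
}
```

Under these assumptions, a 20K request with 4K pages is above the 16K
watermark and goes to vmalloc, while the same request with 64K pages is below
the 256K watermark and stays on kmalloc.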
>
>It's not the LARGE functions that do the switching? For example OBD_ALLOC
>ends up at __OBD_MALLOC_VERBOSE, which as far as I can see calls kmalloc
>(with __GFP_ZERO, and hence the use of kzalloc).

Yes, the LARGE functions do the switching. I was also expecting patches to
remove the OBD_ALLOC_LARGE functions, which is not the case here. I do have one
question still. The macro __OBD_MALLOC_VERBOSE made it possible to simulate
memory allocation failures at a certain percentage rate. Does something exist
in the kernel to duplicate that functionality?

Yes, no need for lustre to duplicate yet-another-thing the kernel
already provides :)
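
For readers unfamiliar with the behaviour being discussed, percentage-rate
failure simulation can be sketched in userspace as below. The helper name
alloc_maybe_fail, the fail_percent parameter, and the use of rand() are all
assumptions made for illustration. This is neither the __OBD_MALLOC_VERBOSE
macro nor the kernel facility referred to above, which is presumably the
fault-injection framework (failslab).

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate `size` zeroed bytes, but pretend the
 * allocation failed roughly `fail_percent` percent of the time.
 */
static void *alloc_maybe_fail(size_t size, unsigned int fail_percent)
{
    /* rand() % 100 yields 0..99, so 0 never fails and >= 100 always fails. */
    if ((unsigned int)(rand() % 100) < fail_percent)
        return NULL;

    return calloc(1, size);
}
```

Callers have to handle a NULL return exactly as they would a real allocation
failure, which is the point of injecting failures in the first place.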