John,
I’m afraid that I am at a loss as well. I am not well enough versed in the Lustre
client caching code to explain what is happening. I did a little digging in
the source code and was not able to identify where the 8GB limit was coming from either.
The only other thing I can think of to look at would be to map each OSC to its
corresponding OSS server to see if there is any correlation between the OSCs with low
cache and a particular OSS server.
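To make that OSC-to-OSS correlation concrete, something like the sketch below could group per-OSC cache usage by server. It parses the kind of key=value output that `lctl get_param` produces; the parameter names (`ost_conn_uuid`, `osc_cached_mb`), device names, NIDs, and values here are all invented for illustration and may differ by Lustre version:

```python
from collections import defaultdict

# Hypothetical output of something like:
#   lctl get_param osc.*.ost_conn_uuid osc.*.osc_cached_mb
# (device names, NIDs, and values are made up for this sketch)
sample = """\
osc.fs-OST0000-osc-ffff8800.ost_conn_uuid=10.0.0.1@o2ib
osc.fs-OST0001-osc-ffff8800.ost_conn_uuid=10.0.0.2@o2ib
osc.fs-OST0002-osc-ffff8800.ost_conn_uuid=10.0.0.1@o2ib
osc.fs-OST0000-osc-ffff8800.osc_cached_mb=8192
osc.fs-OST0001-osc-ffff8800.osc_cached_mb=512
osc.fs-OST0002-osc-ffff8800.osc_cached_mb=8192
"""

conn = {}    # OSC device -> OSS NID it is connected to
cached = {}  # OSC device -> cached MB

for line in sample.splitlines():
    key, _, value = line.partition("=")
    dev = key.split(".")[1]  # e.g. fs-OST0000-osc-ffff8800
    if key.endswith("ost_conn_uuid"):
        conn[dev] = value
    elif key.endswith("osc_cached_mb"):
        cached[dev] = int(value)

# Group the OSCs by the OSS server they talk to and total their cache
per_oss = defaultdict(list)
for dev, nid in conn.items():
    per_oss[nid].append((dev, cached.get(dev, 0)))

for nid, oscs in sorted(per_oss.items()):
    total = sum(mb for _, mb in oscs)
    print(f"{nid}: {total} MB cached across {len(oscs)} OSCs")
```

If the low-cache OSCs cluster under one NID in that grouping, that would point at the OSS side rather than the client.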
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
On Feb 5, 2015, at 8:46 AM, John Bauer <bauerj(a)iodoctors.com>
wrote:
Richard, Patrick
The more I look at this, the more bizarre it gets. Now when I run this iozone test, I
also track cached_mb for each OSC. This plot has the file position activity plot
overlaid with the value of cached_mb for the 16 OSTs that the file was striped across.
Things are predictable until the size of the file being written exceeds the amount of
memory that Lustre can use for caching (during the first write). After that, the OSCs
start competing for buffer memory. Further comments are on the plot image.
<cached.png>
I still have not determined why cached_mb for any OSC never exceeds 8GB in the test cases
where I stripe across 2, 3, 4, 5, 6, or 7 OSTs. In those cases the sum of cached_mb for
the used OSTs never exceeds the 50%-of-system-memory limit.
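The arithmetic behind that observation can be sketched directly. This is only an illustration: the 128GB client memory here is an assumed figure (not stated in the thread), and it assumes the client-wide cache limit is 50% of RAM as described above. Under those assumptions, an 8GB per-OSC ceiling keeps any stripe count up to 7 under the global limit, while 16 stripes would hit it first:

```python
# Assumed client RAM (NOT from the thread; chosen for illustration only)
mem_total_mb = 128 * 1024
# Client-wide cache limit: 50% of system memory, per the thread
client_limit_mb = mem_total_mb // 2
# The unexplained per-OSC plateau observed in the test
per_osc_plateau_mb = 8 * 1024

# With N stripes, the per-OSC ceiling caps aggregate cache at N * 8GB,
# which stays below the 50% limit for small stripe counts
for stripes in (2, 4, 7, 16):
    aggregate = stripes * per_osc_plateau_mb
    hits_global_limit = aggregate >= client_limit_mb
    print(f"{stripes:2d} stripes: {aggregate} MB aggregate, "
          f"global limit reached: {hits_global_limit}")
```

On those assumed numbers, 2 through 7 stripes top out at 16 to 56GB of aggregate cache, never reaching the 64GB global limit, which matches the pattern described and suggests the 8GB plateau is a separate, per-OSC constraint.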
John