Patrick

Here are a few images that should clarify what the blue (writes) and red (reads) lines are all about.  When plotting the millions of these transfers for an entire iozone run, each read and write ends up on a single pixel, but you get an aggregate view of the application's I/O pattern.  The slope of an aggregate line indicates the data delivery rate of the file system: when reads are coming out of the system cache the slope is very steep, as opposed to the shallow slope when the data has to come from the OSTs.
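
To make that concrete, here is a minimal C sketch of reading a rate off the plot: it computes the slope of a line segment from its first and last samples.  The sample_t layout and the numbers are hypothetical, not the actual trace format.

#include <stdio.h>

typedef struct {
    double t;      /* wall-clock time of the I/O, in seconds */
    double offset; /* file position of the I/O, in bytes */
} sample_t;

/* Slope of an aggregate line, in bytes/second. */
static double delivery_rate(const sample_t *s, int n)
{
    if (n < 2 || s[n - 1].t == s[0].t)
        return 0.0;
    return (s[n - 1].offset - s[0].offset) / (s[n - 1].t - s[0].t);
}

int main(void)
{
    /* Hypothetical segments: 8 GiB re-read in 2 s from the page cache
     * (steep slope) vs. the same span in 16 s from the OSTs (shallow). */
    sample_t cached[] = { { 0.0, 0.0 }, {  2.0, 8.0 * (1 << 30) } };
    sample_t ost[]    = { { 0.0, 0.0 }, { 16.0, 8.0 * (1 << 30) } };

    printf("cached: %.0f MB/s\n", delivery_rate(cached, 2) / 1e6);
    printf("ost:    %.0f MB/s\n", delivery_rate(ost, 2) / 1e6);
    return 0;
}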

With this info in hand, go back and look at the image in the previous email with the file position activity overlaid on the OSC cached_mb values.  Now you can see how the OSCs are responding to iozone's writes and reads.

John

On 2/5/2015 11:27 AM, Patrick Farrell wrote:
John,

I don't have anything to add at the moment, but I am watching your explorations with interest.  Thanks for sharing this.

One question - The blue and red lines coming up the graph...  What are those?  (Particularly, the one which peaks and then heads back down?)

- Patrick

On 02/05/2015 07:46 AM, John Bauer wrote:
Richard, Patrick

The more I look at this, the more bizarre it gets.  Now when I run this iozone test, I also track cached_mb for each OSC.  This plot has the file position activity overlaid with the value of cached_mb for the 16 OSTs that the file was striped across.  Things are predictable until the size of the file being written exceeds the amount of memory that Lustre can use for caching (during the first write).  After that, the OSCs start competing for buffer memory.  Further comments are on the plot image.
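
For anyone who wants to reproduce the cached_mb sampling, a minimal sketch is below.  It assumes each OSC exposes its cache occupancy as a "used_mb:" line in /proc/fs/lustre/osc/<target>/osc_cached_mb; the exact path and format vary by Lustre version, so treat both as assumptions (the same values are typically reachable through lctl get_param as well).

#include <glob.h>
#include <stdio.h>

int main(void)
{
    glob_t g;
    size_t i;

    /* Assumed location of the per-OSC cache statistics. */
    if (glob("/proc/fs/lustre/osc/*/osc_cached_mb", 0, NULL, &g) != 0)
        return 1;

    for (i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        char line[128];
        long mb;

        if (!f)
            continue;
        while (fgets(line, sizeof(line), f)) {
            /* Assumed record format: "used_mb: <N>". */
            if (sscanf(line, "used_mb: %ld", &mb) == 1)
                printf("%s %ld\n", g.gl_pathv[i], mb);
        }
        fclose(f);
    }
    globfree(&g);
    return 0;
}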

I still have not determined why cached_mb for any OSC never exceeds 8 GB in the test cases where I stripe across 2, 3, 4, 5, 6, or 7 OSTs.  In those cases the sum of cached_mb for the OSTs used never exceeds the 50%-of-system-memory limit.
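
One way to sanity-check that is to compute the 50% threshold directly from the client's memory size and compare it against the summed cached_mb values.  A small sketch, reading MemTotal from /proc/meminfo (standard on Linux); the 50% figure is taken from the observation above, not from the Lustre source:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    unsigned long total_kb = 0;

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "MemTotal: %lu kB", &total_kb) == 1)
            break;
    }
    fclose(f);

    /* Half of system memory: the limit the summed cached_mb never reaches. */
    printf("MemTotal: %lu MB, 50%% limit: %lu MB\n",
           total_kb / 1024, total_kb / 2048);
    return 0;
}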

John


On 2/3/2015 4:47 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
On Feb 3, 2015, at 3:58 PM, Patrick Farrell <paf@cray.com> wrote:

Interesting that Rick's seeing 3/4 on his system.  The limit looks to be < 512MB, if I'm reading correctly.

I glanced at the Lustre source for my 2.5.3 client and found this:

pages = si.totalram - si.totalhigh;            /* low memory, in pages */
if (pages >> (20 - PAGE_CACHE_SHIFT) < 512) {  /* page count expressed in MiB */
        lru_page_max = pages / 2;              /* < 512 MiB: cap at 1/2 of RAM */
} else {
        lru_page_max = (pages / 4) * 3;        /* otherwise: cap at 3/4 of RAM */
}

The way I am reading this is that if the system has < 512 MB of memory, lru_page_max is 1/2 of system RAM.  Otherwise, it will be 3/4 of system RAM.
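
To make the thresholds concrete, here is a small standalone program mirroring that logic.  It assumes 4 KiB pages (PAGE_CACHE_SHIFT = 12), so pages >> (20 - PAGE_CACHE_SHIFT) is just the page count expressed in MiB; at 256 MiB of RAM the cap is 128 MiB, and from 512 MiB upward it jumps to 3/4 of RAM, matching the < 512 MB boundary Patrick noted.

#include <stdio.h>

#define PAGE_CACHE_SHIFT 12  /* assumes 4 KiB pages */

static unsigned long lru_page_max(unsigned long pages)
{
    if (pages >> (20 - PAGE_CACHE_SHIFT) < 512)
        return pages / 2;    /* small system: cap at 1/2 of RAM */
    return (pages / 4) * 3;  /* otherwise: cap at 3/4 of RAM */
}

int main(void)
{
    unsigned long ram_mb[] = { 256, 512, 131072 };  /* 256 MiB .. 128 GiB */
    int i;

    for (i = 0; i < 3; i++) {
        unsigned long pages = ram_mb[i] << (20 - PAGE_CACHE_SHIFT);
        unsigned long max = lru_page_max(pages);

        printf("%6lu MiB RAM -> lru_page_max = %lu pages (%lu MiB)\n",
               ram_mb[i], max, max >> (20 - PAGE_CACHE_SHIFT));
    }
    return 0;
}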

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
