I have been observing what I think is unexpected behavior. I will
try to keep this short and start with the question.
Should it be expected, when sequentially reading a striped file multiple
times, that the data from some OSTs remains in the system cache
while the data from others does not?
File is 80GB in size.
System has 64GB of memory.
File is striped 16 way, 1MB stripe size.
Application is iozone.
File is written forwards twice, then read forwards twice, then read
backwards twice.
Application request size is 1MB.
Run on the swan cluster at Cray, Inc.
lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
The file is large enough to oversubscribe the system's memory. I would
expect that each OST would see uniform activity.
But that is far from the case. The amount of data read by each
OST during the entire iozone job ranges from 10G to 17G.
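
For reference, here is a rough sketch of the per-OST read volume I would
expect if nothing were served from cache. It assumes plain round-robin
striping, 1MB-aligned requests, and that "GB" and "MB" mean GiB and MiB.
The names ost_index and bytes_per_ost are made up for this illustration
and are not part of Lustre or iozone.

```python
from collections import Counter

MIB = 1 << 20
GIB = 1 << 30


def ost_index(offset, stripe_size=MIB, stripe_count=16):
    """Stripe index (0..stripe_count-1) holding this byte offset.

    Assumes simple round-robin striping; the index is the position in the
    file's layout, not the actual OST number on the filesystem.
    """
    return (offset // stripe_size) % stripe_count


def bytes_per_ost(file_size, request_size=MIB, stripe_size=MIB, stripe_count=16):
    """Bytes requested from each stripe index during one full pass.

    Assumes aligned requests no larger than one stripe, so each request
    lands on exactly one OST. Read direction does not change the totals.
    """
    totals = Counter()
    for offset in range(0, file_size, request_size):
        length = min(request_size, file_size - offset)
        totals[ost_index(offset, stripe_size, stripe_count)] += length
    return totals


per_pass = bytes_per_ost(80 * GIB)
read_passes = 4  # two forward reads plus two backward reads
for idx in sorted(per_pass):
    gib = per_pass[idx] * read_passes / GIB
    print(f"stripe {idx:2d}: {gib:.1f} GiB over {read_passes} read passes")
```

Under those assumptions every OST is asked for the same 5 GiB per pass,
or 20 GiB over the four read passes, so the spread I see would have to
come from how much each OST's share is satisfied from the client cache.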
When I look at how much data the OSTs have read versus time, some have
no activity during the entire second backwards read.
The OSTs with the lowest amount of data read also show very high
application data delivery rates during these same periods, indicating
that their data is in the system cache.
Is this to be expected?
Thanks
John