I have been observing what I think is unexpected behavior. I
will try to keep this short and start with the question.
When a striped file is read sequentially multiple times, should we
expect the data from some OSTs to remain in the system cache while
the data from others does not?
File is 80 GB in size.
System has 64 GB of memory.
File is striped 16 ways with a 1 MB stripe size.
Application is iozone.
File is written forwards twice, then read forwards twice, then read
backwards twice.
Application request size is 1 MB.
Run on the swan cluster at Cray, Inc., with lustre-cray_ari_s/2.5_3.0.101_0.31.1_1.0502.8394.10.1-1.0502.17198.8.51
The file is large enough to oversubscribe the system's memory, so I
would expect each OST to see uniform activity.
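To put numbers on that expectation, here is a rough sketch of the
arithmetic. It assumes the plain round-robin (RAID-0) Lustre layout,
where 1 MB stripe k of the file lands on the (k mod 16)th OST of the
layout, and it treats "GB" as GiB. The helper names are hypothetical,
for illustration only.

```python
# Sketch only: assumes the plain round-robin (RAID-0) Lustre layout,
# i.e. stripe k of the file lives at position (k % stripe_count) in the
# file's stripe layout, and treats "GB" as GiB. Helper names are hypothetical.

MiB = 1024 * 1024
GiB = 1024 * MiB


def ost_for_offset(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return the position in the stripe layout that holds byte `offset`."""
    return (offset // stripe_size) % stripe_count


def bytes_per_ost(file_size: int, stripe_size: int, stripe_count: int) -> list[int]:
    """Bytes each OST serves for one full sequential pass over the file."""
    per_ost = [0] * stripe_count
    offset = 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)
        per_ost[ost_for_offset(offset, stripe_size, stripe_count)] += chunk
        offset += chunk
    return per_ost


file_size = 80 * GiB
stripe_size = 1 * MiB
stripe_count = 16
read_passes = 4  # read forwards twice, then backwards twice

one_pass = bytes_per_ost(file_size, stripe_size, stripe_count)
uncached_total = [b * read_passes for b in one_pass]
print(one_pass[0] // GiB)        # prints 5 (GiB per OST per pass)
print(uncached_total[0] // GiB)  # prints 20 (GiB per OST with no caching)
```

Every OST gets the same 5 GiB share per pass, so with no cache hits
each would serve about 20 GiB over the four read passes. If caching
affected all stripes alike, the per-OST totals should drop by similar
amounts, which is why a 10 GB to 17 GB spread looks surprising.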
But that is far from the case. Here is the amount of data read by
each OST during the entire iozone job; it ranges from 10 GB to 17 GB.

When I look at how much data each OST has read over time, some OSTs
show no activity at all during the entire second backwards read.
The OSTs with the lowest amounts of data read also show very high
application data delivery rates during these same periods, which
indicates that their data is being served from the system cache.
Is this to be expected?
Thanks
John