Thanks, Andreas.
I've modified my iozone testing for now to just "-i0", i.e.,
write/rewrite. Here's the command:
iozone -e -M -m -r 1m -s 1g -0 -+n -+A 2 -+u -C -t -P 0 -+d \
    -F $TESTDIR/iozone-file_1GB.a_001.$$ 1>$LOG.10gb.write_00 &
where $TESTDIR points to a directory in my Lustre FS whose striping config
is as follows (the files are created fresh on each test run, so they inherit
the striping config from the directory):
lfs getstripe iozone-file_1GB.a_005.28844
iozone-file_1GB.a_005.28844
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 1
        obdidx     objid     objid   group
             1    617201   0x96af1       0
             0    616584   0x96888       0
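As a sanity check on how those two objects share the file, here's a sketch of the usual round-robin RAID-0 mapping (assuming a plain striped layout; the stripe size/count/offset are taken from the getstripe output above, and the 5 MiB byte offset is just an example I picked):

```shell
# With lmm_stripe_size = 1 MiB and lmm_stripe_count = 2, file chunk i
# lands on OST index (lmm_stripe_offset + i) % lmm_stripe_count.
stripe_size=1048576
stripe_count=2
stripe_offset=1                         # lmm_stripe_offset from getstripe
byte_offset=$((5 * 1048576))            # example: 5 MiB into the file
chunk=$((byte_offset / stripe_size))
ost=$(( (stripe_offset + chunk) % stripe_count ))
echo "byte offset $byte_offset -> chunk $chunk -> obdidx $ost"
```

So consecutive 1 MiB chunks alternate between the two OSTs, which is why both OSS nodes should see roughly half the write traffic.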
The file size isn't quite the 1GB I was looking for either:
-rw-r-----. 1 root root 471859200 Oct 11 01:33 iozone-file_1GB.a_005.28844
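Out of curiosity I converted the on-disk size; it's plain arithmetic, nothing Lustre-specific, and shows the file stopped well short of the -s 1g target:

```shell
# 471859200 bytes is what ls reported; 1073741824 bytes is 1 GiB.
actual_mib=$((471859200 / 1048576))
target_mib=$((1073741824 / 1048576))
echo "wrote $actual_mib of $target_mib MiB"   # 450 of 1024 MiB
```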
The only reason we're running pre-release code is that it's what we
downloaded. We could upgrade, but it didn't seem necessary to chase
releases. Of course, if there are bugs hampering our efforts, we'd
upgrade; it just hasn't seemed necessary yet.
I ran some more experiments after I posted my question. When I saw that
osc_cached_mb's associated busy_cnt was also pegged at some high number, I
also noticed that max_dirty_mb was set to the 32MB default (as is most/all
of our config). Thinking our write rate far exceeded what 32MB could
efficiently buffer for async writes, I bumped osc_cached_mb up to 2GB(!),
after which I no longer saw busy_cnt pegged either. However, the test's
throughput was still in the 50MB/sec range.
Then I noticed these in the single client's /var/log/messages:
kernel: LustreError: 28855:0:(osc_request.c:854:osc_announce_cached())
dirty 0 - dirty_max 2147483648

too big???

Oct 11 01:21:07 hadoop23 kernel: LustreError:
28855:0:(osc_request.c:854:osc_announce_cached()) Skipped 21370 previous
similar messages.
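Converting dirty_max from that message back to MiB shows it's exactly the 2GB I just configured, so the error is complaining about my new setting rather than about some unrelated counter:

```shell
# dirty_max as printed in the osc_announce_cached() message, in bytes:
dirty_max=2147483648
dirty_max_mib=$((dirty_max / 1048576))
echo "dirty_max = ${dirty_max_mib} MiB"   # exactly the 2GB(!) I set
```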
I see that osc_announce_cached logs this message when there are too many
dirty pages. Effectively, the underlying issue is still there: the OSS is
out to lunch, either not committing these writes (so the client is never
told they're stable) or not draining the client's dirty pages at all.
Saw this on the associated OSS:
kernel: LustreError: 20316:0:(ofd_grant.c:607:ofd_grant())
lustrewt-OST0000: client
f4cc2e8f-022a-a927-e9d1-b8540d7ad1a9/ffff880464abf800
requesting > 2GB grant 2147483648.
I don't know why the grant is so big!
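In the meantime I'll probably back the client off below that 2GB cap. This is only a sketch, assuming the dirty_max in the log corresponds to the osc max_dirty_mb tunable; the 256MB value is my own guess, not a documented recommendation, and the lctl command is echoed rather than executed so it can be reviewed first:

```shell
# The OSS refused a grant of 2147483648 bytes, so stay under that cap.
cap_mib=$((2147483648 / 1048576))   # 2048 MiB: the value the OSS rejected
safe_mib=$((cap_mib / 8))           # 256 MiB: an assumed, conservative choice
echo "lctl set_param osc.*.max_dirty_mb=${safe_mib}"
```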
Is there a knob on the OSS that's equivalent to the client's
osc_cached_mb? That is, is there a knob that defines the size of the OSS's
buffer for receiving the client's traffic?
Thanks,
Michael
On Fri, Oct 11, 2013 at 2:00 AM, Dilger, Andreas
<andreas.dilger(a)intel.com> wrote:
On 2013-10-10, at 12:16, "Michael Bloom"
<michael.bloom(a)trd2inc.com> wrote:
> I'm looking for some help to understand why write throughput in my
> Lustre IB cluster is only about 50-80 MB/sec, while read performance is 5-6
> GB/sec.
We've gotten multi-GB/s with 2.4, so 50 MB/s is definitely not expected.
It isn't really possible to make any sensible guesses about your
performance problem without knowing what kind of writes you are doing.
It is also possible that your underlying storage is having problems. Did
you try mounting it locally and running iozone directly?
> I'm running 2.4.1-RC2-PRISTINE on my MDT and 2 OSS's. Also using 2.4.92
> on my 16 clients, 3 of which run iozone write throughput tests.
Is there any reason to be running the pre-release code on the clients?
Testing is great, and the 2.5 client should have some improved performance
over 2.4, but that isn't necessarily production ready yet.
Cheers, Andreas
> I noticed a few threads in the 2.4.1 RC1 and RC2 timeframe discussing
> low write performance. I noticed that curr_dirty_bytes starts off at 0 at
> the start of the test as one would expect. As the test proceeds, one OSS's
> curr_dirty_bytes stays pegged at some huge number, implying it didn't see a
> commit. The other OSS's curr_dirty_bytes varies during the test as iozone
> writes data that gets committed. What can I look at to see why the commit
> isn't happening?
>
> Thanks in advance,
> Michael
> _______________________________________________
> HPDD-discuss mailing list
> HPDD-discuss(a)lists.01.org
> https://lists.01.org/mailman/listinfo/hpdd-discuss