You're cutting off all of the useful values from these stats. All this
tells us is that you sent 239189 RPCs from this client, but you chopped
off the actual stats that report how long the RPCs took (in microseconds:
min, max, sum, sum_squared for stddev). You can use something like:
llstat -i 10 fsname-OST0000-*
to get a nicely formatted output (condensed to 80 columns for this email):
/proc/fs/lustre/osc/myth-OST0000-osc-ffff880079252800/stats @ 1418771188.848400
Name         Count Rate Events Unit    last min  avg    max     stddev
req_waittime 0     0    38104  [usec]  0    511  29747  859246  38441.08
req_active   0     0    38104  [reqs]  0    1    1.15   17      0.64
read_bytes   0     0    10967  [bytes] 0    0    644180 1048576 470678.11
write_bytes  0     0    8831   [bytes] 0    188  314025 1048576 154912.70
ost_read     0     0    10967  [usec]  0    1311 43729  859246  55680.31
ost_write    0     0    8831   [usec]  0    4003 21733  578621  22564.67
ost_connect  0     0    1      [usec]  0    1581 1581   1581    0.00
ost_punch    0     0    70     [usec]  0    981  10499  51744   10787.99
ost_statfs   0     0    8060   [usec]  0    511  2217   110251  2467.72
ost_sync     0     0    8240   [usec]  0    660  51698  265859  20041.93
ldlm_cancel  0     0    638    [usec]  0    688  16629  202345  25984.31
obd_ping     0     0    444    [usec]  0    840  3848   52907   2766.05
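If llstat is not handy, the avg and stddev columns can be derived from the
raw counters yourself. Below is a minimal Python sketch, assuming each raw
stats line has the layout "name count samples [unit] min max sum sumsq"
(that layout, and the helper name summarize, are my assumptions; check them
against your own /proc output). It uses the population variance, so the
result may differ slightly from what llstat prints:

```python
import math
import re

# Assumed raw line layout: name count "samples" [unit] min max sum sumsq
# The min/max/sum/sumsq fields are optional (some counters only carry a count).
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<count>\d+)\s+samples\s+\[(?P<unit>[^\]]+)\]"
    r"(?:\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<sum>\d+)(?:\s+(?P<sumsq>\d+))?)?"
)


def summarize(line):
    """Return a dict of derived stats for one raw line, or None if it
    does not look like a counter line (e.g. snapshot_time)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    count = int(m["count"])
    out = {"name": m["name"], "unit": m["unit"], "count": count}
    if m["sum"] is not None and count > 0:
        avg = int(m["sum"]) / count
        out.update(min=int(m["min"]), max=int(m["max"]), avg=avg)
        if m["sumsq"] is not None:
            # Population variance: E[x^2] - (E[x])^2, clamped at zero
            # to guard against floating-point rounding.
            var = int(m["sumsq"]) / count - avg * avg
            out["stddev"] = math.sqrt(max(var, 0.0))
    return out
```

For example, summarize("demo_op 4 samples [usec] 2 6 16 80") yields an avg
of 4.0 and a stddev of 2.0 (demo_op is a made-up counter name).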
These are aggregate stats since the client/server was first mounted, or
last cleared (by writing into the /proc file or using "lctl set_param
NNNN=clear").
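
Because the counters are cumulative, a way to see recent behaviour without
clearing them is to take two snapshots and subtract. A rough Python sketch
follows; the function name interval_stats and the dict keys are hypothetical,
and it assumes you have already extracted count, sum and sumsq for the same
counter from both snapshots:

```python
import math


def interval_stats(before, after):
    """Derive avg/stddev for the samples recorded between two snapshots.

    before/after: dicts with cumulative "count", "sum" and "sumsq"
    for the same counter. Returns None if no new samples arrived
    (or if the counters were cleared in between).
    """
    n = after["count"] - before["count"]
    if n <= 0:
        return None
    total = after["sum"] - before["sum"]
    sumsq = after["sumsq"] - before["sumsq"]
    avg = total / n
    # Population variance over the interval, clamped at zero.
    var = max(sumsq / n - avg * avg, 0.0)
    return {"count": n, "avg": avg, "stddev": math.sqrt(var)}
```

Note that min and max cannot be recovered for the interval this way, since
they are tracked over the whole lifetime of the counter.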
This will at least give you some more useful info about the stats to start
looking at what is going slowly.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
On 2014/12/16, 3:51 PM, "Kumar, Amit" <ahkumar(a)mail.smu.edu> wrote:
Dear All,
I am trying to figure out what is causing long req_waittime for our OSTs.
This looks really bad. Any tips on digging into this are greatly
appreciated.
I have checked the health of the OSTs and all look okay. I will collect RPC
stats and see if anything shows up there.
Here is sample output from a client:
req_waittime 239189
req_active 239189
ost_connect 1
ost_statfs 254
ldlm_cancel 13
obd_ping 238907
Sample from one OST:
snapshot_time 1418769865.299088
req_waittime 1142788387
req_qdepth 1142788387
req_active 1142788387
req_timeout 1142788387
reqbuf_avail 2389588796
ldlm_glimpse_enqueue 7125998
ldlm_extent_enqueue 86631679
ost_setattr 136096
ost_create 33258
ost_destroy 757369
ost_connect 13145
ost_disconnect 4302
obd_ping 1048086540
Are these cumulative stats? I don't believe so.
Best Regards,
Amit