Hi,
no, it should not lock; it should be sending the object destroys to the OSTs in the background.
It's not very efficient at the moment, but we plan to improve this using batching.
thanks, Alex
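(A purely illustrative sketch of the batching idea mentioned above, not Lustre code: destroy requests are queued and shipped in groups, so the caller never blocks on an individual request. The class and parameter names here are hypothetical.)

```python
# Illustrative only: accumulate per-object destroy requests and flush
# them to the target in batches, so the enqueueing side never waits on
# individual RPCs.
from collections import deque


class DestroyBatcher:
    def __init__(self, send, batch_size=16):
        self.send = send              # callback that ships one batch out
        self.batch_size = batch_size
        self.pending = deque()

    def queue_destroy(self, object_id):
        """Record a destroy and return immediately (no waiting)."""
        self.pending.append(object_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Ship everything queued so far as a single batched request."""
        if self.pending:
            batch = list(self.pending)
            self.pending.clear()
            self.send(batch)


# Queue seven destroys with a batch size of three: two full batches go
# out automatically, and the final flush ships the remainder.
sent = []
b = DestroyBatcher(send=sent.append, batch_size=3)
for oid in range(7):
    b.queue_destroy(oid)
b.flush()
# sent is now [[0, 1, 2], [3, 4, 5], [6]]
```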
On Aug 9, 2013, at 1:05 PM, Daire Byrne <daire(a)dneg.com> wrote:
Hi,
While watching the metadata performance on a v2.4 MDS (llstat -i1 mdt), I have noticed
that the rates sporadically drop to zero for seconds at a time. I never saw this on a
v1.8.x server with comparable workloads. Looking at the debug logs when this happens,
it appears to be stuck doing this:
00000004:00080000:4.0:1375970781.689383:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff8821062c4000, transno 8617232460
00000004:00080000:4.0:1375970781.689384:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff882d394d4c00, transno 8617232461
00000004:00080000:4.0:1375970781.689384:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff883e03485800, transno 8617232462
00000004:00080000:4.0:1375970781.689385:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff883479fcdc00, transno 8617232463
00000004:00080000:4.0:1375970781.689386:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff883515c1f400, transno 8617232464
00000004:00080000:4.0:1375970781.689387:0:4870:0:(osp_sync.c:317:osp_sync_request_commit_cb())
commit req ffff883d76783000, transno 8617232465
I'm assuming this code was added in v2.x. Is it expected behaviour for the rate of
operations to "lock" until the syncs complete?
/proc/fs/lustre/mds/MDS/mdt/stats @ 1375970781.677198
Name                Cur.Count  Cur.Rate  #Events     Unit    last  min  avg       max      stddev
req_waittime        0          0         803871957   [usec]  0     2    6.63      10477    8.15
req_qdepth          0          0         803871957   [reqs]  0     0    0.00      12       0.07
req_active          0          0         803871957   [reqs]  0     1    2.67      16       1.79
req_timeout         0          0         803871957   [sec]   0     1    15.50     37       14.29
reqbuf_avail        0          0         1709293570  [bufs]  0     47   63.79     64       0.59
ldlm_ibits_enqueue  0          0         499159709   [reqs]  0     1    1.00      1        0.00
mds_getattr         0          0         3152776     [usec]  0     8    346.15    1453985  6393.00
mds_getattr_lock    0          0         158194      [usec]  0     9    83.28     381776   1418.81
mds_connect         0          0         76          [usec]  0     14   3477.13   147911   21249.32
mds_disconnect      0          0         19          [usec]  0     26   1178.21   15598    3757.50
mds_getstatus       0          0         4           [usec]  0     9    13.75     16       3.20
mds_statfs          0          0         2904        [usec]  0     5    19.87     2115     67.53
mds_sync            0          0         3           [usec]  0     116  10933.67  32562    18730.69
mds_getxattr        0          0         50841       [usec]  0     6    12.89     92       3.91
obd_ping            0          0         530449      [usec]  0     3    11.87     3851     8.76
I am investigating why this workload seems to run much slower on v2.4 than on v1.8.9. The
workload is extremely hard-link/unlink heavy, as whole server filesystems are being backed
up using "rsync --link-dest". Perhaps unlink performance is slower because the MDS now
performs the unlinks instead of the clients?
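(For context, a minimal sketch of the hard-link trick behind "rsync --link-dest": files
unchanged since the previous snapshot become hard links into that snapshot rather than
copies, so every backup pass creates, and every expiry pass deletes, very large numbers
of links. The function and directory names below are hypothetical, and the sketch only
handles a flat directory of regular files.)

```python
# Sketch of a --link-dest-style incremental snapshot: unchanged files
# are hard-linked from the previous snapshot; new/changed files copied.
import filecmp
import os
import shutil


def snapshot(src, prev, dest):
    """Build dest from src, hard-linking files identical to those in prev."""
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        p = os.path.join(prev, name)
        d = os.path.join(dest, name)
        if os.path.isfile(p) and filecmp.cmp(s, p, shallow=False):
            os.link(p, d)       # unchanged: hard link, no data copied
        else:
            shutil.copy2(s, d)  # new or changed: real copy
```

Each run of this pattern does one link() per unchanged file and, when old snapshots are
expired, one unlink() per file, which is what makes the workload so metadata-heavy on
the MDS.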
Regards,
Daire
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss