The limitation is actually on the MDS side: it cannot handle more than a single
filesystem-modifying RPC from each client at one time. There is only one slot per client in
the MDT last_rcvd file in which to save the reply state, in case the reply is lost.
In the past this was not a serious limitation, since there were only a few cores on the
client, and enough clients to saturate the MDS. With more cores per client and faster
MDSes, this problem has become more evident.
There have been some discussions about implementing a multi-slot last_rcvd so that clients
can have multiple modifying RPCs in flight at once, but there is no target version for this
feature yet.
Cheers, Andreas
On Jun 16, 2014, at 9:46, "Grégoire Pichon"
<gregoire.pichon@bull.net> wrote:
Hi all,
I have a question related to single client metadata operations.
While running the mdtest benchmark, I have observed that file creation and unlink
operations from a single Lustre client quickly saturate at around 8000 IOPS: the maximum is
reached with as few as 4 tasks in parallel.
When using several Lustre mount points on a single client node, the file creation and
unlink rates do scale with the number of tasks, up to the 16 cores of my client node.
Looking at the code, it appears that most metadata operations are serialized by a mutex in
the MDC layer.
In the mdc_reint() routine, request posting is protected by mdc_get_rpc_lock() and
mdc_put_rpc_lock(), where the lock is:
struct client_obd -> struct mdc_rpc_lock *cl_rpc_lock -> struct mutex rpcl_mutex.
What is the reason for this serialization?
Is it a current limitation of the MDC layer design? If so, are there plans to improve this
behavior?
Thanks,
Grégoire Pichon.
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss