Hi Wenhua,
On 5/20/20, 6:14 PM, "Wenhua Liu" <liuw(a)vmware.com> wrote:
Hi,
I see that in the current implementation of the NVMe/TCP target, both the controller MDTS and
the NVMe/TCP MAXH2CDATA are derived from the same value.
spdk_nvmf_ctrlr_identify_ctrlr:
cdata->mdts = spdk_u32log2(transport->opts.max_io_size / 4096);
nvmf_tcp_icreq_handle:
ic_resp->maxh2cdata = ttransport->transport.opts.max_io_size;
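A minimal sketch of how the two fields relate, assuming a 4096-byte minimum page size (as the divisor in the snippet above implies) and a power-of-two max_io_size. The helper names below are hypothetical and are not SPDK APIs; u32log2() is only a stand-in for spdk_u32log2().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for spdk_u32log2(): floor(log2(x)), with 0 mapped to 0. */
static uint32_t u32log2(uint32_t x)
{
    uint32_t r = 0;

    while (x >>= 1) {
        r++;
    }
    return r;
}

/* MDTS is a power-of-two exponent in units of the minimum page size (assumed 4096). */
static uint8_t mdts_from_max_io_size(uint32_t max_io_size)
{
    assert(max_io_size >= 4096);
    return (uint8_t)u32log2(max_io_size / 4096);
}

/*
 * Largest transfer, in bytes, that a host would issue for a given MDTS.
 * MDTS == 0 means "no limit" in the NVMe specification and is not handled here.
 */
static uint64_t max_transfer_from_mdts(uint8_t mdts)
{
    assert(mdts > 0 && mdts < 32);
    return (uint64_t)4096 << mdts;
}
```

With these helpers, a max_io_size of 131072 gives an MDTS of 5, and an MDTS of 5 maps back to 131072 bytes, which is also the value MAXH2CDATA would carry.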
On the host side, the upper layer (which is NVMe/TCP independent) uses MDTS to determine the
maximum I/O size. But with the current implementation, only one H2CData PDU is ever
transferred per I/O, so the R2T mechanism is not fully utilized. Because of this, if max_io_size
is small, the upper layer has to split large I/Os coming from the application into multiple
smaller chunks and send them to the NVMe/TCP initiator driver. If max_io_size is large, it may
take a long time for one PDU to be transferred (I’m not a networking expert and don’t know
what problems that can cause).
[Jim] I don't really think splitting the large I/O is a problem. Both the Linux
kernel and SPDK drivers automatically do the splitting on behalf of the upper layer or
application. The overhead of the splitting is also minimal, since we are only doing one
split per 64KiB (or more).
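As a rough illustration of the splitting cost, here is a minimal sketch that counts the child I/Os produced when a request exceeds the maximum I/O size. The function name is hypothetical and this is not how either driver is actually structured; it only shows the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Number of child I/Os needed when a request is split at max_io_size boundaries. */
static uint32_t num_child_ios(uint64_t io_size, uint32_t max_io_size)
{
    assert(max_io_size > 0);
    /* Round up so that a partial trailing chunk still counts as one child. */
    return (uint32_t)((io_size + max_io_size - 1) / max_io_size);
}
```

For a 512KiB fio block size, this gives 4 children with a 128KiB limit and 8 children with a 64KiB limit.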
[Jim] Regarding the PDU size - splitting the payload into multiple PDUs is possible but
we haven't seen a need to implement that yet with SPDK. One PDU may take a long time
to be transferred, but no longer than if it is broken into multiple PDUs. In either case,
the PDU will be split into multiple network packets based on MTU.
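A back-of-the-envelope sketch of that point, assuming 40 bytes of IPv4 plus TCP header overhead per packet (no options) and ignoring the NVMe/TCP PDU headers themselves. The names are hypothetical, and because TCP is a byte stream the real segment boundaries will not line up with PDU boundaries, so treat the result as an estimate only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-packet overhead: 20-byte IPv4 header + 20-byte TCP header, no options. */
#define ASSUMED_IP_TCP_OVERHEAD 40u

/* Approximate number of TCP segments needed to carry payload_len bytes at a given MTU. */
static uint64_t approx_tcp_segments(uint64_t payload_len, uint32_t mtu)
{
    assert(mtu > ASSUMED_IP_TCP_OVERHEAD);
    uint32_t mss = mtu - ASSUMED_IP_TCP_OVERHEAD;

    return (payload_len + mss - 1) / mss;
}
```

Under these assumptions, one 512KiB payload and four 128KiB payloads need about the same number of packets at a 1500-byte MTU, so breaking the data into more PDUs does not shorten the time on the wire.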
I found this when I ran fio with the block size set to 512KB. I wanted to test how a fix I
made for MAXH2CDATA impacts the I/O behavior, but only saw one H2CData PDU for any I/O.
In my understanding, MDTS and MAXH2CDATA don’t have to be the same; initializing them
from different config parameters would give us more flexibility.
[Jim] It may give more flexibility, but it would also make the code more complex. But if
you can share performance data that shows a clear advantage to making those values
different, it could justify that extra complexity.
Regards,
Jim