[SPDK] SPDK performance questions

Wodkowski, PawelX pawelx.wodkowski at intel.com
Wed Jun 7 06:22:11 PDT 2017


In the SPDK vhost case, many settings can end up configured suboptimally.

1.      I see you have a 2-socket system, so the first thing to check is the NUMA assignment. The QEMU memory and CPU pinning should be on the same socket where the vhost reactor/poller is running, and the NVMe device should be on that socket as well.

2.      What are the vhost config, the fio job config, and the QEMU launch command?
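
A minimal sketch of the NUMA check from item 1, assuming a Linux host that exposes the usual sysfs files (/sys/devices/system/node/nodeN/cpulist and /sys/bus/pci/devices/<BDF>/numa_node). The function names are made up for illustration and are not part of SPDK or QEMU.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(sysfs_root="/sys"):
    """Map NUMA node id -> set of CPU ids, as reported by sysfs."""
    nodes = {}
    for path in Path(sysfs_root).glob("devices/system/node/node[0-9]*"):
        nodes[int(path.name[4:])] = parse_cpulist((path / "cpulist").read_text())
    return nodes


def nodes_used(cpus, nodes):
    """Return the set of NUMA nodes that the given CPU ids belong to."""
    wanted = set(cpus)
    return {node for node, members in nodes.items() if members & wanted}


def pci_numa_node(bdf, sysfs_root="/sys"):
    """NUMA node of a PCI device; the kernel reports -1 when it is unknown."""
    path = Path(sysfs_root) / "bus/pci/devices" / bdf / "numa_node"
    return int(path.read_text())
```

To use it, pass the CPUs in the vhost reactor mask together with the CPUs the QEMU vCPU threads are pinned to into nodes_used(..., node_cpus()). A result with more than one node means traffic crosses sockets. pci_numa_node() with the storage controller's PCI address shows which socket the device hangs off.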

If you are interested in throughput, I think there should be no difference between SPDK and kernel vhost at IO block sizes > 32 KB. SPDK is focused on delivering more IOs per second at small block sizes (e.g. 4 KB) while using less CPU than the kernel.
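
To illustrate why block size matters here: for a fixed bandwidth, the number of IOs the vhost thread must process per second scales inversely with block size. The sketch below is plain arithmetic, assuming "GBps" in the figures quoted below means 10^9 bytes per second and that block sizes are powers of two (KiB).

```python
def iops_for_bandwidth(bytes_per_second, block_size_bytes):
    """IO operations per second needed to sustain a given bandwidth."""
    return bytes_per_second / block_size_bytes


# Assumption: 1.88 GBps from the original post, read as 1.88e9 bytes/s.
TARGET_BYTES_PER_SECOND = 1.88e9

for block_kib in (4, 32, 128):
    iops = iops_for_bandwidth(TARGET_BYTES_PER_SECOND, block_kib * 1024)
    print(f"{block_kib:>4} KiB blocks: {iops:,.0f} IO/s")
```

At large block sizes the per-IO software overhead is spread over far fewer operations, so the storage itself is more likely to be the limit than the vhost implementation.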


From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Abhik Sarkar
Sent: Tuesday, June 06, 2017 7:41 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Subject: [SPDK] SPDK performance questions

I have a couple of questions regarding the data I have gathered so far on SCSI-based disks with SPDK using libaio on the host.

1.      Our FIO result on the host, a 2-socket, 16-core box, is around 2.05 GBps. I have been able to push it to 1.88 GBps with SPDK vhost running on a single core handling 7 request queues (1 queue per disk); in other words, vhost sees 7 different unix sockets, and the guest sees one disk per SCSI host. But when I have a single SCSI host with 7 target devices, the performance goes down to 1.38 GBps. Does this performance boost come from the guest being able to populate separate queues in parallel, given that there is only a single vhost thread handling the requests? Is my understanding correct? Will we get the same performance by adding multiple virtio-scsi-pci devices and interfacing each disk with a different virtio device?

2.      Another observation is that if I try to distribute the work between 2 cores, the throughput drops to bw=1408MiB/s (1476MB/s). Here, I see that the vhost app has an additional thread. With an additional thread handling a disjoint set of queues, I would have expected some performance boost, but instead the throughput dropped. Is there lock contention of some sort? On the host, all these devices are on a single SCSI host.

3.      Since we are not using a userspace driver, but rather relying on libaio, would it be more efficient to use kernel vhost?
