Reporting intermittent test failures
by Harris, James R
Hi all,
I’ve seen a lot of cases recently where -1 votes from the test pool have been removed from a patch due to a failure unrelated to the patch, but then nothing was filed in GitHub for that failure. The filing in GitHub could be a new issue, or a comment on an existing issue.
Please make those GitHub updates a priority. It’s the only way the project can understand the frequency of those intermittent failures and rally to get them fixed. If you’re not sure whether a failure has been seen before, search GitHub issues with the “Intermittent Failure” label, or ask on Slack if anyone else has seen the issue. There is no harm in filing a new issue that may be a duplicate – we can always clean these up later during the next bug scrub meeting. The important thing is that we get the failure tracked.
Thanks,
-Jim
[Release] 20.07: SPDK CSI driver, new accel_fw commands, I/O abort support
by Zawadzki, Tomasz
On behalf of the SPDK community I'm pleased to announce the release of SPDK 20.07!
This release contains the following new features:
- SPDK CSI driver: Added a CSI driver to bring SPDK to Kubernetes storage through NVMe-oF or iSCSI. It supports dynamic volume provisioning and enables Pods to use SPDK storage transparently. This feature is considered experimental. See the https://github.com/spdk/spdk-csi repository for more details.
- Acceleration Framework: Added commands for compare, dualcast, crc32c, along with batching support for all commands in all plug-ins. See https://spdk.io/doc/accel_fw.html for detailed information.
- I/O abort: Added support for aborting I/O commands to NVMe, NVMe-oF and Bdev layers.
- Env PCI drivers: Added env APIs to provide greater flexibility in registering and accessing polled mode PCI drivers.
- RDMA library: Added `rdma` library providing an abstraction layer over different RDMA providers. Two providers available are verbs and mlx5 Direct Verbs.
- spdk_dd: Added an application for copying data to/from files and SPDK bdevs efficiently.
- bdevperf config: Added support for configuration files similar to FIO, to allow benchmarking more complex use cases. See https://spdk.io/doc/bdevperf.html for more details.
- DPDK: Added support for DPDK 20.05.
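Among the new accel_fw commands, crc32c computes the CRC-32C (Castagnoli) checksum used by iSCSI and NVMe metadata. As a reference for what the command produces, here is a minimal pure-Python sketch of the algorithm (illustrative only, not SPDK code):

```python
def _make_table(poly=0x82F63B78):  # reflected form of the Castagnoli polynomial 0x1EDC6F41
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c(data, crc=0):
    """Table-driven, byte-at-a-time CRC-32C (reflected, init/xorout 0xFFFFFFFF)."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

Hardware offloads such as DSA/IOAT compute the same function; the value above is the well-known check value for verifying any CRC-32C implementation.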
The full changelog for this release is available at:
https://github.com/spdk/spdk/releases/tag/v20.07
This release contains 842 commits from 44 authors with over 44k lines of code changed.
We'd especially like to recognize all of our first time contributors:
Dayu Liu
Haichao Li
Jörg Thalheim
Kyle Zhang
Monica Kenguva
Ntsaravana
Peng Yu
Simon A. F. Lund
Sochin Jiang
Sven Breuner
Wenhua Liu
Yibo Cai
Thanks to everyone for your contributions, participation, and effort!
Thanks,
Tomek
How to speed up constructing SPDK NVMe disks?
by Vincent
Hello all,
There are many disks in our system and startup time is important for us, so we measured the speed of constructing an NVMe disk in SPDK. We found it takes about 220 ms to construct one disk, i.e. the total time to construct N NVMe disks is (2.2 + 0.22 * N) seconds: construction time increases linearly with the number of disks.
We then analyzed in detail where the time is spent and found it goes to two ioctls:
(1) ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, dev_addr) ---> 110ms
(2) ioctl(vfio_dev_fd, VFIO_DEVICE_RESET) ---> 110ms
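For reference, the measured scaling can be written as a tiny model (the constants come from the measurements in this post; the helper name is ours):

```python
def construct_time_s(n_disks, base_s=2.2, per_disk_s=0.22):
    """Estimated SPDK startup time: a fixed ~2.2 s base plus ~0.22 s per
    NVMe disk, dominated by the two ~110 ms VFIO ioctls per device."""
    return base_s + per_disk_s * n_disks

print(construct_time_s(10))  # roughly 4.4 s for 10 disks
```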
This is too slow for us. Can anyone give us suggestions for speeding up NVMe disk construction in SPDK?
Any suggestion is appreciated.
Thank you.
SPDK doesn't seem to recognize the drive in a container in unprivileged mode
by Pharthiphan Asokan
Hi Folks,
I am trying to run a k8s pod in unprivileged mode, but SPDK doesn't seem to
recognize the drive, even though the vfio device is present in the pod.
nvme.c: 609:spdk_nvme_probe_internal: *ERROR*: NVMe ctrlr scan failed
bdev_nvme.c:1766:spdk_bdev_nvme_create: *ERROR*: No controller was found
with provided trid (traddr: 0000:88:00.0)
$ kubectl exec -it pod-0 /bin/bash
$ ls -ld /dev/vfio/91
crw------- 1 root root 241, 10 Jul 16 14:44 /dev/vfio/91
$ readlink /sys/bus/pci/devices/0000\:88\:00.0/iommu_group
../../../../../../kernel/iommu_groups/91
Note: with privileged mode there are no issues.
Re: Something cool coming for IOAT....
by Luse, Paul E
I guess I should comment on the latency cost of batching before someone calls me out on it. Clearly there's a penalty, and updates will be made to the tool to measure latency as well....
On 7/18/20, 4:42 PM, "Luse, Paul E" <paul.e.luse(a)intel.com> wrote:
Something cool coming for IOAT....
by Luse, Paul E
So coming this release is a bunch of support for DSA, but that silicon feature won’t be ready for a while. However, it did inspire a big revamp of what we used to call the “copy framework”; now it’s known as the “accelerator framework”. DSA has a pretty cool batching feature that’s been coded into the general accel_fw API and is pretty easy to use, and yes, it’s supported for the IOAT engine and, for compatibility reasons only, the SW engine. The low-level IOAT library already had its own version of batching via the ability to create a bunch of copy/fill descriptors and selectively send them later. Now one API can be used to batch all supported commands (copy, fill, compare, dual-cast, crc32c) for any of the engines. Note that for IOAT you can batch any combination of these through the accel_fw API; however, the only ones that get hardware benefit from batching on IOAT are copy/fill, as that’s all IOAT supports (DSA can batch everything).
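To make the flow concrete, here is a toy sketch of the batch-then-submit pattern described above. The names are hypothetical and this is plain Python, not the SPDK accel_fw API; it just models accumulating descriptors and submitting them in one call, with command types the engine can't accelerate falling back to software on an IOAT-like engine.

```python
class Batch:
    """Toy batch of accel descriptors (illustrative; not the SPDK API)."""

    def __init__(self):
        self.ops = []

    def prep_copy(self, dst, src):
        self.ops.append(("copy", dst, src))

    def prep_fill(self, dst, byte):
        self.ops.append(("fill", dst, byte))


def submit(batch, hw_supported=frozenset({"copy", "fill"})):
    """Execute all descriptors in one call; return how many ran 'in hardware'.

    Models the note above: an IOAT-like engine only accelerates copy/fill,
    so other command types in the same batch would fall back to software."""
    hw = 0
    for op, dst, arg in batch.ops:
        if op == "copy":
            dst[:] = arg                      # copy source buffer into dest
        elif op == "fill":
            dst[:] = bytes([arg]) * len(dst)  # fill dest with a byte pattern
        hw += op in hw_supported
    return hw


src = bytearray(b"abcd")
dst1, dst2 = bytearray(4), bytearray(4)
batch = Batch()
batch.prep_copy(dst1, src)
batch.prep_fill(dst2, 0xFF)
hw_ops = submit(batch)  # one submission covers both descriptors
```

The point of the pattern is amortization: the fixed per-submission cost (doorbell, descriptor ring update) is paid once per batch instead of once per command.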
Anyway, fun news for a Sat… here’s IOAT with 512B copies, 1024 outstanding I/Os and 4 cores using the new accel_perf app (many patches still under review). Below that is the exact same run but with batching. There’s likely a wide variety of benefits; I’m guessing this isn’t either the best or worst case, but I’ll save you the math – this is a 68% improvement (1.68x the transfers per second), not too shabby ☺
[peluse@localhost examples]$ sudo ./accel_perf -q 1024 --wait-for-rpc -w copy -m 0xf -o 512
SPDK Configuration:
Core mask: 0xf
Accel Perf Configuration:
Workload Type: copy
Transfer size: 512 bytes
Queue depth: 1024
Run time: 5 seconds
Batching: Disabled
Verify: No
Starting SPDK v20.07-pre git sha1 527d01212 / DPDK 19.11.2 initialization...
[ DPDK EAL parameters: (null) --no-shconf -c 0xf --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid27832 ]
EAL: No available hugepages reported in hugepages-1048576kB
EAL: VFIO support initialized
[2020-07-18 19:33:12.857089] app.c: 652:spdk_app_start: *NOTICE*: Total cores available: 4
[2020-07-18 19:33:12.996874] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 1
[2020-07-18 19:33:12.997441] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 2
[2020-07-18 19:33:12.997966] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 3
[2020-07-18 19:33:12.998396] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 0
[2020-07-18 19:33:16.094129] accel_engine.c: 510:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
EAL: using IOMMU type 1 (Type 1)
[2020-07-18 19:33:16.809376] accel_engine_ioat.c: 701:accel_engine_ioat_init: *NOTICE*: Accel engine updated to use IOAT engine.
Running for 5 seconds...
Core Transfers Bandwidth Failed Miscompares
-----------------------------------------------------------------
3 1966784/s 960 MiB/s 0 0
2 1966680/s 960 MiB/s 0 0
1 1965504/s 959 MiB/s 0 0
0 1963873/s 958 MiB/s 0 0
==================================================================
Total: 7862841/s 3839 MiB/s 0 0
[peluse@localhost examples]$ sudo ./accel_perf -q 1024 --wait-for-rpc -w copy -m 0xf -o 512 -b 256
SPDK Configuration:
Core mask: 0xf
Accel Perf Configuration:
Workload Type: copy
Transfer size: 512 bytes
Queue depth: 1024
Run time: 5 seconds
Batching: 256 reqs
Verify: No
Starting SPDK v20.07-pre git sha1 527d01212 / DPDK 19.11.2 initialization...
[ DPDK EAL parameters: (null) --no-shconf -c 0xf --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid27818 ]
EAL: No available hugepages reported in hugepages-1048576kB
EAL: VFIO support initialized
[2020-07-18 19:32:55.246468] app.c: 652:spdk_app_start: *NOTICE*: Total cores available: 4
[2020-07-18 19:32:55.386718] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 1
[2020-07-18 19:32:55.387289] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 2
[2020-07-18 19:32:55.387802] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 3
[2020-07-18 19:32:55.388245] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 0
[2020-07-18 19:32:58.219421] accel_engine.c: 510:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
EAL: using IOMMU type 1 (Type 1)
[2020-07-18 19:32:58.937321] accel_engine_ioat.c: 701:accel_engine_ioat_init: *NOTICE*: Accel engine updated to use IOAT engine.
Running for 5 seconds...
Core Transfers Bandwidth Failed Miscompares
-----------------------------------------------------------------
3 3299942/s 1611 MiB/s 0 0
2 3298748/s 1610 MiB/s 0 0
1 3298916/s 1610 MiB/s 0 0
0 3300608/s 1611 MiB/s 0 0
==================================================================
Total: 13198216/s 6444 MiB/s 0 0
ioat: "could not start channel" after abnormal exit
by lijh2015@mail.ustc.edu.cn
Hi SPDK team,
I'm using SPDK (v20.01-pre) to evaluate I/OAT performance. There are some bugs in my benchmark program;
after I used Ctrl+C to quit the program and restarted it, these error messages appeared:
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:00:04.0 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:00:04.1 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:00:04.2 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:00:04.3 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:40:04.0 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
ioat.c: 547:ioat_enum_cb: *ERROR*: ioat_attach() failed
EAL: Requested device 0000:40:04.1 cannot be used
ioat.c: 480:ioat_channel_start: *ERROR*: could not start channel: status = 0x3
error = 0
I checked the code in lib/ioat/ioat.c; it seems is_ioat_idle always returns false.
What should I do to recover from this without rebooting my server? Or, at least,
how can I prevent it?
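Not an SPDK-specific fix, but since the failure follows an abnormal Ctrl+C exit, one generic way to avoid leaving the channel mid-operation is to trap SIGINT and quiesce the device before exiting. A minimal Python sketch of the pattern (the cleanup callback is a placeholder for waiting on outstanding descriptors and detaching):

```python
import signal
import sys

def exit_cleanly_on_sigint(cleanup):
    """Install a SIGINT handler that runs `cleanup` before exiting,
    so Ctrl+C no longer kills the process mid-operation."""
    def handler(signum, frame):
        cleanup()  # e.g. drain outstanding descriptors, detach the device
        sys.exit(0)
    signal.signal(signal.SIGINT, handler)
```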
Thanks,
Jiahao Li