Does the BlobFS Asynchronous API support multi-threaded writing?
by chen.zhenghua@zte.com.cn
Hi everyone,
I ran a simple test of the BlobFS asynchronous API, using the SPDK event framework to run multiple tasks, each writing one file.
It doesn't work: spdk_file_write_async() reported an error when resizing the file.
The call stack looks like this:
spdk_file_write_async() -> __readwrite() -> spdk_file_truncate_async() -> spdk_blob_resize()
The resize operation must be done on the metadata thread that invoked spdk_fs_load(), so only the task dispatched to the metadata CPU core works.
That is to say, only one thread can be used to write files. That makes it hard to use, and performance issues may arise.
Does anyone know more about this?
thanks very much
1 month
Reporting intermittent test failures
by Harris, James R
Hi all,
I’ve seen a lot of cases recently where -1 votes from the test pool have been removed from a patch due to a failure unrelated to the patch, but then nothing was filed in GitHub for that failure. The filing in GitHub could be a new issue, or a comment on an existing issue.
Please make those GitHub updates a priority. It's the only way the project can understand the frequency of these intermittent failures and work to get them fixed. If you're not sure whether a failure has been seen before, search GitHub issues with the "Intermittent Failure" label, or ask on Slack if anyone else has seen it. There is no harm in filing a new issue that may be a duplicate; we can always clean these up later during the next bug scrub meeting. The important thing is that the failure gets tracked.
Thanks,
-Jim
1 year, 6 months
Re: Print backtrace in SPDK
by Wenhua Liu
Hi Ziye,
I'm using SPDK NVMe-oF target.
I used some other way and figured out the following call path:
posix_sock_group_impl_poll
-> _sock_flush <------------------ failed
-> spdk_sock_abort_requests
-> _pdu_write_done
-> nvmf_tcp_qpair_disconnect
-> spdk_nvmf_qpair_disconnect
-> _nvmf_qpair_destroy
-> spdk_nvmf_poll_group_remove
-> nvmf_transport_poll_group_remove
-> nvmf_tcp_poll_group_remove
-> spdk_sock_group_remove_sock
-> posix_sock_group_impl_remove_sock
-> spdk_sock_abort_requests
-> _nvmf_ctrlr_free_from_qpair
-> _nvmf_transport_qpair_fini
-> nvmf_transport_qpair_fini
-> nvmf_tcp_close_qpair
-> spdk_sock_close
_sock_flush calls sendmsg to write the data to the socket, and it's sendmsg that fails with return value -1. I captured the wire data: in Wireshark, I can see the READ command being received by the target as a TCP packet. In response to that packet, a packet with the FIN flag set is sent to the initiator; the FIN closes the socket connection.
I'm running SPDK target inside a VM. My NVMe/TCP initiator runs inside another VM. I'm going to try with another SPDK target which runs on a physical machine.
By the way, I noticed there is a uring-based sock implementation. How do I switch to it? The default seems to be the posix implementation.
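For anyone searching the archive later, a hedged sketch of what switching might look like. The `sock_set_default_impl` RPC name is an assumption (it may not exist in older releases; verify with `scripts/rpc.py --help`), and liburing must be installed to build with `--with-uring`:

```shell
# Build SPDK with the io_uring based sock implementation (requires liburing)
./configure --with-uring
make

# Start the target, then select uring as the default sock implementation
# before creating the TCP transport (RPC name assumed; check rpc.py --help)
./build/bin/nvmf_tgt &
./scripts/rpc.py sock_set_default_impl -i uring
./scripts/rpc.py nvmf_create_transport -t TCP
```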
Thanks,
-Wenhua
On 8/23/20, 9:55 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
Hi Wenhua,
Which applications are you using from SPDK?
1. The SPDK NVMe-oF target on the target side?
2. SPDK NVMe perf, or something else?
nvmf_tcp_close_qpair will be called in the following cases (not all listed) for the TCP transport, with spdk_nvmf_qpair_disconnect as the entry point:
1 qpair is not in polling group
spdk_nvmf_qpair_disconnect
nvmf_transport_qpair_fini
2 spdk_nvmf_qpair_disconnect
....
_nvmf_qpair_destroy
nvmf_transport_qpair_fini
..
nvmf_tcp_close_qpair
3 spdk_nvmf_qpair_disconnect
....
_nvmf_qpair_destroy
_nvmf_ctrlr_free_from_qpair
_nvmf_transport_qpair_fini
..
nvmf_tcp_close_qpair
spdk_nvmf_qpair_disconnect is called by nvmf_tcp_qpair_disconnect in tcp.c. nvmf_tcp_qpair_disconnect is called in the following cases:
(1) _pdu_write_done (if there is a write error);
(2) nvmf_tcp_qpair_handle_timeout (no response from the initiator within 30s after the target sends c2h_term_req);
(3) nvmf_tcp_capsule_cmd_hdr_handle (cannot get a tcp req);
(4) nvmf_tcp_sock_cb (TCP PDU handling issue).
Also, in lib/nvmf/ctrlr.c the target side has a timer poller, nvmf_ctrlr_keep_alive_poll. If no keep-alive command is sent from the host, it will call spdk_nvmf_qpair_disconnect in the polling group associated with the controller.
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Saturday, August 22, 2020 3:15 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Print backtrace in SPDK
Hi,
Does anyone know if there is a function in SPDK that prints the backtrace?
I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.
Thanks,
-Wenhua
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org
1 year, 8 months
spdk_top enhancements
by Szwed, Maciej
Hi,
I’m planning to extend spdk_top functionality in the upcoming weeks. I’ve seen that some of you are already using this tool, so I’d like to hear what new features you would like to see. Feel free to send ideas here, or join the next Europe-friendly community meeting to discuss them 😊
Regards,
Maciek
1 year, 8 months
20.10 release merge window
by Zawadzki, Tomasz
Hello all,
The merge window for the SPDK 20.10 release will close on October 23rd.
Please ensure all patches that should be included in the release are merged by that date.
You can mark them by adding the hashtag '20.10' in Gerrit on those patches.
The list of tagged patches can be seen here:
https://review.spdk.io/gerrit/q/hashtag:%2220.10%22+status:open
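As a sketch, a hashtag can also be added from the command line through Gerrit's SSH API (the port and exact syntax below are the usual Gerrit defaults, not verified for review.spdk.io; the web UI works just as well):

```shell
# Tag change number 1234 for the 20.10 merge window
# (standard Gerrit SSH port assumed; replace <username> and the change number)
ssh -p 29418 <username>@review.spdk.io gerrit set-hashtags -a 20.10 1234
```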
On October 23rd a new branch, 'v20.10.x', will be created, and a patch on it will be tagged as the release candidate.
Then, by October 30th, a formal release will take place, tagging the last patch on the branch as SPDK 20.10.
Between the release candidate and the formal release, only critical fixes shall be backported to the 'v20.10.x' branch.
Thanks,
Tomek
1 year, 8 months
SPDK NVMe/TCP target cannot disconnect on keep-alive timeout if built with "--with-uring"
by Wenhua Liu
Hi,
While running an SPDK NVMe/TCP target built with the "--with-uring" option, if the host is shut down without disconnecting the controllers, the target side gets a keep-alive timeout. In response, the target should terminate the connections and destroy the controller instances, but the log messages below show this is not happening.
When this happens, pressing "Ctrl-C" does not make the target process exit. To terminate it, I have to run "kill -9 <pid>" from another ssh session.
When using the posix socket implementation, this problem does not happen; nvmf_tgt always exits when pressing "Ctrl-C".
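For reference, a sketch of the build configuration in question (assumes the liburing headers are installed):

```shell
# Build the target with the io_uring based sock implementation
./configure --with-uring
make
# Run the freshly built target
./build/bin/nvmf_tgt
```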
Starting SPDK v20.07 git sha1 1a527e501 / DPDK 20.05.0 initialization...
[ DPDK EAL parameters: nvmf --no-shconf -c 0x3f --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid1253923 ]
EAL: No available hugepages reported in hugepages-1048576kB
EAL: VFIO support initialized
EAL: No legacy callbacks, legacy socket not created
[2020-08-30 21:05:09.498321] app.c: 666:spdk_app_start: *NOTICE*: Total cores available: 6
[2020-08-30 21:05:09.692271] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 1
[2020-08-30 21:05:09.692833] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 2
[2020-08-30 21:05:09.693361] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 3
[2020-08-30 21:05:09.694206] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 4
[2020-08-30 21:05:09.694754] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 5
[2020-08-30 21:05:09.695259] reactor.c: 371:reactor_run: *NOTICE*: Reactor started on core 0
[2020-08-30 21:05:09.695340] accel_engine.c: 509:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
[2020-08-30 21:06:27.605442] nvmf_rpc.c:1585:nvmf_rpc_decode_max_qpairs: *WARNING*: Parameter max_qpairs_per_ctrlr is deprecated, use max_io_qpairs_per_ctrlr instead.
[2020-08-30 21:06:27.605570] nvmf_rpc.c:1585:nvmf_rpc_decode_max_qpairs: *WARNING*: Parameter max_qpairs_per_ctrlr is deprecated, use max_io_qpairs_per_ctrlr instead.
[2020-08-30 21:06:27.605595] tcp.c: 469:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
[2020-08-30 21:06:27.653656] tcp.c: 650:nvmf_tcp_listen: *NOTICE*: *** NVMe/TCP Target Listening on 192.168.30.150 port 4420 ***
[2020-08-31 05:24:04.951614] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:24:04.951891] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
[2020-08-31 05:24:34.951857] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:24:34.952134] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
[2020-08-31 05:25:04.952101] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:25:04.952378] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
[2020-08-31 05:25:34.952344] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:25:34.952621] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
[2020-08-31 05:26:04.952588] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:26:04.952864] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
[2020-08-31 05:26:34.952830] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt1 due to keep alive timeout.
[2020-08-31 05:26:34.953107] ctrlr.c: 166:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2020-04.com.vmware.eng:tgt2 due to keep alive timeout.
Thanks,
-Wenhua
1 year, 8 months
Unable to run OCF bdev, bdev not attached
by a175818323@gmail.com
./rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:3b:00.0
./rpc.py bdev_aio_create /dev/sdk1 Aio0 512
./rpc.py bdev_ocf_create Cache0 wb Nvme0n1 Aio0 --cache-line-size 64
Inserting cache Cache0
Cache0: Metadata initialized
Cache0: Successfully added
Cache0: Cache mode : wb
Cache0: Attaching cache device failed
vbdev_ocf.c:1016:start_cache_cmpl: *ERROR*: Error -1000013 during start cache Cache0, starting rollback
Cache0: Cannot flush cache - cache device is detached
Cache0: Cache Cache0 successfully stopped
It looks like the cache/core device is not attached. What else should I do to make it work?
P.S. I have run Open-CAS-Linux successfully on this machine, so the Open CAS kernel modules (cas_cache, etc.) are currently loaded, and the casctl and casadm tools exist as well.
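Not an answer, but a hedged debugging sketch: one possible cause of a failed attach is the cache device still being claimed elsewhere, e.g. by the kernel Open CAS stack mentioned in the P.S. The first command is a standard SPDK RPC; the Open CAS commands assume those tools behave as documented:

```shell
# Check that both bdevs exist and are not already claimed
./rpc.py bdev_get_bdevs

# Stop any kernel-side Open CAS instances that may hold the devices,
# then check whether the modules are still loaded
casctl stop
lsmod | grep cas
```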
1 year, 8 months
Intel SPDK Jenkins CI Shutdown - September 4th to September 7th
by Latecki, Karol
SPDK Jenkins CI system will be shut down
Why?
Planned electrical and network maintenance.
When?
Shutdown is planned for September 4th 2:00 PM GMT to September 7th 8:00AM GMT.
How does that affect us?
CI will be unable to pick up changes and perform tests during this time.
Thanks,
Karol
1 year, 8 months
Print backtrace in SPDK
by Wenhua Liu
Hi,
Does anyone know if there is a function in SPDK that prints the backtrace?
I run into a “Connection Reset by Peer” issue on host side when testing NVMe/TCP. I identified it’s because some queue pairs are closed unexpectedly by calling nvmf_tcp_close_qpair, but I could not figure out how/why this function is called. I thought if the backtrace can be printed when calling this function, it might be helpful to me to find the root cause.
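One way to get that backtrace without modifying SPDK is to attach gdb to the running target and break on the function; a sketch (assumes nvmf_tgt was built with `--enable-debug` so symbols are available):

```shell
# Attach to the running target and stop whenever nvmf_tcp_close_qpair is hit;
# at the (gdb) prompt, type "bt" to print the backtrace, then "continue"
gdb -p "$(pidof nvmf_tgt)" \
    -ex 'break nvmf_tcp_close_qpair' \
    -ex 'continue'
```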
Thanks,
-Wenhua
1 year, 8 months
Re: nvmf_tgt does not start with -L flag though SPDK configured with --enable-debug option
by Wenhua Liu
Hi Xiaodong,
Thank you for answering my question, your guess is correct. Using build/bin/nvmf_tgt, I'm able to start NVMe-oF target with -L flag.
With this, can I delete the app directory?
Regards,
-Wenhua
On 8/19/20, 6:24 PM, "Liu, Xiaodong" <xiaodong.liu(a)intel.com> wrote:
Hi, Wenhua
I guess you were using a stale nvmf_tgt, not the newly built nvmf_tgt.
From v20.07 on, applications are built under build/bin/.
So just try: build/bin/nvmf_tgt -m 0x3f -e 0x20 -L nvme -L nvmf -L nvmf_tcp
--Thanks
From Xiaodong
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Thursday, August 20, 2020 9:04 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] nvmf_tgt does not start with -L flag though SPDK configured with --enable-debug option
Hi,
I just had my SPDK system updated to v20.07.
While testing my NVMe-oF initiator, I wanted to check the target-side log, so I configured SPDK with the --enable-debug option and ran make:
1. make clean
2. ./configure --enable-debug
3. make
I then tried to start NVMe-oF target with -L flag but nvmf_tgt could not start.
~/spdk/spdk$ sudo app/nvmf_tgt/nvmf_tgt -m 0x3f -e 0x20 -L nvme -L nvmf -L nvmf_tcp
2020-08-20 00:50:38.757781 [0x7f55b3d75980] app.c: 943:spdk_app_parse_args: *ERROR*: app/nvmf_tgt/nvmf_tgt must be configured with --enable-debug for -L flag app/nvmf_tgt/nvmf_tgt [options]
options:
-c, --config <config> config file (default none)
--json <config> JSON config file (default none)
--json-ignore-init-errors
don't exit on invalid config entry ….
This used to work with v20.01 (or v20.04).
What could be wrong here?
Thanks,
-Wenhua Liu
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org
1 year, 8 months