[SPDK] The disk hot remove function of SPDK

Andrey Kuzmin andrey.v.kuzmin at gmail.com
Sun Oct 28 04:33:09 PDT 2018


On Sun, Oct 28, 2018 at 7:19 AM Vincent <cockroach1136 at gmail.com> wrote:

> Hello all,
>      Recently we have been trying the disk hot-remove feature of SPDK.
>
> We keep a counter to record the IOs sent out on each I/O channel.
>
> The rough hot-remove procedure in my code is (see the sketch after this
> list):
>
> (1) when we receive the disk hot-remove callback from SPDK, we stop
> sending IO
> (2) because we have a counter of the IOs sent out on each I/O channel,
> we wait for all of those IOs to call back (complete)
> (3) close the I/O channel
> (4) close the bdev desc
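A minimal sketch of that sequence, assuming a per-channel context with an
illustrative io_outstanding counter and hot_removed flag (these names are
mine, not from the original code; see the reply below for why closing
inline from a completion callback is risky):

#include "spdk/bdev.h"
#include "spdk/thread.h"

struct my_channel_ctx {
	struct spdk_io_channel *ch;
	struct spdk_bdev_desc *desc;
	uint64_t io_outstanding;	/* ++ on submit, -- on completion */
	bool hot_removed;		/* set by the hot-remove callback */
};

/* Step (1): remove callback registered via spdk_bdev_open(). */
static void
hot_remove_cb(void *remove_ctx)
{
	struct my_channel_ctx *c = remove_ctx;

	c->hot_removed = true;	/* stop submitting new IO */
}

/* Steps (2)-(4): invoked from each IO completion until the counter
 * drains. Note: calling this directly from a completion callback is
 * exactly the pattern the reply below identifies as unsafe. */
static void
try_close(struct my_channel_ctx *c)
{
	if (c->hot_removed && c->io_outstanding == 0) {
		spdk_put_io_channel(c->ch);	/* step (3) */
		spdk_bdev_close(c->desc);	/* step (4) */
	}
}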
>
> But sometimes it crashes (the crash rate is about 1/10); the call stack
> is attached.
> The crash is in the function nvme_free_request:
> void
> nvme_free_request(struct nvme_request *req)
> {
> 	assert(req != NULL);
> 	assert(req->num_children == 0);
> 	assert(req->qpair != NULL);
>
> 	STAILQ_INSERT_HEAD(&req->qpair->free_req, req, stailq);   <---- this line
> }
>
> Can anyone give me a hint?
>

It looks like nvme_pcie_qpair is not reference counted, so the nvme
completion path below does not account for the possibility that the user
callback fired by nvme_complete_request() closes the I/O channel (which,
for the nvme bdev, destroys the underlying qpair) before the associated
nvme request is freed. When that happens, nvme_free_request() is entered
after the underlying qpair has been destroyed, potentially crashing the
app.
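One way for the application to avoid that window is to defer the teardown
out of the completion callback, e.g. via spdk_thread_send_msg(), so the
channel is only put after the poller has returned from processing
completions. A sketch under that assumption, reusing the illustrative
my_channel_ctx above (this is not a confirmed fix):

#include "spdk/bdev.h"
#include "spdk/thread.h"

/* Runs later on the same SPDK thread, after the completion poller has
 * returned and nvme_free_request() has already run for the last IO. */
static void
deferred_close(void *arg)
{
	struct my_channel_ctx *c = arg;

	spdk_put_io_channel(c->ch);
	spdk_bdev_close(c->desc);
}

/* bdev IO completion callback: never tear the channel down inline. */
static void
io_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	struct my_channel_ctx *c = cb_arg;

	spdk_bdev_free_io(bdev_io);
	if (--c->io_outstanding == 0 && c->hot_removed) {
		spdk_thread_send_msg(spdk_get_thread(), deferred_close, c);
	}
}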

Regards,
Andrey

static void
nvme_pcie_qpair_complete_tracker(struct spdk_nvme_qpair *qpair, struct nvme_tracker *tr,
				 struct spdk_nvme_cpl *cpl, bool print_on_error)
{
	[snip]
	if (retry) {
		req->retries++;
		nvme_pcie_qpair_submit_tracker(qpair, tr);
	} else {
		if (was_active) {
			/* Only check admin requests from different processes. */
			if (nvme_qpair_is_admin_queue(qpair) && req->pid != getpid()) {
				req_from_current_proc = false;
				nvme_pcie_qpair_insert_pending_admin_request(qpair, req, cpl);
			} else {
				nvme_complete_request(req, cpl);
			}
		}

		if (req_from_current_proc == true) {
			nvme_free_request(req);
		}
	[snip]
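For illustration, if the qpair were reference counted as suggested above,
the completion path could pin it for the duration of the loop, so that a
user callback dropping the last reference would only mark it for
destruction. A rough sketch with hypothetical helpers (none of this exists
in SPDK today):

#include "spdk/nvme.h"

/* Hypothetical wrapper; SPDK reactors are single-threaded per core, so a
 * plain int refcount suffices here. */
struct refcounted_qpair {
	struct spdk_nvme_qpair *qpair;
	int refs;
	bool destroy_pending;	/* destruction requested while pinned */
};

static void
qpair_get(struct refcounted_qpair *q)
{
	q->refs++;
}

static void
qpair_put(struct refcounted_qpair *q)
{
	if (--q->refs == 0 && q->destroy_pending) {
		spdk_nvme_ctrlr_free_io_qpair(q->qpair);
	}
}

/* The completion loop would take a reference before processing and drop
 * it afterwards, so nvme_free_request() always sees a live qpair. */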



>
> Any suggestion is appreciated
>
> Thank you in advance
>
>
> --------------------------------------------------------------------------------------------------------------------------
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./smistor_iscsi_tgt -c
> /usr/smistor/config/smistor_iscsi_perf.conf'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
> at nvme.c:227
> 227 nvme.c: No such file or directory.
> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64
> elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
> libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64
> libcap-2.22-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64
> libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
> numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64
> systemd-libs-219-57.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
> at nvme.c:227
> #1  0x0000000000412056 in nvme_pcie_qpair_complete_tracker
> (qpair=qpair@entry=0x7fe4c5376ef8, tr=0x7fe4ca8ad000,
>     cpl=cpl@entry=0x7fe4c8e0a840, print_on_error=print_on_error@entry=true)
> at nvme_pcie.c:1170
> #2  0x0000000000413be0 in nvme_pcie_qpair_process_completions
> (qpair=qpair@entry=0x7fe4c5376ef8, max_completions=64,
>     max_completions@entry=0) at nvme_pcie.c:2013
> #3  0x0000000000415d7b in nvme_transport_qpair_process_completions
> (qpair=qpair@entry=0x7fe4c5376ef8,
>     max_completions=max_completions@entry=0) at nvme_transport.c:201
> #4  0x000000000041449d in spdk_nvme_qpair_process_completions
> (qpair=0x7fe4c5376ef8, max_completions=max_completions@entry=0)
>     at nvme_qpair.c:368
> #5  0x000000000040a289 in bdev_nvme_poll (arg=0x7fe08c0012a0) at
> bdev_nvme.c:208
> #6  0x0000000000499baa in _spdk_reactor_run (arg=0x6081dc0) at
> reactor.c:452
> #7  0x00000000004a4284 in eal_thread_loop ()
> #8  0x00007fe8fb276e25 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007fe8fafa0bad in clone () from /lib64/libc.so.6
> _______________________________________________
> SPDK mailing list
> SPDK at lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>

