[SPDK] The disk hot remove function of SPDK

Vincent cockroach1136 at gmail.com
Mon Oct 29 01:28:47 PDT 2018


Hello Andrey,

Thank you for the information; I will investigate it ASAP.

Andrey Kuzmin <andrey.v.kuzmin at gmail.com> wrote on Sunday, October 28, 2018 at 7:33 PM:

> On Sun, Oct 28, 2018 at 7:19 AM Vincent <cockroach1136 at gmail.com> wrote:
>
> > Hello all,
> > Recently we have been trying out the disk hot-remove feature of SPDK.
> >
> > We keep a counter of the IOs sent out on each I/O channel.
> >
> > Roughly, the hot-remove procedure in my code is (a sketch of the flow
> > follows below):
> >
> > (1) When we receive the disk hot-remove callback from SPDK, we stop sending
> > new IO.
> > (2) Using the per-channel counter of outstanding IOs, we wait for all of
> > them to complete (i.e. for their completion callbacks to fire).
> > (3) Close the I/O channel.
> > (4) Close the bdev descriptor.
> >
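> > To make the flow concrete, here is a minimal sketch of it against the SPDK
> > bdev API (the struct, the io_outstanding counter and the hot_removed flag
> > are placeholder names, not the actual code; hot_remove_cb is the remove_cb
> > registered via spdk_bdev_open()):
> >
> > #include "spdk/stdinc.h"
> > #include "spdk/bdev.h"
> > #include "spdk/thread.h"
> >
> > /* Placeholder per-bdev context. */
> > struct my_bdev_ctx {
> >         struct spdk_bdev_desc   *desc;
> >         struct spdk_io_channel  *ch;
> >         uint64_t                io_outstanding; /* IOs sent but not yet completed */
> >         bool                    hot_removed;
> > };
> >
> > /* remove_cb passed to spdk_bdev_open(). */
> > static void
> > hot_remove_cb(void *remove_ctx)
> > {
> >         struct my_bdev_ctx *ctx = remove_ctx;
> >
> >         /* (1) stop submitting new IO */
> >         ctx->hot_removed = true;
> > }
> >
> > /* Completion callback passed to spdk_bdev_read()/spdk_bdev_write(). */
> > static void
> > io_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
> > {
> >         struct my_bdev_ctx *ctx = cb_arg;
> >
> >         spdk_bdev_free_io(bdev_io);
> >
> >         /* (2) wait until the last outstanding IO has completed */
> >         if (--ctx->io_outstanding == 0 && ctx->hot_removed) {
> >                 spdk_put_io_channel(ctx->ch);   /* (3) close the I/O channel */
> >                 spdk_bdev_close(ctx->desc);     /* (4) close the bdev descriptor */
> >         }
> > }
> >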
> > But sometimes we crash (the crash rate is about 1 in 10); the call stack is
> > attached below. The crash is in nvme_free_request():
> >
> > void
> > nvme_free_request(struct nvme_request *req)
> > {
> >         assert(req != NULL);
> >         assert(req->num_children == 0);
> >         assert(req->qpair != NULL);
> >
> >         STAILQ_INSERT_HEAD(&req->qpair->free_req, req, stailq);  <------------- this line
> > }
> >
> > Can anyone give me a hint?
> >
>
> It looks like nvme_pcie_qpair is not reference counted, so the nvme
> completion path below does not account for the possibility that the user
> callback fired by nvme_complete_request() closes the I/O channel (which,
> for the nvme bdev, destroys the underlying qpair) before the associated
> nvme request is freed. When that happens, nvme_free_request() is entered
> after the underlying qpair has already been destroyed, potentially
> crashing the app.
>
> Regards,
> Andrey
>
> static void
> nvme_pcie_qpair_complete_tracker(struct spdk_nvme_qpair *qpair, struct nvme_tracker *tr,
>                                  struct spdk_nvme_cpl *cpl, bool print_on_error)
> {
> [snip]
>         if (retry) {
>                 req->retries++;
>                 nvme_pcie_qpair_submit_tracker(qpair, tr);
>         } else {
>                 if (was_active) {
>                         /* Only check admin requests from different processes. */
>                         if (nvme_qpair_is_admin_queue(qpair) && req->pid != getpid()) {
>                                 req_from_current_proc = false;
>                                 nvme_pcie_qpair_insert_pending_admin_request(qpair, req, cpl);
>                         } else {
>                                 nvme_complete_request(req, cpl);
>                         }
>                 }
>
>                 if (req_from_current_proc == true) {
>                         nvme_free_request(req);
>                 }
>
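> One way to avoid this race on the application side (just a sketch, not code
> from the SPDK tree; my_ctx and its fields are placeholders) would be to
> defer the channel/descriptor teardown out of the IO completion callback,
> e.g. via spdk_thread_send_msg() to the same thread, so that
> nvme_free_request() runs before the qpair is destroyed:
>
> #include "spdk/stdinc.h"
> #include "spdk/bdev.h"
> #include "spdk/thread.h"
>
> struct my_ctx {                         /* placeholder application context */
>         struct spdk_bdev_desc   *desc;
>         struct spdk_io_channel  *ch;
>         uint64_t                io_outstanding;
>         bool                    hot_removed;
> };
>
> static void
> deferred_close(void *arg)
> {
>         struct my_ctx *ctx = arg;
>
>         /* Runs on the same thread, but only after the completion path has unwound. */
>         spdk_put_io_channel(ctx->ch);   /* this is what destroys the nvme qpair */
>         spdk_bdev_close(ctx->desc);
> }
>
> static void
> io_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
> {
>         struct my_ctx *ctx = cb_arg;
>
>         spdk_bdev_free_io(bdev_io);
>
>         if (--ctx->io_outstanding == 0 && ctx->hot_removed) {
>                 /* Do NOT close the channel here: the completion path still has
>                  * to call nvme_free_request() for this request. */
>                 spdk_thread_send_msg(spdk_get_thread(), deferred_close, ctx);
>         }
> }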
>
>
> >
> > Any suggestion is appreciated
> >
> > Thank you in advance
> >
> >
> >
> --------------------------------------------------------------------------------------------------------------------------
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `./smistor_iscsi_tgt -c
> > /usr/smistor/config/smistor_iscsi_perf.conf'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
> >     at nvme.c:227
> > 227     nvme.c: No such file or directory.
> > Missing separate debuginfos, use: debuginfo-install
> > bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64
> > elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
> > libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64
> > libcap-2.22-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64
> > libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
> > numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64
> > systemd-libs-219-57.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
> > zlib-1.2.7-17.el7.x86_64
> > (gdb) bt
> > #0  0x0000000000414b87 in nvme_free_request (req=req@entry=0x7fe4edbf3100)
> >     at nvme.c:227
> > #1  0x0000000000412056 in nvme_pcie_qpair_complete_tracker
> >     (qpair=qpair@entry=0x7fe4c5376ef8, tr=0x7fe4ca8ad000,
> >     cpl=cpl@entry=0x7fe4c8e0a840, print_on_error=print_on_error@entry=true)
> >     at nvme_pcie.c:1170
> > #2  0x0000000000413be0 in nvme_pcie_qpair_process_completions
> >     (qpair=qpair@entry=0x7fe4c5376ef8, max_completions=64,
> >     max_completions@entry=0) at nvme_pcie.c:2013
> > #3  0x0000000000415d7b in nvme_transport_qpair_process_completions
> >     (qpair=qpair@entry=0x7fe4c5376ef8, max_completions=max_completions@entry=0)
> >     at nvme_transport.c:201
> > #4  0x000000000041449d in spdk_nvme_qpair_process_completions
> >     (qpair=0x7fe4c5376ef8, max_completions=max_completions@entry=0)
> >     at nvme_qpair.c:368
> > #5  0x000000000040a289 in bdev_nvme_poll (arg=0x7fe08c0012a0) at bdev_nvme.c:208
> > #6  0x0000000000499baa in _spdk_reactor_run (arg=0x6081dc0) at reactor.c:452
> > #7  0x00000000004a4284 in eal_thread_loop ()
> > #8  0x00007fe8fb276e25 in start_thread () from /lib64/libpthread.so.0
> > #9  0x00007fe8fafa0bad in clone () from /lib64/libc.so.6
> _______________________________________________
> SPDK mailing list
> SPDK at lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>

