[SPDK] VM oops while testing SPDK hotplug.sh

Chang, Cunyin cunyin.chang at intel.com
Sun Jun 4 20:12:12 PDT 2017


Hi Isaac,

Sorry for the typo: I meant run setup.sh on the VM, not the host. hotplug.sh already runs setup.sh, so you can just ignore step #3.

Thanks,
Cunyin

From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Tuesday, May 30, 2017 2:22 AM
To: spdk at lists.01.org
Subject: Re: [SPDK] VM oops while testing SPDK hotplug.sh


Hello Cunyin, according to the steps you listed below, setup.sh was run on the host. However, setup.sh should be run in the VM. Could you please verify? If you run setup.sh in the VM after the device_add statements (as written in hotplug.sh), you should see the error. Thank you.

Isaac
From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Chang, Cunyin
Sent: Tuesday, May 23, 2017 5:03 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Subject: Re: [SPDK] VM oops while testing SPDK hotplug.sh

Hi Isaac,

I have tested hotplug.sh as you described:
1. Start the VM.
2. Comment out all the VM creation and kill steps in hotplug.sh.
3. Run setup.sh on the host.
4. Run hotplug.sh about 20 times.

I used Fedora release 25 (kernel 4.8.5) as the VM image, and the host system is Fedora release 21 (kernel 4.1.13).
I did not see the issue you mentioned, but I found another one: setup.sh takes a long time (5-10 s).
This can make the test fail, because the hotplug instance exits before all the hotplug events have been processed. So I increased
the test time to 20 s.
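Instead of only raising the hotplug run time, one option is to wait until setup.sh has finished before generating hotplug events. The sketch below polls a readiness check with a timeout. `wait_for` and whatever check it runs are hypothetical names for illustration; neither is part of hotplug.sh or SPDK.

```shell
# Hypothetical helper (not part of hotplug.sh): re-run a readiness check once
# per second until it succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_for TIMEOUT COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout expires first.
wait_for() {
    local timeout=$1 elapsed=0
    shift
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for 20 devices_ready`, where `devices_ready` is a hypothetical check suited to the test setup; the 20-second bound simply mirrors the 20 s test time mentioned above.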

Could you please try the same OS versions I used and see whether the issue is still there?

Thanks,
Cunyin

From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Saturday, May 20, 2017 2:33 AM
To: spdk at lists.01.org
Subject: [SPDK] VM oops while testing SPDK hotplug.sh

Daniel, I have done more testing with hotplug.sh, and here is why one does not see the problem when running hotplug.sh as is. Each time hotplug.sh runs, it creates a new VM to run the test on and destroys the VM when the test completes. It therefore always uses a fresh VM and never gets the chance to do the inserts and removes on a VM that has already run the same test several times (i.e. 4 to 5 times). When I run hotplug.sh with a fresh VM, it passes. On the other hand, when I use the same VM that has already run the test several times, the VM oopses; this is the problem. I also think this is the scenario a customer is most likely to experience.

I wasn't clear in my earlier emails because I was still determining the cause of the problem (at least at a high level). I think you will see the problem if you do the following:


i.    In an xterm window, create the VM separately (without the -daemon flag). For example:

IMAGE=/home/fedora24/fedora24-2.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso
qemu_pidfile=/tmp/qemu_pidfile

qemu-2.7.1/x86_64-softmmu/qemu-system-x86_64 \
                -hda $IMAGE \
                -net user,hostfwd=tcp::10022-:22 \
                -net nic \
                -cpu host \
                -m ${MEM} \
                -pidfile "/tmp/qemu_pidfile" \
                --enable-kvm \
                -chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
                -mon chardev=mon0,mode=readline \
                -cdrom $FEDORA_ISO


ii.   Then comment out the portion of hotplug.sh that creates the VM.

iii.  Also comment out these 4 lines at the bottom of hotplug.sh to avoid killing the VM:

qemupid=`cat "$qemu_pidfile" | awk '{printf $0}'`
kill -9 $qemupid
rm "$qemu_pidfile"
rm "$test_img"

iv.   Run hotplug.sh about 5 times, and you will see the oops on the VM console.
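Step iv can be scripted so that the same long-lived VM is reused across runs and the loop stops at the first failing run. A minimal sketch, assuming hotplug.sh has already been edited as in steps ii and iii; `repeat_until_fail` is a hypothetical helper, not something in the SPDK tree.

```shell
# Hypothetical helper: run COMMAND up to RUNS times in a row, stopping at the
# first failure. Returns 0 if every run succeeded, otherwise the number of the
# run that failed (small counts only, since exit statuses are capped at 255).
# Usage: repeat_until_fail RUNS COMMAND [ARGS...]
repeat_until_fail() {
    local runs=$1 i=1
    shift
    while [ "$i" -le "$runs" ]; do
        echo "run $i/$runs: $*"
        if ! "$@"; then
            echo "run $i failed" >&2
            return "$i"
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, `repeat_until_fail 5 ./test/lib/nvme/hotplug.sh` from the SPDK directory; a non-zero status tells you on which run the failure appeared.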

The host system I am using is CentOS 7.2.

Isaac
From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Verkamp, Daniel
Sent: Wednesday, May 17, 2017 1:09 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug

Hi Isaac,

The version of the hotplug script in the repository (test/lib/nvme/hotplug.sh) is the current version we are running in our automated test pool.

We haven't hit the -net/--netdev issue that you mentioned yet because the version of qemu we are using is older (the current host system running this test is Fedora 25 with qemu 2.7.1).  It looks like we'll need to update the script for that.  We would be happy to accept a patch to hotplug.sh if the --netdev option also works on older qemu.

If the kernel crashes due to user program behavior, it sounds like there is a bug in the kernel uio driver.  We haven't seen this crash in our automated testing, so I am not sure what the cause could be.  It is also worth trying a newer kernel version (we are just using Linux 4.5.5 because the test VM image hasn't been updated in a while).

-- Daniel

From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Monday, May 15, 2017 1:43 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug

Daniel, do you have an updated version of the hotplug.sh script you could share with us? I created the VM using this exact command on my CentOS 7 host:

IMAGE=/home/fedora24/fedora24.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso

/tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
                -hda $IMAGE \
                -net nic,model=virtio \
                -net bridge,br=br1 \
                -netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
                -m ${MEM} \
                -pidfile "/tmp/qemu_pid_fedora.txt" \
                --enable-kvm \
                -cpu host \
                -chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
                -mon chardev=mon0,mode=readline \
                -cdrom $FEDORA_ISO

After the install completed, I set up the guest IP address in /etc/sysconfig/network-scripts/ifcfg-ens3 and brought up the interface with ./ifup ens3.
From the VM, I cloned spdk and built it.
Then I ran the group of tests in hotplug.sh, skipping the VM creation section and the section that copies spdk to the VM.
I mentioned the -netdev flag in my earlier email.

Isaac
From: Isaac Otsiabah
Sent: Monday, May 15, 2017 1:08 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Cc: Isaac Otsiabah <IOtsiabah at us.fujitsu.com>; Paul Von-Stamwitz <PVonStamwitz at us.fujitsu.com>; Edward Yang <eyang at us.fujitsu.com>
Subject: RE: VM crashes while testing SPDK hotplug

Daniel, I installed a Fedora 24 VM and tested it. After running the test two or more times, the VM oopses. Unlike the previous failure on CentOS, this failure does not reboot the VM, but the VM oopses after two or more test runs. My host is a CentOS machine. I found that the qemu-kvm that comes with the OS installation does not support NVMe, so I built qemu-system-x86_64 version 2.9.

[root at host1 spdk]# /tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64  -version
QEMU emulator version 2.9.0
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

One observation (although this is not the cause of the problem, because local port 10022 was not responsive and I therefore ran scripts/setup.sh and the hotplug binary from the VM console at the appropriate breakpoints): hotplug.sh uses the flag "-net user,hostfwd=tcp::10022-:22 \" to forward host port 10022 to guest SSH port 22. However, qemu-system-x86_64 version 2.9 does not have this option; it has a -netdev option instead, but its syntax is different. The qemu-system-x86_64 man page describes the -netdev flag as follows:
-netdev user,id=str[,ipv4[=on|off]][,net=addr[/mask]][,host=addr]
         [,ipv6[=on|off]][,ipv6-net=addr[/int]][,ipv6-host=addr]
         [,restrict=on|off][,hostname=host][,dhcpstart=addr]
         [,dns=addr][,ipv6-dns=addr][,dnssearch=domain][,tftp=dir]
         [,bootfile=f][,hostfwd=rule][,guestfwd=rule][,smb=dir[,smbserver=addr]]
                configure a user mode network backend with ID 'str',
                its DHCP server and optional services

It says hostfwd=rule but does not describe the rule syntax. I used TCP, so I specified it as:
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \

From the host, "netstat -an | egrep -i listen | less" shows that local port 10022 is being listened on. I installed sshpass and tested this -netdev redirection with a simple sshpass command to the VM, but got no response. Therefore, I bypassed running scripts/setup.sh and the hotplug binary through sshpass.
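One plausible explanation for the unresponsive port, offered as an assumption rather than a confirmed diagnosis: a `-netdev user,id=hotplug,hostfwd=...` backend like the one above only reaches the guest when a NIC is attached to it with `-device <model>,netdev=hotplug`. Without that pairing, QEMU can still listen on host port 10022, but no guest NIC receives the forwarded traffic. The `-netdev`/`-device` syntax long predates QEMU 2.7.1, so it should also work on the older QEMU used in the automated pool. The sketch below prints the paired arguments; the NIC model `virtio-net-pci` and the helper name `ssh_net_args` are assumptions.

```shell
# Hypothetical helper: print (one per line) the QEMU arguments that create a
# user-mode network backend forwarding HOST_PORT to guest port 22 AND attach
# it to a virtio NIC, so the forwarded traffic actually reaches the guest.
# Usage: ssh_net_args [HOST_PORT] [NETDEV_ID]
ssh_net_args() {
    local host_port=${1:-10022} id=${2:-net0}
    printf '%s\n' \
        -netdev "user,id=${id},hostfwd=tcp::${host_port}-:22" \
        -device "virtio-net-pci,netdev=${id}"
}
```

In bash, `mapfile -t NET_ARGS < <(ssh_net_args 10022 hotplug)` collects the arguments, and `"${NET_ARGS[@]}"` can then replace the `-net user,...` (or unpaired `-netdev`) flag on the qemu-system-x86_64 command line.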

So I can test without running setup.sh and the hotplug binary through sshpass on port 10022. The main question is: why does the VM oops after I run the test two or more times?

Isaac
From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Verkamp, Daniel
Sent: Tuesday, May 09, 2017 3:33 PM
To: Storage Performance Development Kit <spdk at lists.01.org>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug

Hi Isaac,

Our hotplug tests with a VM (test/lib/nvme/hotplug.sh) are working with a Fedora 24 VM guest running kernel 4.5.5.  I suspect there is a bug in the CentOS kernel version (3.10 is fairly old and is probably missing uio/hotplug-related bug fixes from the mainline kernels).

Can you try to reproduce your problem on a newer kernel version and see if that is the cause of the issue?
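To check inside the VM whether the guest kernel is older than the 4.5.5 kernel used in the automated test VM, a small version comparison is enough. This sketch relies on GNU coreutils `sort -V` (version sort) with `-C` (quiet check); `version_ge` is a hypothetical name.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# "$2" then "$1" is already in ascending version order exactly when $1 >= $2;
# sort -C exits 0 if its input is sorted. Requires GNU sort (-V).
version_ge() {
    printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

For example, `version_ge "$(uname -r)" 4.5.5 || echo "guest kernel is older than 4.5.5"`. A CentOS 7 kernel string such as `3.10.0-327.el7.x86_64` compares as older.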

Thanks,
-- Daniel

From: SPDK [mailto:spdk-bounces at lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Tuesday, May 9, 2017 2:11 PM
To: spdk at lists.01.org
Subject: [SPDK] VM crashes while testing SPDK hotplug

I created a VM on a CentOS 7 host with a monitor socket listening on port 4449 and tested hotplug.

1.   VM creation is as follows

IMAGE=/home/centos7/centos72.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
ISO=/tmp/CentOS-7-x86_64-Everything-1611.iso

[root at host1]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root at host1]# ls -l /tmp/CentOS-7-x86_64-Everything-1611.iso
-r--------. 1 qemu qemu 8280604672 Apr 12 13:37 /tmp/CentOS-7-x86_64-Everything-1611.iso

qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
                -hda $IMAGE \
                -net nic,model=virtio \
                -net bridge,br=br1 \
                -m ${MEM} \
                -pidfile "/tmp/qemu_pid2.txt" \
                --enable-kvm \
                -cpu host \
                -chardev socket,id=mon0,host=localhost,port=4449,ipv4,server,nowait \
                -mon chardev=mon0,mode=readline \
                -cdrom $ISO

2.    Without running SPDK (i.e. examples/nvme/hotplug/hotplug -i 0 -t 15 -n 4 -r 8), the QEMU commands to insert emulated NVMe devices work; I can see the NVMe devices in /dev/:

         echo "drive_add 0 file=/root/test0,format=raw,id=drive0,if=none" | nc localhost 4449
         echo "drive_add 1 file=/root/test1,format=raw,id=drive1,if=none" | nc localhost 4449
         echo "drive_add 2 file=/root/test2,format=raw,id=drive2,if=none" | nc localhost 4449
         echo "drive_add 3 file=/root/test3,format=raw,id=drive3,if=none" | nc localhost 4449

         echo "device_add nvme,drive=drive0,id=nvme0,serial=nvme0" | nc localhost 4449
         echo "device_add nvme,drive=drive1,id=nvme1,serial=nvme1" | nc localhost 4449
         echo "device_add nvme,drive=drive2,id=nvme2,serial=nvme2" | nc localhost 4449
         echo "device_add nvme,drive=drive3,id=nvme3,serial=nvme3" | nc localhost 4449

      Also, the commands to delete the devices work without crashing the VM:

         echo "device_del nvme0" | nc localhost 4449
         echo "device_del nvme1" | nc localhost 4449
         echo "device_del nvme2" | nc localhost 4449
         echo "device_del nvme3" | nc localhost 4449

3.    However, with the SPDK hotplug test application (examples/nvme/hotplug/hotplug -i 0 -t 15 -n 4 -r 8) running, the device_del command causes a fault that crashes the VM, and the VM reboots as a result. The /var/log/messages and vmcore-dmesg.txt files are in the attached tar file. I would appreciate any help understanding why a bug in SPDK crashes the VM. Thanks.
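The monitor commands in step 2 follow a fixed pattern, so they can be generated for any number of devices instead of being typed out. The sketch below prints the same commands (backing files /root/testN, ids driveN and nvmeN). `nvme_monitor_cmds` is a hypothetical helper, and unlike the list in step 2 it interleaves each drive_add with its matching device_add.

```shell
# Hypothetical helper: print QEMU monitor commands for COUNT emulated NVMe
# devices, one command per line, using the same naming as step 2.
# Usage: nvme_monitor_cmds add|del COUNT
nvme_monitor_cmds() {
    local action=$1 count=$2 i=0
    while [ "$i" -lt "$count" ]; do
        case $action in
            add)
                # Back drive i with /root/test$i, then expose it as NVMe device i.
                echo "drive_add $i file=/root/test$i,format=raw,id=drive$i,if=none"
                echo "device_add nvme,drive=drive$i,id=nvme$i,serial=nvme$i"
                ;;
            del)
                echo "device_del nvme$i"
                ;;
            *)
                echo "unknown action: $action" >&2
                return 1
                ;;
        esac
        i=$((i + 1))
    done
}
```

To keep the one-command-per-connection pattern used above, each line can be sent separately: `nvme_monitor_cmds add 4 | while read -r cmd; do echo "$cmd" | nc localhost 4449; done`.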

Isaac