Topic from last week's community meeting
by Luse, Paul E
Hi Shuhei,
I was out of town last week and missed the meeting but saw on Trello you had the topic below:
"a few idea: log structured data store , data store with compression, and metadata replication of Blobstore"
Which I'd be pretty interested in working on with you or at least hearing more about it. When you get a chance, no hurry, can you please expand a little on how the conversation went and what you're looking at specifically?
Thanks!
Paul
Add py-spdk client for SPDK
by We We
Hi, all
I have submitted the py-spdk code at https://review.gerrithub.io/#/c/379741/. Please take some time to review it; I would be very grateful.
py-spdk is a client that helps upper-level applications communicate with SPDK-based applications (such as nvmf_tgt, vhost, iscsi_tgt, etc.). Should I submit it to a separate repository that I set up, rather than the SPDK repo? I ask because I think it is a relatively independent kit built on top of SPDK.
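For anyone not familiar with what such a client does under the hood, here is a minimal, hypothetical C sketch of the same idea: sending a single JSON-RPC request to a running SPDK application. It assumes the target's RPC server is listening on a Unix-domain socket at /var/tmp/spdk.sock (the listen address is configurable, and some releases use a TCP port instead), and the get_rpc_methods method name is used purely as an illustration; this is not part of py-spdk itself.

/* Hypothetical sketch: talk to an SPDK app's JSON-RPC server over a Unix socket.
 * Assumes the server listens at /var/tmp/spdk.sock; adjust to your setup. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    const char *req =
        "{\"jsonrpc\":\"2.0\",\"method\":\"get_rpc_methods\",\"id\":1}";
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char resp[4096];
    ssize_t n;
    int fd;

    strncpy(addr.sun_path, "/var/tmp/spdk.sock", sizeof(addr.sun_path) - 1);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect to SPDK RPC socket");
        return 1;
    }

    /* Send the request and print whatever the server answers. */
    if (write(fd, req, strlen(req)) < 0) {
        perror("write");
        close(fd);
        return 1;
    }
    n = read(fd, resp, sizeof(resp) - 1);
    if (n > 0) {
        resp[n] = '\0';
        printf("%s\n", resp);
    }
    close(fd);
    return 0;
}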
If you have any thoughts about py-spdk, please share them with me.
Regards,
Helloway
Re: [SPDK] anyone ran the SPDK ( app/iscsi_tgt/iscsi_tgt ) with VPP?
by Isaac Otsiabah
Hi Tomasz, I got the SPDK patch. My network topology is simple, but making the network IP address accessible to the iscsi_tgt application and to vpp is not working. From my understanding, vpp is started first on the target host, and the iscsi_tgt application is started after the network setup is done (please correct me if this is not the case).
+-------------+  192.168.2.10
|  initiator  |
+-------------+
       |
=======+===================== 192.168.2.0
       |
       |  192.168.2.20
+-------------+
| vpp, vppctl |
| iscsi_tgt   |
+-------------+
Both systems have a 10 GbE NIC.
(On the target server):
I set up the vpp environment variables through the sysctl command.
I unbound the kernel driver and loaded the DPDK uio_pci_generic driver for the first 10 GbE NIC (device address 0000:82:00.0).
That worked, so I started the vpp application; from the startup output, the NIC is in use by vpp:
[root@spdk2 ~]# vpp -c /etc/vpp/startup.conf
vlib_plugin_early_init:356: plugin path /usr/lib/vpp_plugins
load_one_plugin:184: Loaded plugin: acl_plugin.so (Access Control Lists)
load_one_plugin:184: Loaded plugin: dpdk_plugin.so (Data Plane Development Kit (DPDK))
load_one_plugin:184: Loaded plugin: flowprobe_plugin.so (Flow per Packet)
load_one_plugin:184: Loaded plugin: gtpu_plugin.so (GTPv1-U)
load_one_plugin:184: Loaded plugin: ila_plugin.so (Identifier-locator addressing for IPv6)
load_one_plugin:184: Loaded plugin: ioam_plugin.so (Inbound OAM)
load_one_plugin:114: Plugin disabled (default): ixge_plugin.so
load_one_plugin:184: Loaded plugin: kubeproxy_plugin.so (kube-proxy data plane)
load_one_plugin:184: Loaded plugin: l2e_plugin.so (L2 Emulation)
load_one_plugin:184: Loaded plugin: lb_plugin.so (Load Balancer)
load_one_plugin:184: Loaded plugin: libsixrd_plugin.so (IPv6 Rapid Deployment on IPv4 Infrastructure (RFC5969))
load_one_plugin:184: Loaded plugin: memif_plugin.so (Packet Memory Interface (experimetal))
load_one_plugin:184: Loaded plugin: nat_plugin.so (Network Address Translation)
load_one_plugin:184: Loaded plugin: pppoe_plugin.so (PPPoE)
load_one_plugin:184: Loaded plugin: stn_plugin.so (VPP Steals the NIC for Container integration)
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/acl_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/dpdk_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/flowprobe_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/gtpu_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/ioam_export_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/ioam_pot_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/ioam_trace_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/ioam_vxlan_gpe_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/kubeproxy_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/lb_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/memif_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/nat_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/pppoe_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/udp_ping_test_plugin.so
vpp[4168]: load_one_plugin:63: Loaded plugin: /usr/lib/vpp_api_test_plugins/vxlan_gpe_ioam_export_test_plugin.so
vpp[4168]: dpdk_config:1240: EAL init args: -c 1 -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:82:00.0 --master-lcore 0 --socket-mem 64,64
EAL: No free hugepages reported in hugepages-1048576kB
EAL: VFIO support initialized
DPDK physical memory layout:
Segment 0: IOVA:0x2200000, len:2097152, virt:0x7f919c800000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 1: IOVA:0x3e000000, len:16777216, virt:0x7f919b600000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 2: IOVA:0x3fc00000, len:2097152, virt:0x7f919b200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 3: IOVA:0x54c00000, len:46137344, virt:0x7f917ae00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 4: IOVA:0x1f2e400000, len:67108864, virt:0x7f8f9c200000, socket_id:1, hugepage_sz:2097152, nchannel:0, nran
STEP1:
Then, from the vppctl command prompt, I set the IP address for the 10G interface and brought it up. From vpp, I can ping the initiator machine and vice versa, as shown below.
vpp# show int
Name Idx State Counter Count
TenGigabitEthernet82/0/0 1 down
local0 0 down
vpp# set interface ip address TenGigabitEthernet82/0/0 192.168.2.20/24
vpp# set interface state TenGigabitEthernet82/0/0 up
vpp# show int
Name Idx State Counter Count
TenGigabitEthernet82/0/0 1 up
local0 0 down
vpp# show int address
TenGigabitEthernet82/0/0 (up):
192.168.2.20/24
local0 (dn):
/* ping initiator from vpp */
vpp# ping 192.168.2.10
64 bytes from 192.168.2.10: icmp_seq=1 ttl=64 time=.0779 ms
64 bytes from 192.168.2.10: icmp_seq=2 ttl=64 time=.0396 ms
64 bytes from 192.168.2.10: icmp_seq=3 ttl=64 time=.0316 ms
64 bytes from 192.168.2.10: icmp_seq=4 ttl=64 time=.0368 ms
64 bytes from 192.168.2.10: icmp_seq=5 ttl=64 time=.0327 ms
(On Initiator):
/* ping vpp interface from initiator*/
[root@spdk1 ~]# ping -c 2 192.168.2.20
PING 192.168.2.20 (192.168.2.20) 56(84) bytes of data.
64 bytes from 192.168.2.20: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 192.168.2.20: icmp_seq=2 ttl=64 time=0.031 ms
STEP2:
However, when I start the iscsi_tgt server, it does not have access to the 192.168.2.x subnet above, so I ran these commands on the target server to create a veth pair and then connect it to a vpp host-interface as follows:
ip link add name vpp1out type veth peer name vpp1host
ip link set dev vpp1out up
ip link set dev vpp1host up
ip addr add 192.168.2.201/24 dev vpp1host
vpp# create host-interface name vpp1out
vpp# set int state host-vpp1out up
vpp# set int ip address host-vpp1out 192.168.2.202
vpp# show int addr
TenGigabitEthernet82/0/0 (up):
192.168.2.20/24
host-vpp1out (up):
192.168.2.202/24
local0 (dn):
vpp# trace add af-packet-input 10
/* From host, ping vpp */
[root@spdk2 ~]# ping -c 2 192.168.2.202
PING 192.168.2.202 (192.168.2.202) 56(84) bytes of data.
64 bytes from 192.168.2.202: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 192.168.2.202: icmp_seq=2 ttl=64 time=0.067 ms
/* From vpp, ping host */
vpp# ping 192.168.2.201
64 bytes from 192.168.2.201: icmp_seq=1 ttl=64 time=.1931 ms
64 bytes from 192.168.2.201: icmp_seq=2 ttl=64 time=.1581 ms
64 bytes from 192.168.2.201: icmp_seq=3 ttl=64 time=.1235 ms
64 bytes from 192.168.2.201: icmp_seq=4 ttl=64 time=.1032 ms
64 bytes from 192.168.2.201: icmp_seq=5 ttl=64 time=.0688 ms
Statistics: 5 sent, 5 received, 0% packet loss
From the target host, I still cannot ping the initiator (192.168.2.10); the traffic does not go through the vpp interface, so my vpp interface connection is not correct.
Please, how does one create the vpp host interface and connect it so that host applications (i.e. iscsi_tgt) can communicate on the 192.168.2 subnet? In STEP2, should I use a different subnet like 192.168.3.x, turn on IP forwarding, and add a route to the routing table?
Isaac
From: Zawadzki, Tomasz [mailto:tomasz.zawadzki@intel.com]
Sent: Thursday, April 12, 2018 12:27 AM
To: Isaac Otsiabah <IOtsiabah@us.fujitsu.com>
Cc: Harris, James R <james.r.harris@intel.com>; Verkamp, Daniel <daniel.verkamp@intel.com>; Paul Von-Stamwitz <PVonStamwitz@us.fujitsu.com>
Subject: RE: anyone ran the SPDK ( app/iscsi_tgt/iscsi_tgt ) with VPP?
Hello Isaac,
Are you using the following patch? (I suggest cherry-picking it.)
https://review.gerrithub.io/#/c/389566/
The SPDK iSCSI target can be started without a specific interface to bind to, by not specifying any target nodes or portal groups. They can be added later via RPC: http://www.spdk.io/doc/iscsi.html#iscsi_rpc.
Please see https://github.com/spdk/spdk/blob/master/test/iscsi_tgt/lvol/iscsi.conf for an example of a minimal iSCSI config.
Suggested flow of starting up applications is:
1. Unbind interfaces from kernel
2. Start VPP and configure the interface via vppctl
3. Start SPDK
4. Configure the iSCSI target via RPC, at this time it should be possible to use the interface configured in VPP
Please note, there is some leeway here. The only requirement is having VPP app started before SPDK app.
Interfaces in VPP can be created (like tap or veth) and configured at runtime, and are available for use in SPDK as well.
Let me know if you have any questions.
Tomek
From: Isaac Otsiabah [mailto:IOtsiabah@us.fujitsu.com]
Sent: Wednesday, April 11, 2018 8:47 PM
To: Zawadzki, Tomasz <tomasz.zawadzki@intel.com>
Cc: Harris, James R <james.r.harris@intel.com>; Verkamp, Daniel <daniel.verkamp@intel.com>; Paul Von-Stamwitz <PVonStamwitz@us.fujitsu.com>
Subject: anyone ran the SPDK ( app/iscsi_tgt/iscsi_tgt ) with VPP?
Hi Tomasz, Daniel and Jim, I am trying to test VPP, so I built VPP on a CentOS 7.4 (x86_64) system, built SPDK, and tried to run the ./app/iscsi_tgt/iscsi_tgt application.
For VPP, I first unbound the NIC from the kernel and started the VPP application:
./usertools/dpdk-devbind.py -u 0000:07:00.0
vpp unix {cli-listen /run/vpp/cli.sock}
Unbinding the NIC takes down the interface; however, the ./app/iscsi_tgt/iscsi_tgt -m 0x101 application needs an interface to bind to during startup, so it fails to start. The information at:
"Running SPDK with VPP
VPP application has to be started before SPDK iSCSI target, in order to enable usage of network interfaces. After SPDK iSCSI target initialization finishes, interfaces configured within VPP will be available to be configured as portal addresses. Please refer to Configuring iSCSI Target via RPC method<http://www.spdk.io/doc/iscsi.html#iscsi_rpc>."
is not clear, because the instructions at "Configuring iSCSI Target via RPC method" assume the iscsi_tgt server is already running so that one can execute the RPC commands. But how do I get the iscsi_tgt server running without an interface to bind to during its initialization?
Please, can any of you help explain how to run the SPDK iscsi_tgt application with VPP? For instance: what should change in iscsi.conf, how do I get the iscsi_tgt server to start without an interface to bind to after unbinding the NIC, and what address should be assigned to the Portal in iscsi.conf, etc.?
I would appreciate any help. Thank you.
Isaac
Building spdk on CentOS6
by Shahar Salzman
Hi,
I finally got around to looking at building SPDK on CentOS 6, and things look good except for one issue.
SPDK is the latest 18.01.x version, DPDK is 16.07 (plus 3 DPDK patches to allow compilation) with some minor patches of our own (mainly memory configuration), and the kernel is a patched 4.9.6.
The build succeeded except for the usage of the DPDK function pci_vfio_is_enabled.
I had to apply the patch below, removing the usage of this function, and then compilation completed without any issues.
It seems that I am missing some sort of DPDK configuration, as I see that the function is built but not packaged into the generated archive.
I went back to square one and ran the instructions in http://www.spdk.io/doc/getting_started.html, but I see no mention of DPDK there. Actually, ./configure requires it.
My next step is to use a more recent DPDK, but shouldn't this work with my version? Am I missing some DPDK configuration?
BTW, since we are not using vhost, on our 17.07 version we simply set CONFIG_VHOST=n to skip this, but I would be happier if we used a better solution.
Shahar
P.S. Here is the patch to remove use of this function:
diff --git a/lib/env_dpdk/vtophys.c b/lib/env_dpdk/vtophys.c
index 92aa256..f38929f 100644
--- a/lib/env_dpdk/vtophys.c
+++ b/lib/env_dpdk/vtophys.c
@@ -53,8 +53,10 @@
#define SPDK_VFIO_ENABLED 1
#include <linux/vfio.h>
+#if 0
/* Internal DPDK function forward declaration */
int pci_vfio_is_enabled(void);
+#endif
struct spdk_vfio_dma_map {
struct vfio_iommu_type1_dma_map map;
@@ -341,9 +343,11 @@ spdk_vtophys_iommu_init(void)
DIR *dir;
struct dirent *d;
+#if 0
if (!pci_vfio_is_enabled()) {
return;
}
+#endif
dir = opendir("/proc/self/fd");
if (!dir) {
SPDK + user space appliance
by Shahar Salzman
Hi all,
Sorry for the delay, had to solve a quarantine issue in order to get access to the list.
Some clarifications regarding the user space application:
1. The application is not nvmf_tgt; we have an entire appliance into which we are integrating SPDK.
2. We are currently using nvmf_tgt functions in order to activate SPDK, and bdev_user in order to handle I/O.
3. This is all in user space (I am used to the kernel/user distinction to separate protocol from appliance).
4. The bdev_user will also notify SPDK of changes to namespaces (e.g. a new namespace has been added and can be attached to the SPDK subsystem).
I am glad that this is your intention. The question is: do you think it would be useful to create such a bdev_user module, which would allow other users to integrate SPDK into their appliance using such a simple threading model? Perhaps such a module would make SPDK integration easier.
I am attaching a reference application which does NULL I/O via bdev_user.
Regarding the RPC, we have an implementation of it, and will be happy to push it upstream.
I am not sure that using the RPC for this type of bdev_user namespace is the correct approach in the long run, since the user appliance is the one adding/removing namespaces (like hot-plugging a new NVMe device), so it can just call the "add_namespace_to_subsystem" interface directly and does not need an RPC for it.
Thanks,
Shahar
Dynamic base bdev management for multi-tenant virtual bdev
by Andrey Kuzmin
Planning for a multi-tenant virtual bdev driver, I looked into the provided base bdev management capabilities and found them short of what I need. The issues I see are outlined below. Let me know if the analysis is correct and, if so, whether there are any plans to provide dynamic base bdev management capabilities for the multi-tenant vbdev use case.
1. Vbdev startup
At present, spdk_vbdev_register only allows one to register a completely assembled vbdev (with all base bdevs already examined). The root cause behind that fully-assembled requirement is the spdk_vbdev_set_base_bdevs call that follows, which assumes that the vbdev's base bdevs haven't been set up yet.
Apparently, a non-trivial multi-tenant vbdev should be allowed to start up
in a partially assembled state; erasure code-based RAID provides a
ready-made example of a vbdev that is expected to be/remain operational
while an arbitrary number of base bdevs is missing permanently or
temporarily, in particular (but not limited to) at startup time.
Furthermore, a vbdev like this should be able to register a hot-plugged base bdev at any point at runtime, again pointing to the need for a vbdev_register_base_bdev(vbdev, base_bdev) call in addition to, or in place of, the available spdk_vbdev_set_base_bdevs method (more on this under Bdev hot plug below).
2. Bdev surprise removal
SPDK bdev ops vector includes .hotremove method which, for each open base
bdev descriptor, gives vbdev module an opportunity to clean up and/or do
any redundancy-related base bdev management.
While .hotremove provides for vbdev-internal bdev management on hot remove, spdk_bdev_unregister, which completes hot-remove handling in the bdev layer, does not remove the base bdev from the vbdev's base bdev list, so the base bdev in question still sits on the list after being removed. The likely reason is the missing dynamic management of vbdev->base_bdevs in general, and the missing vbdev_remove_base_bdev(vbdev, bdev) call in particular, required to maintain the vbdev->base_bdevs list when a single bdev is removed.
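To make the proposal a bit more concrete, here is a rough sketch of what the two calls suggested above could look like. These prototypes are purely illustrative (neither exists in SPDK today); the names follow the vbdev_register_base_bdev/vbdev_remove_base_bdev idea described in this message, and the use of struct spdk_bdev is an assumption based on the current bdev API.

#include "spdk/bdev.h"

/* Illustrative only: proposed additions for dynamic base bdev management.
 * Neither function exists in SPDK; names and signatures are a sketch. */

/* Attach a (possibly hot-plugged) base bdev to an already registered vbdev,
 * claiming it and linking it into vbdev->base_bdevs. */
int vbdev_register_base_bdev(struct spdk_bdev *vbdev,
                             struct spdk_bdev *base_bdev);

/* Detach a single base bdev (e.g. on hot remove) without tearing down the
 * vbdev itself; unlinks it from vbdev->base_bdevs and releases the claim. */
int vbdev_remove_base_bdev(struct spdk_bdev *vbdev,
                           struct spdk_bdev *base_bdev);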
3. Bdev hot plug
At present, the virtual bdev design does not seem to provide any support for base bdev hot-plug. Vbdev's extant .examine method seems to be geared toward
initial vbdev setup in that it assumes no open vbdev descriptors (so that
vbdev to base bdevs descriptor linkage occurs when vbdev is subsequently
opened and its I/O channels are created).
There is currently no .hotplug mechanism complementary to .hotremove that
would propagate base bdev insertion throughout all open vbdev descriptors,
so that vbdev has a chance to set up the I/O channel/do other house-keeping
for the plugged base bdev on each vbdev descriptor/channel open at the
moment of the base bdev insertion.
4. Vbdev shutdown
It appears that, while bdev subsystem start-up proceeds in the expected
bottom-up fashion, with vbdevs instantiated as the underlying base bdevs
show up, the reverse is not true: on bdev subsystem shutdown, I see vbdev's
.hotremove being called where I would expect vbdev being
closed/unregistered.
Understandably, for a vbdev module author it would be very helpful to be
able to differentiate between planned (sub)system shutdown and hot removal
of a base bdev at run time; for this to happen, bdev subsystem shutdown
should proceed top-down, with virtual bdevs unregistered prior to the
underlying bdevs.
Regards,
Andrey
SPDK pooled volume module in bdev
by Sablok, Kunal
Hi,
The SPDK pooled volume (PVOL) module is a new bdev module that stripes across multiple NVMe devices and exposes the pooled volume to the bdev layer, which enhances performance and capacity. It can theoretically support 255 base devices, a limit that can easily be raised if required (it is currently tested with up to 8 base devices). Multiple strip sizes such as 32KB, 64KB, 128KB, 256KB, and 512KB are supported. New RPC commands such as "create pvol", "destroy pvol", and "get pvols" are introduced to configure pooled volumes dynamically in a running SPDK system.
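As a side note for readers, the address translation that a striping module performs is straightforward; below is a minimal, self-contained C sketch of mapping a logical block to a (base device, block offset) pair, assuming a power-of-two strip size. It is illustrative only and is not the PVOL code.

#include <stdint.h>
#include <stdio.h>

/* Illustrative strip mapping, not the PVOL implementation.
 * strip_blocks must be a power of two (e.g. a 64KB strip with 512B blocks
 * gives strip_blocks = 128). */
struct strip_map {
    uint32_t base_dev;      /* index of the base device */
    uint64_t base_offset;   /* block offset within that base device */
};

static struct strip_map
map_lba(uint64_t lba, uint32_t num_base_devs, uint32_t strip_blocks)
{
    uint64_t strip_index = lba / strip_blocks;   /* which strip overall */
    uint64_t strip_offset = lba % strip_blocks;  /* offset inside the strip */
    struct strip_map m = {
        .base_dev = (uint32_t)(strip_index % num_base_devs),
        .base_offset = (strip_index / num_base_devs) * strip_blocks + strip_offset,
    };
    return m;
}

int main(void)
{
    /* Example: 4 base devices, 64KB strips, 512B blocks. */
    struct strip_map m = map_lba(1000, 4, 128);

    printf("lba 1000 -> base dev %u, offset %llu\n",
           m.base_dev, (unsigned long long)m.base_offset);
    return 0;
}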
Please find attached a PPT with more information, including end-to-end system testing done on this module in the SPDK stack in an SPDK multi-core environment.
Please find below gerrithub review details:
https://review.gerrithub.io/#/c/spdk/spdk/+/410484/
Regards,
Kunal
Re: [SPDK] SPDK Test Pool Sync
by Meneghini, John
Here’s an agenda for today’s meeting:
1. Fedora, et al, OS update schedule
* Update OS kernel and rpm/dnf tool chain on a schedule (e.g. once every major release)
* Only update between releases when specific fixes or patches are available that SPDK or DPDK needs
2. Maintain support for legacy tool chains and platforms
* Currently, there is no way to build or test legacy builds (e.g., SPDK v17.10 can’t be compiled and tested on the CIT testbeds)
* There isn’t even a mechanism in place to re-create the build/test/CIT platform from v17.10
* SPDK should at least have an SCM controlled document or script that specifies the exact kernel, compiler, and toolchain versions needed to recreate each release
3. Hardware support matrix
* Each release should have a detailed list of supported hardware
* At a minimum, SPDK should maintain a list of recommended hardware.
For example, the attached BOM describes the hardware testbed NetApp is using:
Linux Host
Fujitsu RX2560 M2<http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx256...>. As configured, the RX2560 M2 (2 processor slot) system contains:
* 2x Intel Xeon E5-2650v4 12C/24T 2.20 GHz (24 cores)
* 64GB memory (4x 16GB DDR4-2400 R ECC DIMMs)
* 1x SATA 6G 500GB 7.2K hot-plug 2.5" HDD (boot disk)
* 2x 2.5" HDD cages for up to 16x 2.5" SAS 3.0 HDDs or SSDs
  o with a 24x SAS 3.0 expander, SAS 3.0 backplanes, and SAS 3.0 cable
* 1x 2.5" PCIe-SSD SFF cage for 4x 2.5" PCIe-SSD SFFs
  o with 4x x4 PCIe switch & cables
* 6x PCIe expansion slots
  o 3 PCIe slots on board (full height, 167mm length) on the first CPU:
    - 2x PCIe Gen3 x8
    - 1x PCIe Gen3 x16
    - Riser card with 1x x8 and 2x x4 slots (full height, 252mm length) possible
  o 3 PCIe slots (full height, 167mm length) on the second CPU:
    - 1x PCIe Gen3 x8
    - 2x PCIe Gen3 x16
* File:RX2560 Invoice.pdf - $6,452.52
RDMA HCAs
We have 2 ConnectX-3 Pro VPI<http://www.mellanox.com/page/products_dyn?product_family=161&mtag=connect...> adapters on loan from the IC interconnect team. These cards support RoCE; however, they are not currently supported by some internal applications, so it was decided to use the slightly older Mellanox ConnectX-3 VPI<http://www.mellanox.com/page/products_dyn?product_family=119&mtag=connect...>. Fujitsu resells this card as the Fujitsu InfiniBand HCA 40 Gb 1/2 port enhanced QDR adapter, but has not qualified this card on the RX2560 M2. The only RDMA card currently supported by Fujitsu on the RX2560 M2 is the Fujitsu InfiniBand HCA 56 Gb 1/2 port FDR adapter.
* The Fujitsu InfiniBand HCA 40 Gb 1/2 port QDR enhanced<http://www.fujitsu.com/fts/products/computing/servers/primergy/components...> adapter
* Fujitsu S26361-F4475-L103 IB HCA 40Gb 1 port QDR enhanced 1 $522.90
* Compare with the CONNECTX-3 VPI ADPT CARD QSFP QDR 1 Port MCX353A-QCBT - $556.30
* This is a Mellanox ConnectX-3 VPI QDR<http://www.mellanox.com/page/products_dyn?product_family=119&mtag=connect...> adapter
* The Fujitsu InfiniBand HCA 56 Gb 1/2 port FDR<http://www.fujitsu.com/fts/products/computing/servers/primergy/blades/con...> adapter
* Fujitsu S26361-F4533-L102 IB HCA 56Gb 1 port FDR 1 $652.20
* Compare with CONNECT-3 VPI ADPT CARD QSFP FDR 1 Port MCX353A-FCBT adapter -$668.03
* This is a Mellanox ConnectX-3 VPI IB<http://www.mellanox.com/page/products_dyn?product_family=119&mtag=connect...> adapter
Adapter choices:
* CONNECTX-3 PRO VPI ADPT CARD QSFP 1 Port - Part # MCX353A-FCCT - $746.00 (SELECTED)
* CONNECTX-3 PRO VPI ADPT PCIE3 2 Port - Part # MCX354A-FCCT - $1,163.81 (on loan)
* See the Mellanox ConnectX-3 Pro VPI Users Guide<http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-3_Pro...>.
* CONNECTX-3 VPI ADPT CARD QSFP QDR 1 Port - Part # MCX353A-QCBT - $556.30
* CONNECT-3 VPI ADPT CARD QSFP QDR 2 Ports - Part # MCX354A-QCBT - $871.70
* See the ConnectX-3 VPI QSFP Users Guide<http://www.mellanox.com/related-docs/user_manuals/ConnectX-3_VPI_Single_a...>
See the Mellanox HCA overview<http://www.mellanox.com/page/infiniband_cards_overview>
PCIe NVMe SSD
The Intel PCIe NVMe DC P3700 Series SSD<http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-dri...> has been recommended by the NVMexpress.org Linux engineers. This is a state of the art PCIe NVMe SSD with 20us response time.
* Fujitsu resells this product as the Fujitsu PCIe-SSD P3700 Series<http://www.fujitsu.com/fts/products/computing/servers/primergy/components...>
* Fujitsu S26361-F5534-L800 SSD PCIe3 800GB Main 2.5 H-P EP 1 $3,250.50/$2,659.50
This is the same product as the
* Intel DCP3700 800GB SFF<http://www.intel.com/buy/us/en/product/components/intel-dcp3700-800gb-800...> From $1,204.29, or the
* Intel DCP3700 800GB AIC<http://www.intel.com/buy/us/en/product/components/intel-dcp3700-800gb-800...> from $1,413.63
Examples:
* Intel Solid-State Drive DC P3700 Series - 800GB - 2.5 Form Factor - Part ID: SSDPE2MD800G401<http://www.colfaxdirect.com/store/pc/viewPrd.asp?idproduct=2190&idcategor...> for $1,695.00
* 800GB SSD DC P3700 PCIE MLC SFF - Part # SSDPE2MD800G401 - $1407.74 (SELECTED)
* 800GB SSD DC P3700 PCIE MLC AIC - Part # SSDPEDMD800G401 - 1,402.82
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of "Luse, Paul E" <paul.e.luse(a)intel.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Tuesday, May 1, 2018 at 11:39 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] SPDK Test Pool Sync
Hi All,
This note is mainly for Seth, Karol & John but all are welcome to read/participate in the discussion of course!
We just talked about this in the community meeting. There are actually 3 SPDK test pools up and running right now as follows:
* Intel, Chandler AZ: This is the “official” one and scheduling is done via a series of home grown scripts. This is the only pool that “counts” in terms of voting on patches right now. Seth does most of the work on keeping this running along with help from the maintainers here in CH as well (Daniel, Ben, Jim).
* Intel, Gdansk Poland: We are in the process of transitioning away from the Chandler test pool over to this one (a few more months). The main difference is that this pool is scheduled with Jenkins; however, it is also a whole different set of hardware, all located in Poland. The system is currently up and running and you can see it on your recent patches as "Jenkins"; however, its voting still does not count. As soon as we believe it's stable, we will phase out the Chandler system.
* Netapp, one of the coasts I believe ☺: John gave us an update on this today. It is being used internally by Netapp and is not fully functional yet, but they've made great progress so far. It is based on the Chandler scheduling scripts.
John brought up the possibility of having a little more rigor behind the OS and/or other environmental updates of the test systems in the pool. Specifically, we've seen OS updates have an impact on stability and even appear to make bugs that don't seem to be related go away, or at least get masked. So, I'd like to get a one-off community call going, probably next week sometime, to brainstorm some more on this, and then we can spend some time at the Summit to put some more concrete next steps down.
Seth, Karol and John I’ll send a separate email just to coordinate a time for next week and then reply back to the list here once we have it nailed down for anyone else who wants to join.
Thanks!!!
Paul
Re: [SPDK] Re: SPDK Dynamic Threading Model
by Meneghini, John
Hi Frank.
Thanks for your suggestion.
In our implementation/application, we don't use DPDK. This is why the first set of changes we proposed last year was to abstract out the dependencies on DPDK. I think I still have a copy of the old pull request around for reference.
https://github.com/spdk/spdk/pull/152
We are actually running SPDK in a completely different execution environment, and we need a “native” SPDK dynamic threading model that can be supported on any platform, without DPDK.
A second RFC patch has been pushed up to GerritHub for review. Please see the commit messages of these two patches for a complete description of the proposed change.
https://review.gerrithub.io/#/c/spdk/spdk/+/412277/
https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
/John
40.5. The L-thread subsystem
The L-thread subsystem resides in the examples/performance-thread/common directory and is built and linked automatically when building the l3fwd-thread example.
The subsystem provides a simple cooperative scheduler to enable arbitrary functions to run as cooperative threads within a single EAL thread. The subsystem provides a pthread like API that is intended to assist in reuse of legacy code written for POSIX pthreads.
The following sections provide some detail on the features, constraints, performance and porting considerations when using L-threads.
From: SPDK <spdk-bounces@lists.01.org> on behalf of Huang Frank <kinzent@hotmail.com>
Reply-To: Storage Performance Development Kit <spdk@lists.01.org>
Date: Wednesday, May 23, 2018 at 9:46 PM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: [SPDK] Re: SPDK Dynamic Threading Model
Hi,
Why not consider using the L-thread (lthread) facility provided by DPDK?
http://dpdk.org/doc/guides-16.04/sample_app_ug/performance_thread.html#lt...
Frank Huang
________________________________
From: SPDK <spdk-bounces@lists.01.org> on behalf of Meneghini, John <John.Meneghini@netapp.com>
Sent: May 23, 2018 4:12
To: Storage Performance Development Kit
Subject: [SPDK] RFC: SPDK Dynamic Threading Model
As discussed during the Summit last week, we believe SPDK needs support for a dynamic threading model. An RFC patch has been pushed upstream for review.
https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
This patch is a beginning point for our proposed changes. Improvements will be made with subsequent patches.
The description below is taken from https://github.com/spdk/spdk/issues/308
SPDK needs to support a dynamic threading model where reactors are NOT bound to lcores.
Many applications need SPDK to support a threading model that:
1. Does not assume a static number of threads
2. Does not bind threads to cores (this burns up cores)
3. Does not assume all threads use the same polling model
Removing these assumptions from the SPDK libraries will allow:
* Different applications to share the SPDK libraries on the same platform
* E.g. FC-NVMe, RDMA-NVMe, and NVMe
* Different platforms to support the same applications with the same libraries
* E.g. a 4-core platform and a 128-core platform, a PowerPC, and NFS traffic
* Different workloads at different scales
* E.g. 1 NVMF Host with 1 Subsystem and 1 Namespace, or 16 NVMF Hosts with 100 Subsystems and 1,000 namespaces.
* In particular, in SPDK, NVMF threads need to come and go depending upon the “NVMF load”.
More Dynamic Use Cases Coming
With the advent of FC-NVMe (which uses NPIV to virtualize FC ports), NVMF Subsystem Ports and Host Ports are not static. Different Hosts and Subsystems can have a different number of Ports, and Ports can be dynamically added and removed from the configuration. This means:
* The same platform may end up having different number of Subsystem ports at various points in its lifecycle
* The SPDK FC-NVMe application does NOT know up front how many ports it will have.
Expected Behavior
1. SPDK libraries should not assume a static number of threads
2. SPDK libraries should bind threads to cores only optionally - supporting both static and dynamic threading models
3. SPDK libraries should support a Hybrid polling model (modified run to completion)
Current Behavior
1. SPDK libraries assume a static number of threads
2. SPDK libraries bind threads to cores
3. SPDK libraries assume all threads use the same polling model
Possible Solution
Proposal to solve above Use Cases:
Use the spdk_nvmf_poll_group (PG) as the unit of threading abstraction
* Use PG as the fundamental unit on which a thread operates
* The spdk_thread will be a “virtual” thread that gets tied into a PG (1-1 relationship)
* Create PGs as and when hardware ports (and associated queue-pairs) come to life.
* No dependency between a PG and a “real” thread.
* A PG can be picked up by any “real” thread and worked upon. The PG contains everything needed for IO handling.
* PG continues to contain spdk_thread. spdk_thread continues same mechanisms for IO channels to different NS etc. etc.
* PG contains vendor data. Eg. A “ring” for depositing asynchronous callback events from the backend OR management events that come from external modules.
* spdk_thread contains thread_context that points to a PG instead of a reactor. So messages from the library get routed to the PG “ring” instead of a thread/reactor event ring.
Understanding the intent of the event library, it is believed this is the place for customization. However, the current event library assumes a threading model that is part of the util library. Moreover, many of the other SPDK core libraries assume the same threading model as the util library. If the SPDK util library can be modified to support these dynamic threading use cases, all applications would be able to use the SPDK framework more effectively.
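To illustrate the kind of model being proposed, here is a small, self-contained C sketch (pthreads, no SPDK types): poll groups sit on a shared queue, and any available worker thread can pick one up, poll it, and put it back, so the number of threads is decoupled from both cores and poll groups. This is only a sketch of the idea, not proposed SPDK code.

/* Hypothetical sketch of "any real thread can pick up a poll group".
 * None of these types or functions come from SPDK. */
#include <pthread.h>
#include <stdbool.h>

#define MAX_PGS 64

struct poll_group {
    int id;
    /* In a real system: qpairs, I/O channels, a message/event ring, etc. */
};

static struct poll_group *g_queue[MAX_PGS];
static int g_head, g_tail;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile bool g_running = true;

/* Take the next poll group off the shared queue, or NULL if none is queued. */
static struct poll_group *pg_pop(void)
{
    struct poll_group *pg = NULL;

    pthread_mutex_lock(&g_lock);
    if (g_head != g_tail) {
        pg = g_queue[g_head];
        g_head = (g_head + 1) % MAX_PGS;
    }
    pthread_mutex_unlock(&g_lock);
    return pg;
}

/* Return a poll group to the queue so any other thread may pick it up. */
static void pg_push(struct poll_group *pg)
{
    pthread_mutex_lock(&g_lock);
    g_queue[g_tail] = pg;
    g_tail = (g_tail + 1) % MAX_PGS;
    pthread_mutex_unlock(&g_lock);
}

/* Placeholder: drain the PG's event ring and poll its qpairs once. */
static void poll_once(struct poll_group *pg)
{
    (void)pg;
}

/* Worker thread: not bound to a core; workers can be added or removed at will. */
static void *worker(void *arg)
{
    (void)arg;
    while (g_running) {
        struct poll_group *pg = pg_pop();
        if (pg != NULL) {
            poll_once(pg);
            pg_push(pg);
        }
    }
    return NULL;
}

int main(void)
{
    static struct poll_group pg0 = { .id = 0 }, pg1 = { .id = 1 };
    pthread_t threads[2];
    int i;

    pg_push(&pg0);
    pg_push(&pg1);
    for (i = 0; i < 2; i++) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    /* ... run for a while, then shut down ... */
    g_running = false;
    for (i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}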
Steps to Reproduce
This is an enhancement. There is no bug.
Context (Environment including OS version, SPDK version, etc.)
Would like to provide these enhancements in V18.07.
Handling of physical disk removals
by Baruch Even
Hi,
I'm using SPDK for local NVMe through the NVMe interface. I find that physical disk removals are not handled properly for my use case, and I wonder if others see it that way as well and if there is an intention to fix this.
Our system uses long-running processes that control one or more disks at a time. If a disk fails, it may drop completely from the PCIe bus, and it looks the same if the disk is physically removed (say a technician pulls the wrong disk during a replacement).
The problem I see is that SPDK does not consider the case of a device completely disappearing from the bus: it will try to release the I/O qpair by sending the Delete I/O SQ and Delete I/O CQ commands, neither of which will ever get an answer (the device is no longer on the PCIe bus), and there is no timeout logic in that code path. This means two things: the process will hang forever, and there is an effective memory leak, which currently means that we need to restart the process. Now, our system is resilient enough that restarting the process is not a big deal, but it is a very messy way to go about handling a physical drive removal.
Have others seen this behavior? Does it bother others?
For my own use, I put a timeout of a few seconds in there, and that solves it for me.
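For reference, the workaround can be as simple as putting an upper bound on the completion-polling wait. Below is a rough, hypothetical C sketch of that idea; spdk_nvme_ctrlr_process_admin_completions() is a real SPDK call, but the wait_done flag, the helper itself, and the timeout value are illustrative and are not the actual patch or the driver's internal code path.

#include <stdbool.h>
#include <time.h>
#include "spdk/nvme.h"

/* Rough sketch of a bounded wait, not the actual NVMe driver code path.
 * 'wait_done' would be set by the completion callback of the command whose
 * answer may never arrive (e.g. Delete I/O SQ on a removed device). */
static bool
wait_for_admin_completion(struct spdk_nvme_ctrlr *ctrlr,
                          volatile bool *wait_done, double timeout_sec)
{
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (!*wait_done) {
        /* Returns a negative value on failure, otherwise the number of
         * completions processed. */
        if (spdk_nvme_ctrlr_process_admin_completions(ctrlr) < 0) {
            return false;
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (now.tv_sec - start.tv_sec) +
                         (now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed > timeout_sec) {
            /* Device is presumably gone; give up instead of spinning forever. */
            return false;
        }
    }
    return true;
}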
Baruch Even
--
Baruch Even, Software Developer
baruch@weka.io | www.weka.io