This is primarily for Shuhei but please feel free, anyone, to respond :)
Adding support for Intel's next generation offload engine is going well (note: the feature is not available in HW yet; I'm using a simulator for dev/test). Currently support exists, or is about to land on master, for:
Copy, fill, dual-cast, CRC32C, compare and the ability to submit batches of commands.
Currently these are only being used by a new tool in /examples/accel/perf but once they all land and I've added some more tests, we'll start using them in SPDK modules - the most notable uses will be for CRC32C (iSCSI) and DIF/DIX throughout the stack. There will be other uses (compare, fill, copy, etc) as well but those are the big ones.
I've just now started looking at DIF/DIX and have determined that using these within SPDK won't be quite as straightforward as some of the others. I'll explain what I'm thinking after briefly summarizing the DSA DIF/DIX functions (more detail is available in the public spec at https://software.intel.com/content/www/us/en/develop/download/intel-data-...)
Note: there is no SGL support in any of these, all are single src and/or dst:
* DIF Check: The DIF Check operation computes the Data Integrity Field (DIF) on the source data and compares the computed DIF to the DIF contained in the source data.
* DIF Insert: The DIF Insert operation copies memory from the Source Address to the Destination Address, while computing the Data Integrity Field (DIF) on the source data and inserting the DIF into the output data.
* DIF Strip: The DIF Strip operation copies memory from the Source Address to the Destination Address, removing the Data Integrity Field (DIF). It optionally computes the DIF on the source data and compares the computed DIF to the DIF contained in the source data.
* DIF Update: The DIF Update operation copies memory from the Source Address to the Destination Address. It optionally computes the Data Integrity Field (DIF) on the source data and compares the computed DIF to the DIF contained in the data. It simultaneously computes the DIF on the source data using Destination DIF fields in the descriptor and inserts the computed DIF into the output data.
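For anyone not staring at the T10 spec daily: the DIF these operations compute is the standard 8-byte tuple appended to each logical block (e.g. 512 + 8 = 520-byte formatted blocks). A rough sketch of its layout - field names here are mine for illustration, SPDK's real handling lives in dif.c:

```c
#include <stdint.h>

/* Classic 8-byte T10 DIF tuple appended to each block's data portion.
 * Field names are illustrative, not SPDK's actual definitions. */
struct t10_dif_tuple {
	uint16_t guard_tag; /* CRC computed over the block's data */
	uint16_t app_tag;   /* application-defined tag */
	uint32_t ref_tag;   /* typically the lower 32 bits of the LBA */
};
```

Check/Insert/Strip/Update above are then just "verify this tuple", "append it while copying", "drop it while copying", and "rewrite it while copying" respectively.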
Upon initial review of the relatively complex implementation of DIF/DIX we have in SPDK, I have the following observations that I'm hoping to get some feedback on:
* It looks like we require SGL in most if not all cases. I can go through them one by one but wanted to get an initial feel, mainly from Shuhei, on how lack of SGL support impacts our ability to use DIF/DIX offload w/DSA before I start adding support :)
* With the exception of DIF Check, all of the DSA functions include a copy (I can only assume they figured a use case where they are moving data from a host buffer into a different memory subsystem in prep for DMA'ing to disk). It looks like most if not all of our calculations are done on fixed buffers. I see a few copy functions in dif.c but I don't see them used anywhere.
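If the lack of SGL support does turn out to be the blocker, one (copy-heavy) workaround would be to flatten an iovec chain into a contiguous bounce buffer before submitting to DSA. A minimal sketch under that assumption - this is not any actual SPDK or DSA API, just the shape of the helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Flatten an SGL (iovec array) into one contiguous buffer so it can be
 * handed to a single-src DSA descriptor. Returns bytes copied, or 0 if
 * the destination buffer is too small. Purely illustrative - a real
 * implementation would want to avoid this extra copy, or fold it into
 * the copy the DSA DIF operations already perform. */
static size_t
sgl_flatten(const struct iovec *iovs, int iovcnt, uint8_t *buf, size_t buf_len)
{
	size_t off = 0;

	for (int i = 0; i < iovcnt; i++) {
		if (off + iovs[i].iov_len > buf_len) {
			return 0;
		}
		memcpy(buf + off, iovs[i].iov_base, iovs[i].iov_len);
		off += iovs[i].iov_len;
	}
	return off;
}
```

Since Insert/Strip/Update already include a copy, the interesting question is whether that built-in copy could absorb the flatten step instead of paying for it twice.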
I'm almost thinking the DSA functions are too "simple" for our current implementation but wonder if there's some refactoring we can do to make use of them. I don't know if the DSA CRC32C engine calculates the same exact CRC as the DIF/DIX functions but if so (I can verify) at a minimum maybe just accelerate the CRCs called from funcs within dif.c.
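One data point on the CRC question: the classic T10 DIF guard tag is a 16-bit CRC with polynomial 0x8BB7, which is a different polynomial than CRC32C (0x1EDC6F41), so I'd expect the CRC32C engine can't stand in for the guard computation (though it still helps the iSCSI data-digest path). A bitwise reference implementation for checking against what the simulator produces - this is just the textbook MSB-first loop, not a data-path implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no bit reflection, no
 * final XOR. Reference only - far too slow for a real data path, where
 * a table-driven or PCLMULQDQ version would be used instead. */
static uint16_t
crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)buf[i] << 8;
		for (int b = 0; b < 8; b++) {
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```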
Thoughts? We can chat in a community meeting soon too but email might be easier to get us all on the same page first.
With this message I wanted to update SPDK community on state of VPP socket abstraction as of SPDK 19.07 release.
At this time there does not seem to be a clear efficiency improvement with VPP. There is no further work planned on SPDK and VPP integration.
As some of you may remember, the SPDK 18.04 release introduced support for alternative socket types. Along with that release, Vector Packet Processing (VPP)<https://wiki.fd.io/view/VPP> 18.01 was integrated with SPDK by expanding the socket abstraction to use the VPP Communications Library (VCL). The TCP/IP stack in VPP<https://wiki.fd.io/view/VPP/HostStack> was in its early stages back then and has seen improvements throughout the last year.
To better use VPP capabilities, and following fruitful collaboration with the VPP team, in SPDK 19.07 this implementation was changed from VCL to the VPP Session API from VPP 19.04.2.
The VPP socket abstraction has met some challenges due to the inherent design of both projects, in particular related to running as separate processes and the resulting memory copies.
Seeing improvements over the original implementation was encouraging, yet when measuring against the posix socket abstraction (taking the entire system into consideration, i.e. both processes), the results are comparable. In other words, at this time there does not seem to be a clear benefit to either socket abstraction from the standpoint of CPU efficiency or IOPS.
Each SPDK release brings improvements to the socket abstraction and its implementations, including exciting work on more efficient use of the kernel TCP stack - changes coming in SPDK 19.10 and SPDK 20.01.
However, there is no active involvement at this point around the VPP implementation of the socket abstraction in SPDK. Contributions in this area are always welcome. In case you're interested in implementing further enhancements to the VPP and SPDK integration, feel free to reply, or use one of the many SPDK community communication channels<https://spdk.io/community/>.