From: SPDK [mailto:email@example.com] On Behalf Of Shahar Salzman
Sent: Thursday, July 11, 2019 9:32 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Sharing namespaces between subsystems
To support native NVMe multipathing on Linux, I am reviewing what the
specification says about controller IDs.
Our system is a distributed system exposing the same logical devices through
multiple physical hosts, each running its own SPDK instance.
So say you have node A and node B, and they both expose namespace '123' where
'123' is the namespace UUID. If I'm understanding your scenario correctly,
you'd like a client to see node A and node B as the same subsystem, with two valid
paths to get there. Is that correct?
Looking at the code (v19.04), I see that controller IDs are generated serially in
the 0-0xFFF0 range and checked for uniqueness within the subsystem before the
value is returned to the controller. Because the IDs are generated serially,
different nodes will probably produce the same controller ID, which means that
the host may identify a new controller as one which already exists.
This means I need to either limit the controller ID range per SPDK instance and
remain spec-aligned, or expose a different subsystem per physical host, which
solves the controller ID issue but does not conform to the spec...
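The first option (a controller ID range per SPDK instance) can be sketched as below. This is only an illustration, not SPDK code: `struct cntlid_allocator`, `cntlid_allocator_init` and `cntlid_allocate` are hypothetical names, and the value 0xFFFF is used here purely as an "exhausted" sentinel, so `max_id` is assumed to be below it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-node allocator state; not SPDK's actual data structure. */
struct cntlid_allocator {
    uint16_t min_id;       /* first ID this node may hand out (inclusive) */
    uint16_t max_id;       /* last ID this node may hand out (inclusive) */
    uint16_t next_id;      /* next candidate to try */
    bool in_use[0x10000];  /* IDs currently held by live controllers */
};

/* Assumes min_id <= max_id < 0xFFFF. */
void cntlid_allocator_init(struct cntlid_allocator *a,
                           uint16_t min_id, uint16_t max_id)
{
    memset(a, 0, sizeof(*a));
    a->min_id = min_id;
    a->max_id = max_id;
    a->next_id = min_id;
}

/* Serially pick the next free ID inside [min_id, max_id].
 * Returns 0xFFFF when every ID in the range is taken. */
uint16_t cntlid_allocate(struct cntlid_allocator *a)
{
    uint32_t span = (uint32_t)a->max_id - a->min_id + 1;

    for (uint32_t i = 0; i < span; i++) {
        uint16_t candidate = a->next_id;

        /* Advance the cursor, wrapping back to the start of the range. */
        a->next_id = (candidate == a->max_id) ? a->min_id
                                              : (uint16_t)(candidate + 1);
        if (!a->in_use[candidate]) {
            a->in_use[candidate] = true;
            return candidate;
        }
    }
    return 0xFFFF;
}

/* Give an ID back when its controller is destroyed. */
void cntlid_release(struct cntlid_allocator *a, uint16_t id)
{
    a->in_use[id] = false;
}
```

If two nodes are both initialised with the same full range, their first allocations return the same ID, which is the collision described above. Giving node A the range 0x0001-0x0FFF and node B the range 0x1000-0x1FFF keeps the IDs disjoint without any runtime coordination, at the cost of a fixed cap on controllers per node.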
The SPDK NVMe-oF target is not distributed in and of itself. To make it distributed,
you'll need to change the code so that it coordinates correctly with all of the
nodes that compose the subsystem. Certainly selecting a unique controller identifier is
one area of coordination, but there are likely many more (anything stored in struct
I looked at the namespace ID section in NVMe 1.4, and there doesn't seem to be
any mention of worldwide uniqueness, so it seems that the correct
implementation would be to limit the controller ID range. Would an API to limit
the controller ID range in SPDK be acceptable?
Do you know of any work being done on namespace sharing between
subsystems, and on world wide unique namespace IDs?
There has been some discussion of a new NMIC bit that indicates that a namespace can be
shared across two separate subsystems (there is already a bit that says whether it can be
shared across two controllers in the same subsystem). But I confirmed that is not in the
latest specification. I think sharing a namespace across two separate subsystems is
actually a more elegant solution to the problem, so we can hope they decide to move
forward with that.
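For reference, the existing bit mentioned above is bit 0 of the NMIC byte in the Identify Namespace data. A minimal sketch of testing it on a raw NMIC value follows; the macro and function names are made up for illustration and are not SPDK identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* NMIC bit 0: the namespace may be attached to two or more controllers
 * in the same NVM subsystem. The macro name is hypothetical. */
#define NMIC_SHARED_WITHIN_SUBSYSTEM (1u << 0)

/* Returns true when the raw NMIC byte advertises a shareable namespace. */
bool ns_may_be_shared(uint8_t nmic)
{
    return (nmic & NMIC_SHARED_WITHIN_SUBSYSTEM) != 0;
}
```

Nothing in this byte says anything about sharing across separate subsystems, which is the gap the discussed new bit would fill.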
I'd be fine with an API that lets the user provide a callback to generate controller
IDs for each subsystem. Then your application can set it up to work however you want. My
primary concern is that this may just be the tip of the iceberg, so I'd like to hold
off on going this route until we understand all of the different pieces of data that are
going to need coordination across the nodes. Just from a quick glance, some problematic
areas are:
1) reservations (which belong to a namespace and are emulated in software. I think you
just have to disable this.)
2) discovery services (I assume you have a separate discovery service that is
cluster-aware and the one in SPDK is turned off?)
3) namespace IDs (which I think we already let the user pick)
4) Subsystem state (pause/resume). I don't think SPDK has RPCs to pause and resume
subsystems directly. The pause and resume just happen automatically when you do some
other management operation like adding or removing a namespace. However, in a distributed
implementation you'd need the orchestrator to pause the subsystem on all nodes, then
do the management operation, then resume on all nodes. I think you'd need additional
RPCs for this. The pause and resume functions are part of the public nvmf API, so maybe
your application is calling those and this is all fine.
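The orchestration described in item 4 (pause on all nodes, do the management operation, resume on all nodes) can be sketched as follows. Everything here is hypothetical: `struct cluster_node` and its `pause`/`resume` callbacks stand in for whatever RPC or in-process call the orchestrator would really use, and the calls are shown as blocking for brevity, whereas a real implementation would most likely be asynchronous. The `fake_*` helpers exist only to exercise the sketch in memory.

```c
#include <stddef.h>

/* Hypothetical handle for one node of the distributed subsystem. */
struct cluster_node {
    int (*pause)(struct cluster_node *node);   /* returns 0 on success */
    int (*resume)(struct cluster_node *node);  /* returns 0 on success */
    void *ctx;                                 /* per-node private state */
};

typedef int (*mgmt_op_fn)(void *arg);

/* Pause every node, run the management operation, then resume.
 * If a pause fails, the operation is skipped and only the nodes that
 * were actually paused are resumed. Returns the first error seen. */
int cluster_run_paused(struct cluster_node *nodes, size_t count,
                       mgmt_op_fn op, void *arg)
{
    size_t paused = 0;
    int rc = 0;

    for (; paused < count; paused++) {
        rc = nodes[paused].pause(&nodes[paused]);
        if (rc != 0) {
            break;
        }
    }

    if (rc == 0) {
        rc = op(arg);
    }

    /* Resume in reverse order, keeping the earliest error code. */
    while (paused > 0) {
        int resume_rc;

        paused--;
        resume_rc = nodes[paused].resume(&nodes[paused]);
        if (rc == 0 && resume_rc != 0) {
            rc = resume_rc;
        }
    }
    return rc;
}

/* In-memory stand-ins used only to try the sketch out. */
struct fake_state { int paused; int pause_calls; int resume_calls; };

int fake_pause(struct cluster_node *node)
{
    struct fake_state *s = node->ctx;
    s->paused = 1;
    s->pause_calls++;
    return 0;
}

int fake_resume(struct cluster_node *node)
{
    struct fake_state *s = node->ctx;
    s->paused = 0;
    s->resume_calls++;
    return 0;
}

int fake_pause_fails(struct cluster_node *node)
{
    (void)node;
    return -1;
}

int fake_mgmt_op(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return 0;
}
```

The point of the sketch is the ordering and the rollback path: a namespace add or remove only runs while every node is paused, and a node that fails to pause does not leave the others paused.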
Have you already thought through these sorts of problems? Is it just the controller ID
that you haven't solved?
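The callback idea mentioned above (a user-provided function that generates controller IDs for each subsystem) might look roughly like this. It is a sketch under assumptions, not an existing SPDK API: `cntlid_gen_fn`, `struct subsystem_sketch`, `subsystem_set_cntlid_generator` and the strided example generator are all invented names, and the sketch ignores ID reuse and range overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: returns the next controller ID to use. */
typedef uint16_t (*cntlid_gen_fn)(void *cb_arg);

/* Stand-in for the per-subsystem state that would hold the callback. */
struct subsystem_sketch {
    cntlid_gen_fn gen_cntlid;  /* NULL means "use the serial default" */
    void *gen_cntlid_arg;
    uint16_t next_cntlid;      /* state for the serial default */
};

void subsystem_set_cntlid_generator(struct subsystem_sketch *s,
                                    cntlid_gen_fn fn, void *cb_arg)
{
    s->gen_cntlid = fn;
    s->gen_cntlid_arg = cb_arg;
}

/* Called when a new controller is created in the subsystem. */
uint16_t subsystem_next_cntlid(struct subsystem_sketch *s)
{
    if (s->gen_cntlid != NULL) {
        return s->gen_cntlid(s->gen_cntlid_arg);
    }
    return s->next_cntlid++;
}

/* Example application-side generator: node i of n hands out
 * 1 + i, 1 + i + n, 1 + i + 2n, ... so nodes never overlap. */
struct stride_ctx {
    uint16_t node_index;
    uint16_t node_count;
    uint16_t issued;
};

uint16_t strided_cntlid(void *cb_arg)
{
    struct stride_ctx *c = cb_arg;
    uint16_t id = (uint16_t)(1 + c->node_index + c->issued * c->node_count);

    c->issued++;
    return id;
}
```

With two nodes, node 0 would hand out 1, 3, 5 and node 1 would hand out 2, 4, 6. The application stays free to replace the strided generator with a range-based one or with a lookup against a cluster-wide service.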