On Mon, Oct 22, 2018 at 1:18 PM Dave Hansen <dave.hansen(a)linux.intel.com> wrote:
Persistent memory is cool. But, currently, you have to rewrite
your applications to use it. Wouldn't it be cool if you could
just have it show up in your system like normal RAM and get to
it like a slow blob of memory? Well... have I got the patch
series for you!
This series adds a new "driver" to which pmem devices can be
attached. Once attached, the memory "owned" by the device is
hot-added to the kernel and managed like any other memory. On
systems with an HMAT (a new ACPI table), each socket (roughly)
will have a separate NUMA node for its persistent memory so
this newly-added memory can be selected by its unique NUMA node.
This is highly RFC, and I really want the feedback from the
nvdimm/pmem folks about whether this is a viable long-term
perversion of their code and device mode. It's insufficiently
documented and probably not bisectable either.
1. The device re-binding hacks are ham-fisted at best. We
need a better way of doing this, especially so the kmem
driver does not get in the way of normal pmem devices.
2. When the device has no proper node, we default it to
NUMA node 0. Is that OK?
3. We muck with the 'struct resource' code quite a bit. It
definitely needs a once-over from folks more familiar
with it than I.
4. Is there a better way to do this than starting with a
copy of pmem.c?
So I don't think we want to do patch 2, 3, or 5. Just jump to patch 7
and remove all the devm_memremap_pages() infrastructure and dax_region
The driver should be a dead simple turn around to call add_memory()
for the passed in range. The hard part is, as you say, arranging for
the kmem driver to not stand in the way of typical range / device
claims by the dax_pmem device.
To me this looks like teaching the nvdimm-bus and this dax_kmem driver
to require explicit matching based on 'id'. The attachment scheme
would look like this:
echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/new_id
echo dax0.0 > /sys/bus/nd/drivers/dax_pmem/unbind
echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/bind
At step 1 the dax_kmem driver will match no devices and stay out of
the way of dax_pmem. It learns about devices it cares about by being
explicitly told about them. Then unbind from the typical dax_pmem
driver and attach to dax_kmem to perform the one way hotplug.
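As a rough sketch only, the three steps above could be wrapped in a small helper like the one below. The sysfs paths are the ones proposed in this mail and do not exist yet; the function name rebind_to_kmem and the SYSFS_ROOT override (handy for a dry run against a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: move one device-dax instance from dax_pmem to dax_kmem.
# Assumes the dax_kmem driver and its new_id/bind attributes exist as proposed
# in this thread; none of this is an existing kernel interface.
rebind_to_kmem() {
    dev="$1"
    # SYSFS_ROOT is an illustrative override so the sequence can be tried
    # against a scratch directory instead of the real /sys.
    drivers="${SYSFS_ROOT:-/sys}/bus/nd/drivers"

    # Step 1: tell dax_kmem which device id it is allowed to match.
    echo "$dev" > "$drivers/dax_kmem/new_id" || return 1
    # Step 2: detach the device from the default dax_pmem driver.
    echo "$dev" > "$drivers/dax_pmem/unbind" || return 1
    # Step 3: attach it to dax_kmem, which performs the one-way hotplug.
    echo "$dev" > "$drivers/dax_kmem/bind" || return 1
}
```

Called as "rebind_to_kmem dax0.0", it stops at the first write that fails, so a device is never unbound from dax_pmem unless dax_kmem accepted its id first.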
I expect udev can automate this by setting up a rule to watch for
device-dax instances by UUID and call a script to do the detach /