On Fri, Jun 7, 2019 at 12:57 PM Dave Hansen <dave.hansen(a)intel.com> wrote:
> On 6/7/19 12:27 PM, Dan Williams wrote:
> > In support of optionally allowing either application-exclusive or
> > core-kernel-mm managed access to differentiated memory, claim
> > EFI_MEMORY_SP ranges for exposure as device-dax instances by default.
> > Such instances can be directly owned / mapped by a
> > platform-topology-aware application. Alternatively, with the new kmem
> > facility, the administrator has the option to instead designate that
> > those memory ranges be hot-added to the core-kernel-mm as a unique
> > memory numa-node. In short, allow for the decision about what software
> > agent manages specific-purpose memory to be made at runtime.
>
> It's probably worth noting that the reason the memory lands into the
> state of being controlled by device-dax by default is that device-dax is
> nice. It's actually willing and able to give up ownership of the memory
> when we ask. If we added it to the core-mm, we'd almost certainly not be
> able to get it back reliably.
>
> Anyway, thanks for doing these, and I really hope that the world's
> BIOSes actually use this flag.

It should be noted that the flag is necessary, but not sufficient to
route this memory range to device-dax. The BIOS must also publish ACPI
HMAT performance data for the range so the OS has a chance of knowing
*why* the memory is "reserved for a specific purpose", and delineate
the boundaries of multiple performance differentiated memory ranges
that might be combined into one shared / contiguous EFI memory
With no HMAT the memory will be reserved, but no dax-device will be
surfaced. Perhaps this implementation also needs a WARN_TAINT(...,
TAINT_FIRMWARE_WORKAROUND...) to scream about a BIOS that fails to
publish the required HMAT entries, or perhaps even better a command
line option to ignore the flag so that the core-mm can pick up the
memory by default?
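
To make the cases above concrete, here is a rough userspace sketch of the
routing decision as I read it. It is not the patch itself: the enum, the
struct, route_range(), and the ignore_sp_flag parameter are hypothetical
names invented for illustration, and the "ignore the flag" path models
the command line option that is only being proposed here. The only value
taken from a spec is the EFI_MEMORY_SP attribute bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI "specific purpose" memory attribute bit. */
#define EFI_MEMORY_SP 0x0000000000040000ULL

/* Hypothetical outcomes, named for this sketch only. */
enum sp_route {
	ROUTE_CORE_MM,		/* ordinary memory, handed to the core-mm */
	ROUTE_DEVICE_DAX,	/* reserved and surfaced as a device-dax instance */
	ROUTE_RESERVED_ONLY,	/* reserved, but no dax-device is surfaced */
};

/* Hypothetical description of one firmware-described range. */
struct sp_range {
	uint64_t attribute;	/* EFI memory descriptor attribute bits */
	bool has_hmat_data;	/* firmware published HMAT data for the range */
};

/*
 * Decide who ends up managing the range. ignore_sp_flag stands in for
 * the proposed (not existing in this series) command line override.
 */
static enum sp_route route_range(const struct sp_range *r, bool ignore_sp_flag)
{
	/* No flag, or the administrator asked to ignore it: core-mm wins. */
	if (!(r->attribute & EFI_MEMORY_SP) || ignore_sp_flag)
		return ROUTE_CORE_MM;

	/* Flag set but no HMAT: memory is held back with nothing to claim it. */
	if (!r->has_hmat_data)
		return ROUTE_RESERVED_ONLY;

	/* Flag and HMAT both present: default to device-dax. */
	return ROUTE_DEVICE_DAX;
}
```

The ROUTE_RESERVED_ONLY case is the one that would deserve the warning:
the flag alone strands the memory, which is why either a loud complaint
about the firmware or an override back to the core-mm seems warranted.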