On Tue, Apr 16, 2019 at 8:31 AM Brice Goglin <Brice.Goglin(a)inria.fr> wrote:
On 08/04/2019 at 21:55, Brice Goglin wrote:
On 08/04/2019 at 16:56, Dan Williams wrote:
Yes, I agree with all of the above, but I think we need a way to fix
this independently of whether the HMAT data is present. The SLIT
already tells the kernel enough to let tooling figure out equidistant
"local" nodes. While the numa_node attribute will remain a singleton,
tooling needs to handle this case and can't assume the HMAT data will
be present.
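As a sketch of what such tooling could do with the SLIT alone (a hypothetical helper, not code from this thread): given the distance row for a device's node, as exposed for example in /sys/devices/system/node/nodeN/distance, treat every node at the minimum distance as equally "local".

```python
def equidistant_local_nodes(distances):
    """Given one SLIT distance row (one entry per NUMA node), return
    the node IDs sharing the minimum distance -- the set of equally
    near "local" nodes tooling could use instead of a single node."""
    nearest = min(distances)
    return [node for node, d in enumerate(distances) if d == nearest]

# With the values reported later in this thread for dax0.0 ("17 17 28 28"),
# both node0 and node1 come out as local:
print(equidistant_local_nodes([17, 17, 28, 28]))  # → [0, 1]
```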
So you want to export the part of the SLIT that is currently hidden
from userspace because the corresponding nodes aren't registered?
With the patch below, I get "17 17 28 28" in dax0.0/node_distance,
which means it's close to node0 and node1.
The code is pretty much a duplicate of read_node_distance() in
drivers/base/node.c. I'm not sure it's worth factoring out such small
functions?
The name "node_distance" (instead of "distance", as used for NUMA
nodes) is also subject to discussion.
Here's a better patch that exports the existing routine for showing
node distances, and reuses it in dax/bus.c and nvdimm/pfn_devs.c:
# cat /sys/class/block/pmem1/device/node_distance
28 28 17 17
# cat /sys/bus/dax/devices/dax0.0/node_distance
17 17 28 28
By the way, it also handles the case where the nd_region has no
valid target_node (an idea borrowed from kmem.c).
Are there other places where it'd be useful to export that attribute?
Ideally we could just export it in the region sysfs directory,
but I can't find backlinks going from daxX.Y or pmemZ to that
region directory :/
I understand where you're trying to go, but this is too dax-device
specific. What about a storage controller in the topology that is
equidistant from multiple CPU nodes? I'd rather solve this from the
tooling perspective: look up the CPU nodes that are equidistant from
the device's "numa_node".
I'd rather not teach the kernel to export this extra node_distance
information; instead, teach numactl to consider equidistant CPU nodes
when building its default node masks.
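A rough sketch of what that numactl-side logic might look like (hypothetical; it assumes the full node distance matrix is available, e.g. parsed from /sys/devices/system/node/node*/distance, and that the set of CPU-bearing nodes is known):

```python
def expand_equidistant(numa_node, distance_matrix, cpu_nodes):
    """Expand a device's singleton numa_node into the set of CPU nodes
    sitting at the same (minimal) distance from it, using only the
    SLIT-style distance matrix -- no extra kernel attribute needed."""
    row = distance_matrix[numa_node]
    # Distance from the device's node to the nearest CPU node.
    nearest = min(row[n] for n in cpu_nodes)
    return {n for n in cpu_nodes if row[n] == nearest}

# Hypothetical 4-node SLIT consistent with the 17/28 values above:
# node0/node1 have CPUs, node2/node3 are memory-only device nodes,
# and the dax device reports numa_node == 2.
slit = [
    [10, 17, 17, 28],
    [17, 10, 17, 28],
    [17, 17, 10, 28],
    [28, 28, 28, 10],
]
print(expand_equidistant(2, slit, cpu_nodes={0, 1}))  # → {0, 1}
```

With this, the device's default node mask becomes {0, 1} rather than just node2's single nearest node, matching the "17 17 28 28" case discussed above.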