On Mon, May 25, 2020 at 2:01 AM Vaibhav Jain <vaibhav(a)linux.ibm.com> wrote:
Hello,
I am looking for some community feedback on these two Problem-statements:
1. How to expose NVDIMM performance statistics in an arch- or NVDIMM
vendor-agnostic manner?
2. Is there a common set of performance statistics for NVDIMMs that all
vendors should provide?
It would be nice to try to encourage some common keyword metrics
across vendors, but I suspect that like perf there will always be some
statistics that are architecture specific.
Problem context
===============
While working on the bring-up of PAPR SCM based NVDIMMs[1] for arch/powerpc,
we want to expose certain DIMM performance statistics like "Media
Read/Write Counts", "Power-on Seconds" etc. to user-space [2]. These
performance statistics are similar to what ipmctl[3] reports for Intel®
Optane™ persistent memory via the 'show -performance' command. However,
that set of performance stats doesn't cover all of the performance stats
supported by PAPR SCM based NVDIMMs.
For example, here is a subset of performance stats that are specific to
PAPR SCM NVDIMMs and are not reported by ipmctl:
* Controller Reset Count
* Controller Reset Elapsed Time
* Power-on Seconds
* Cache Read Hit Count
* Cache Write Hit Count
Updating ipmctl to support these performance statistics is greatly
hampered by the lack of ACPI support on the powerpc arch. Secondly,
vendors who don't support an ACPI/NFIT command set similar to Intel®
Optane™ (for example, MSFT) are also left in the lurch.
Problem-statement#1 refers to this specific problem.
ipmctl is vendor specific and OS agnostic.
ndctl is vendor/platform agnostic and Linux specific.
ndctl is built for this abstraction, as it depends on libnvdimm, of
which ACPI/NFIT is just one of many co-equal bus providers. In short,
I would expect this support to land in ndctl, not ipmctl.
The only thing that has prevented ndctl from adding performance
statistics was the lack of a public specification, otherwise ndctl
aims to abstract and provide a common tool for any publicly specified
persistent memory device.
Additionally, in the absence of any pre-agreed set of performance
statistics that all vendors should support, adding such functionality
to ipmctl may not bode well for other NVDIMM vendors. For example, if
support for reporting "Controller Reset Count" is added to ipmctl, it
may not be applicable to other vendors such as Intel® Optane™. This
issue is what Problem-statement#2 refers to.
Possible Solution for Problem#1
===============================
One possible solution to Problem#1 could be to add support for reporting
NVDIMM performance statistics in 'ndctl'. 'libndctl' already has a layer
that abstracts the underlying NVDIMM vendors (via struct ndctl_dimm_ops),
which makes supporting different NVDIMM vendors fairly easy. Also, ndctl
is more widely used than 'ipmctl', so adding this functionality to ndctl
would make it available to more users.
The above solution was implemented as an RFC patch-set[2] that exposes these
performance statistics through a generic abstraction in libndctl and
adds a presentation layer for this data in ndctl[4]. It adds a new
command-line flag '--stats' to ndctl to report *all* NVDIMM
vendor-reported performance stats. The output is similar to the one below:
  # ndctl list -D --stats
  [
    {
      "dev":"nmem0",
      "stats":{
        "Power-on Seconds":603931,
        "Media Read Count":0,
        "Media Write Count":6313,
        ...
      }
    }
  ]
I wonder if this should be explicit about the platform-specific stats
versus the common ones? At least with the health data implementation,
so far it is implementing a common set of keywords across vendors. I
just worry that someone who writes a useful tool for this data needs
to understand whether their tool is vendor-generic or tied to a given
implementation. I'm thinking "platform_stats" for the ones that are
tied to the nvdimm-bus-provider vs "stats" for those that might
reasonably show up in more than one vendor's implementation.
This was done by adding two new dimm-ops callbacks, implemented by the
papr_scm code within libndctl. These callbacks are invoked by newly
introduced code in 'util/json-smart.c' that formats the returned stats
from these new dimm-ops and transforms them into a json-object for later
presentation. Please see the RFC patch-set[2] for the implementation
details.
I'm ok to add some stats to ndctl, but I want ndctl to be limited to
general statistics and not performance counters. Performance counters
and performance events should be abstracted through perf where
possible.
Possible Solution for Problem#2
===============================
The solution to Problem-statement#2 is what eludes me, though. If there
were a minimal set of performance stats (similar to what ndctl enforces
for health-stats), then implementing such functionality in ndctl/ipmctl
would be easy. But is it really possible to have such a common set of
performance stats that all NVDIMM vendors can expose?
Patch-set[2], though, tries to bypass this problem by letting the vendor
decide which performance stats to expose. This opens up the possibility
of DIMM vendors abusing this functionality to report arbitrary data
through this flag that may not be performance stats.
Summing-up
==========
In light of the above, I am requesting your feedback on how
problem-statements#{1, 2} can be addressed within the ndctl subsystem.
Also, are these problems even worth solving?
Yes, I think it's worth solving; the hard part is giving
implementations enough freedom to convey the data they need, but not
so much freedom that we damage ndctl's and the kernel's ability to
maintain a common interface across vendors.
Appreciate the thorough write-up.