On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav(a)linux.ibm.com> wrote:
Thanks for taking the time to look into this, Dan.
I agree with the points you made earlier, which I am summarizing below:
* This is better done in ndctl rather than ipmctl.
* Should only expose general performance metrics and not performance
counters. Performance counters should be exposed via perf.
* Vendor specific metrics to be separated from generic performance
metrics.
One way to split generic and vendor specific metrics might be to report
generic performance metrics together with dimm health metrics such as
"temperature_celsius" or "spares_percentage" that are already reported
by dimm health output.
Vendor specific performance metrics can be reported as a separate object
in the JSON output, similar to the output below:
# ndctl list -DH --stats --vendor-stats
/* Generic performance metrics/stats */
/* Vendor specific stats for the dimm */
"Controller Reset Count":10
"Controller Reset Elapsed Time": 3600
"Power-on Seconds": 3600
How do you tell generic from vendor-specific stats, though?
Controller reset count and power-on time may not be reported by some
controllers but sound pretty generic.
Even if you declare that the stats reported by all controllers
available at this moment are generic, a later one may not report some of
these 'generic' statistics, or may report them in a different way or in
different units, or may simply not report anything at all for some
technical reason.
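To illustrate the concern from the consumer side, here is a minimal
sketch of a tool that treats every stat as optional, whichever object it
ends up in. The JSON layout, the "vendor_stats" key and the read_stat()
helper are assumptions made up for this example; they are not existing
ndctl output or API.

```python
import json

# Hypothetical sample shaped like the proposal above. The "vendor_stats"
# key and the overall layout are assumptions, not real ndctl output.
SAMPLE = """
{
  "dev": "nmem0",
  "health": {
    "temperature_celsius": 30,
    "spares_percentage": 100
  },
  "vendor_stats": {
    "Controller Reset Count": 10,
    "Power-on Seconds": 3600
  }
}
"""


def read_stat(dimm, name, groups=("health", "vendor_stats")):
    """Return the first value found for `name` in any group, else None.

    A missing group or a missing key is not an error: a controller or a
    kernel may simply not report that stat.
    """
    for group in groups:
        stats = dimm.get(group)
        if isinstance(stats, dict) and name in stats:
            return stats[name]
    return None


dimm = json.loads(SAMPLE)
for name in ("temperature_celsius",
             "Power-on Seconds",
             "Controller Reset Elapsed Time"):
    value = read_stat(dimm, name)
    print(f"{name}: {'not reported' if value is None else value}")
```

A consumer written this way does not care whether a stat was declared
generic or vendor specific, which is part of why the split is hard to
rely on. It still cannot detect a change in units unless the unit is
part of the key name.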
Kernels that do not have this feature will not report anything at all