On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav(a)linux.ibm.com> wrote:
Thanks for taking the time to look into this, Dan.
I agree with the points you made earlier, which I am summarizing below:
* This is better done in ndctl rather than ipmctl.
* Should only expose general performance metrics and not performance
counters. Performance counters should be exposed via perf.
* Vendor specific metrics to be separated from generic performance
One way to split generic and vendor specific metrics might be to report
generic performance metrics together with dimm health metrics such as
"temperature_celsius" or "spares_percentage" that are already reported
by dimm health output.
Vendor-specific performance metrics can be reported as a separate object
in the json output. Something similar to the output below:
# ndctl list -DH --stats --vendor-stats
/* Generic performance metrics/stats */
/* Vendor specific stats for the dimm */
"Controller Reset Count":10
"Controller Reset Elapsed Time": 3600
"Power-on Seconds": 3600
Looks reasonable, although I think I want to maintain the
"Linux-style" format for the keys, i.e. lowercase + underbars. That is
partly for consistency, but it also simplifies parsers that have thus
far assumed no whitespace in the key names.
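
As a rough illustration of the key format described above, here is a
minimal Python sketch that rewrites the example vendor keys as lowercase
words joined by underscores. The helper name to_linux_style_key is made
up for this example and is not part of ndctl.

```python
import re


def to_linux_style_key(label):
    """Lowercase a label and join its words with underscores.

    Hypothetical helper for illustration only; not an ndctl function.
    """
    # Treat any run of letters or digits as a word, so both spaces and
    # hyphens act as separators.
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "_".join(word.lower() for word in words)


# The vendor stats from the example output earlier in this thread.
vendor_stats = {
    "Controller Reset Count": 10,
    "Controller Reset Elapsed Time": 3600,
    "Power-on Seconds": 3600,
}

normalized = {to_linux_style_key(k): v for k, v in vendor_stats.items()}
print(normalized)
```

With keys in this form, a parser that splits on whitespace never has to
handle a key that spans several tokens.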
Dan Williams <dan.j.williams(a)intel.com> writes:
> On Wed, May 27, 2020 at 12:24 PM Dan Williams <dan.j.williams(a)intel.com>
>> > This was done by adding two new dimm-ops callbacks that were
>> > implemented by the papr_scm implementation within libndctl. These
>> > callbacks are invoked by newly introduced code in
>> > that formats the returned stats from these new dimm-ops and transforms
>> > them into a json-object for later presentation. I would request you to
>> > look at the RFC patch-set to understand the implementation details.
>> I'm ok to add some stats to ndctl, but I want ndctl to be limited to
>> general statistics and not performance counters. Performance counters
>> and performance events should be abstracted through perf where
> Another aspect that helps common statistics is to expose them in
> sysfs. I'm going to go review your proposed ioctl mechanism, but I
> would hope that is reserved for multi-field command payloads that need
> to be sent as a unit rather than statistics retrieval that is amenable
> to a sysfs interface.
The patchset uses a mechanism similar to GET_CONFIG_SIZE/DATA to
retrieve a struct composed of tuples of (stat-id, stat-value) from
papr_scm and then exposes them to ndctl via some new dimm-ops.
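
To make the (stat-id, stat-value) idea concrete, here is a minimal
Python sketch of decoding such a buffer. The layout is an assumption
made only for this example (a little-endian 32-bit id followed by a
64-bit value per entry); the actual struct layout in the patchset may
well differ.

```python
import struct

# Assumed entry layout, for illustration only: u32 stat-id, u64 stat-value,
# little-endian, no padding. This is not taken from the patchset.
ENTRY = struct.Struct("<IQ")


def parse_stats(buf):
    """Decode a buffer of packed (stat-id, stat-value) entries into a dict."""
    if len(buf) % ENTRY.size != 0:
        raise ValueError("buffer length is not a multiple of the entry size")
    return {stat_id: value for stat_id, value in ENTRY.iter_unpack(buf)}


# Build a two-entry buffer and decode it again.
example = ENTRY.pack(1, 10) + ENTRY.pack(2, 3600)
print(parse_stats(example))
```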
I think sysfs is a better fit for this. Yes, we could make this work
as you have identified, but I think it was a mistake that I did this
for health properties especially the static ones.
0ead11181fe0 acpi, nfit: Collect shutdown status
That started as data which was only available via ioctl, but it
simplified userspace to have a sysfs attribute. In addition to the
built-in enumeration / capability detection that sysfs affords, it
also allows the kernel to cache the property once, even though many
different userspace agents might want to read it. Between perf for
dynamic performance properties, and sysfs for static / health data,
what's left for the ioctl path?
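
As a sketch of the enumeration / capability detection point, a userspace
reader can simply check which attribute files exist. The directory and
attribute names below are placeholders chosen for this example, not real
sysfs paths.

```python
from pathlib import Path


def read_attrs(attr_dir, names):
    """Return {name: value} for each named attribute file that exists.

    A missing file is treated as "capability not supported" and skipped,
    so the caller needs no separate query command.
    """
    found = {}
    for name in names:
        attr = Path(attr_dir) / name
        if attr.is_file():
            found[name] = attr.read_text().strip()
    return found


# Placeholder usage: "/path/to/dimm" and both names are hypothetical.
# stats = read_attrs("/path/to/dimm", ["dirty_shutdown", "some_other_stat"])
```

Each attribute is a plain text file, so the same reader works from any
language or from a shell script.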