On 6/9/2017 3:06 PM, Dan Williams wrote:
On Fri, Jun 9, 2017 at 11:56 AM, Linda Knippers
> On 6/9/2017 2:10 PM, Dan Williams wrote:
>>>> This format for health_detail makes the json harder to consume for
>>>> upper-layer applications. Every field should be associated with a
>>>> key-id even if it's just a flag.
>>> Ok. I thought we had discussed this format with my RFC patch a
>>> while back. The only change I made was adding underscores instead
>>> of spaces as requested.
>> Yeah, I thought I had brought up the array of json-boolean objects vs
>> json-string objects before.
> Hmm...maybe I missed that, but ok.
>>> The items in the array are all associated with a single validation flag.
>>> There are some additional groups of fields that I plan to add as well,
>>> like a group of error counters, so do you want everything flat?
>> Yes, after writing some scripts to parse json I think flatter is
>> better unless the data itself is actually hierarchical.
>>>> I think I'd like to keep the json
>>>> topology flatter. We already have support for the NFIT health state
>>>> flags in this form:
>>>> So, let's just add additional flag names and omit the ones that are
>>>> duplicates of the NFIT-level health state flags.
>>> None of them are duplicates because the NFIT state flags are static from
>>> time while the health status detail changes at runtime.
>> Yes, but it's a pain for applications that now have to deal with two
>> different flag names for similar indicators. So for example if
>> "flag_failed_arm" is false at startup, but "arm_error"
>> I would think that would just be reflected in "flag_failed_arm". In
>> other words override the static flag_failed_arm value with the dynamic
>> result so that monitor applications need only learn one json key for
>> the same data.
> But they're different things. One represents boot time state and one
> runtime. One is an NFIT flag, not even reported as health today, and one is
> runtime health. Even your commit message says they're distinct from SMART
> health data. Knowing that my NVDIMM came up healthy and then had an
> or vice versa, is important.
I agree, but I don't think it's ndctl's job to maintain that
distinction. The default case is "I want to know the present state of
my dimm", if an environment wants to know "has my state changed over
time" then it should be logging the health state snapshots and
If you want to know the state of the NVDIMM, that's the health information
for systems that provide this information. Right now for the information
we're talking about, that's systems that support the HPE1 DSM family.
It is different from the NFIT device flag, which represents the state of
the NVDIMM at the point at which the FW created the NFIT table, which would
apply to all NFIT-commpliant platforms.
If you try to combine the two, it's not clear what state you're getting
because it's not obvious from ndctl whether it's static or dynamic data.
I believe that's why you correctly didn't call the state flags "health"
> Maybe the state flags should be renamed to reflect what they
> The "true" value for those is probably redundant too because I don't
Yes, the 'true' value is redundant. The goal is to get the flag name
into a json-key rather than a json-value.
> output anything if they're fault, although now I'm not sure those NFIT
> flags are being displayed properly at all. On my system, /sys../nfit/flags
> shows "smart_notify", which I assume means I should see
> but I don't. I'll look at that separately.
The 'smart_notify' flag is whether the DIMM supports health state
notifications, the 'flag_smart_event' indicates if a smart event was
pending. So your system supports notifications and the NFIT says no
SMART event was pending.
Ok, so you display all the nfit flags but one. The name is a bit unclear.
And yes, I should have reviewed the patch in April. I'm happy to post a
patch to make the names a little clearer now.