-----Original Message-----
From: Dan Williams [mailto:dan.j.williams@intel.com]
Sent: Monday, July 9, 2018 3:27 PM
To: Qi, Fuli/斉 福利 <qi.fuli(a)jp.fujitsu.com>
Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
On Sun, Jul 8, 2018 at 11:22 PM, Qi, Fuli <qi.fuli(a)jp.fujitsu.com> wrote:
>> -----Original Message-----
>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>> Sent: Monday, July 9, 2018 3:04 PM
>> To: Qi, Fuli/斉 福利 <qi.fuli(a)jp.fujitsu.com>
>> Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
>> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>>
>> On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli <qi.fuli(a)jp.fujitsu.com> wrote:
>> >> -----Original Message-----
>> >> From: Dan Williams [mailto:dan.j.williams@intel.com]
>> >> Sent: Sunday, July 8, 2018 5:06 AM
>> >> To: Qi, Fuli/斉 福利 <qi.fuli(a)jp.fujitsu.com>
>> >> Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
>> >> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>> >>
>> >> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli <qi.fuli(a)jp.fujitsu.com>
wrote:
>> >> > Ndctl monitor is used for monitoring the smart events of nvdimm
DIMMs.
>> >> > When a smart event fires, monitor will output the notifications
>> >> > which include dimm health status and evnet informations to
>> >> > syslog or a logfile by setting [--logfile] option. The
>> >> > notifications follow json format and can be consumed by log
collectors like
Fluentd.
>> >> >
>> >> > The objects to monitor can be selected by setting [--dimm]
>> >> > [--region] [--namespace] [--bus] options and the event type can
>> >> > be filtered by setting [--dimm-event] option. These options
>> >> > support multiple space-separated arguments.
>> >> >
>> >> > Ndctl monitor can be forked as a daemon by using [--daemon]
>> >> > option, such as:
>> >> > # ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
>> >> >
>> >> > Signed-off-by: QI Fuli <qi.fuli(a)jp.fujitsu.com>
>> >> > ---
>> >> > builtin.h | 1 +
>> >> > ndctl/Makefile.am | 3 +-
>> >> > ndctl/lib/libndctl.sym | 1 +
>> >> > ndctl/lib/smart.c | 17 ++
>> >> > ndctl/libndctl.h | 6 +
>> >> > ndctl/monitor.c | 531
+++++++++++++++++++++++++++++++++++++++++
>> >> > ndctl/ndctl.c | 1 +
>> >> > util/filter.h | 9 +
>> >> > 8 files changed, 568 insertions(+), 1 deletion(-) create mode
>> >> > 100644 ndctl/monitor.c
>> >> >
>> >> > diff --git a/builtin.h b/builtin.h index d3cc723..675a6ce 100644
>> >> > --- a/builtin.h
>> >> > +++ b/builtin.h
>> >> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char
>> >> > **argv, void *ctx); int cmd_wait_scrub(int argc, const char
>> >> > **argv, void *ctx); int cmd_start_scrub(int argc, const char
>> >> > **argv, void *ctx); int cmd_list(int argc, const char **argv,
>> >> > void *ctx);
>> >> > +int cmd_monitor(int argc, const char **argv, void *ctx);
>> >> > #ifdef ENABLE_TEST
>> >> > int cmd_test(int argc, const char **argv, void *ctx); #endif
>> >> > diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
>> >> > d22a379..7dbf223
>> >> > 100644
>> >> > --- a/ndctl/Makefile.am
>> >> > +++ b/ndctl/Makefile.am
>> >> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
>> >> > util/json-smart.c \
>> >> > util/json-firmware.c \
>> >> > inject-error.c \
>> >> > - inject-smart.c
>> >> > + inject-smart.c \
>> >> > + monitor.c
>> >> >
>> >> > if ENABLE_DESTRUCTIVE
>> >> > ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
>> >> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
>> >> > e939993..f64df56 100644
>> >> > --- a/ndctl/lib/libndctl.sym
>> >> > +++ b/ndctl/lib/libndctl.sym
>> >> > @@ -366,4 +366,5 @@ global:
>> >> > ndctl_namespace_inject_error2;
>> >> > ndctl_namespace_uninject_error2;
>> >> > ndctl_cmd_ars_stat_get_flag_overflow;
>> >> > + ndctl_cmd_smart_get_event_flags;
>> >> > } LIBNDCTL_15;
>> >> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
>> >> > 0455252..90a65d0 100644
>> >> > --- a/ndctl/lib/smart.c
>> >> > +++ b/ndctl/lib/smart.c
>> >> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
>> >> > ndctl_cmd_smart_threshold_get_temperature(
>> >> >
>> >> > smart_cmd_op(smart_threshold_get_supported_alarms, unsigned
>> >> > int, 0);
>> >> >
>> >> > +NDCTL_EXPORT unsigned int
>> >> > +ndctl_cmd_smart_get_event_flags(struct
>> >> > +ndctl_cmd *cmd)
>> >>
>> >> My expectation for this ndctl_*_get_event_flags() api was to have it
be:
>> >>
>> >> ndctl_dimm_get_event_flags()
>> >>
>> >> ...and with that in place get rid of the 'struct monitor_dimm'
object.
>> >> Push everything to be retrieved through api calls against a 'struct
ndctl_dimm'
>> object.
>> >> In other words, the usage of 'struct ndctl_cmd'
>> >> should be hidden and all monitor operations should be done in
>> >> terms of 'struct ndctl_dimm' helper calls.
>> >>
>> > Hi Dan,
>> > Thanks for your comments.
>> >
>> > In the v9 of monitor, I use the 'struct ndctl_cmd' object in the
following places:
>> > ndctl_cmd_smart_get_flags(struct ndctl_cmd *cmd)
>> > ndctl_cmd_smart_get_health(struct ndctl_cmd *cmd)
>> > ndctl_cmd_smart_get_event_flags(struct ndctl_cmd *cmd) Is
>> > it that you want to hide all of the 'struct ndctl_cmd' objects
and
>> > add the following
>> 'struct ndctl_dimm' helper calls?
>>
>> I'm primarily reacting to:
>>
>> +struct monitor_dimm {
>> + struct ndctl_dimm *dimm;
>> + int health_eventfd;
>> + unsigned int health;
>> + unsigned int event_flags;
>> + struct list_node list;
>> +};
>>
>> Which is effectively duplicating ndctl_dimm internal data.
>>
>> > ndctl_dimm_get_flags(struct ndctl_dimm *dimm)
>> > ndctl_dimm_get_health(struct ndctl_dimm *dimm)
>> > ndctl_dimm_get_event_flags(struct ndctl_dimm *dimm)
>>
>> I understand why we need ndctl_dimm_get_event_flags() since that
>> tells you what events have fired. Why do we need the other 2? If the
>> event has fired then the monitor proceeds to call util_dimm_health_to_json.
>> Is that not sufficient?
>>
> About ndctl_dimm_get_flags(struct ndctl_dimm *dimm), I think the
ND_SMART_ALARM_VALID of DIMM should be confirmed before monitor start.
> If the smart alarm invalid, the DIMM no need to monitor.
> Also, the monitor should know the health of DIMM when it starts. When an event
fires, then compare it with current health of DIMM.
> Therefore, we could know if the dimm-health-state event fired.
Ok, but that sounds like two APIs. _get_supported_flags() and _get_flags().
Ok, I will add both _get_supported_flags() and _get_flags().
Thank you very much.
Qi