-----Original Message-----
From: Dan Williams [mailto:dan.j.williams@intel.com]
Sent: Monday, July 9, 2018 3:04 PM
To: Qi, Fuli/斉 福利 <qi.fuli(a)jp.fujitsu.com>
Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli <qi.fuli(a)jp.fujitsu.com> wrote:
>> -----Original Message-----
>> From: Dan Williams [mailto:dan.j.williams@intel.com]
>> Sent: Sunday, July 8, 2018 5:06 AM
>> To: Qi, Fuli/斉 福利 <qi.fuli(a)jp.fujitsu.com>
>> Cc: linux-nvdimm <linux-nvdimm(a)lists.01.org>
>> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>>
>> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli <qi.fuli(a)jp.fujitsu.com> wrote:
>> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
>> > When a smart event fires, monitor will output the notifications
>> > which include dimm health status and evnet informations to syslog
>> > or a logfile by setting [--logfile] option. The notifications
>> > follow json format and can be consumed by log collectors like Fluentd.
>> >
>> > The objects to monitor can be selected by setting [--dimm]
>> > [--region] [--namespace] [--bus] options and the event type can be
>> > filtered by setting [--dimm-event] option. These options support
>> > multiple space-separated arguments.
>> >
>> > Ndctl monitor can be forked as a daemon by using [--daemon] option,
>> > such as:
>> > # ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
>> >
>> > Signed-off-by: QI Fuli <qi.fuli(a)jp.fujitsu.com>
>> > ---
>> > builtin.h | 1 +
>> > ndctl/Makefile.am | 3 +-
>> > ndctl/lib/libndctl.sym | 1 +
>> > ndctl/lib/smart.c | 17 ++
>> > ndctl/libndctl.h | 6 +
>> > ndctl/monitor.c | 531 +++++++++++++++++++++++++++++++++++++++++
>> > ndctl/ndctl.c | 1 +
>> > util/filter.h | 9 +
>> > 8 files changed, 568 insertions(+), 1 deletion(-) create mode
>> > 100644 ndctl/monitor.c
>> >
>> > diff --git a/builtin.h b/builtin.h
>> > index d3cc723..675a6ce 100644
>> > --- a/builtin.h
>> > +++ b/builtin.h
>> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv,
>> > void *ctx); int cmd_wait_scrub(int argc, const char **argv, void
>> > *ctx); int cmd_start_scrub(int argc, const char **argv, void
>> > *ctx); int cmd_list(int argc, const char **argv, void *ctx);
>> > +int cmd_monitor(int argc, const char **argv, void *ctx);
>> > #ifdef ENABLE_TEST
>> > int cmd_test(int argc, const char **argv, void *ctx); #endif diff
>> > --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
>> > d22a379..7dbf223
>> > 100644
>> > --- a/ndctl/Makefile.am
>> > +++ b/ndctl/Makefile.am
>> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
>> > util/json-smart.c \
>> > util/json-firmware.c \
>> > inject-error.c \
>> > - inject-smart.c
>> > + inject-smart.c \
>> > + monitor.c
>> >
>> > if ENABLE_DESTRUCTIVE
>> > ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
>> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
>> > e939993..f64df56 100644
>> > --- a/ndctl/lib/libndctl.sym
>> > +++ b/ndctl/lib/libndctl.sym
>> > @@ -366,4 +366,5 @@ global:
>> > ndctl_namespace_inject_error2;
>> > ndctl_namespace_uninject_error2;
>> > ndctl_cmd_ars_stat_get_flag_overflow;
>> > + ndctl_cmd_smart_get_event_flags;
>> > } LIBNDCTL_15;
>> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
>> > 0455252..90a65d0 100644
>> > --- a/ndctl/lib/smart.c
>> > +++ b/ndctl/lib/smart.c
>> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
>> > ndctl_cmd_smart_threshold_get_temperature(
>> >
>> > smart_cmd_op(smart_threshold_get_supported_alarms, unsigned int,
>> > 0);
>> >
>> > +NDCTL_EXPORT unsigned int ndctl_cmd_smart_get_event_flags(struct
>> > +ndctl_cmd *cmd)
>>
>> My expectation for this ndctl_*_get_event_flags() api was to have it be:
>>
>> ndctl_dimm_get_event_flags()
>>
>> ...and with that in place get rid of the 'struct monitor_dimm' object.
>> Push everything to be retrieved through api calls against a 'struct
ndctl_dimm'
object.
>> In other words, the usage of 'struct ndctl_cmd'
>> should be hidden and all monitor operations should be done in terms
>> of 'struct ndctl_dimm' helper calls.
>>
> Hi Dan,
> Thanks for your comments.
>
> In the v9 of monitor, I use the 'struct ndctl_cmd' object in the following
places:
> ndctl_cmd_smart_get_flags(struct ndctl_cmd *cmd)
> ndctl_cmd_smart_get_health(struct ndctl_cmd *cmd)
> ndctl_cmd_smart_get_event_flags(struct ndctl_cmd *cmd) Is it
> that you want to hide all of the 'struct ndctl_cmd' objects and add the
following
'struct ndctl_dimm' helper calls?
I'm primarily reacting to:
+struct monitor_dimm {
+ struct ndctl_dimm *dimm;
+ int health_eventfd;
+ unsigned int health;
+ unsigned int event_flags;
+ struct list_node list;
+};
Which is effectively duplicating ndctl_dimm internal data.
> ndctl_dimm_get_flags(struct ndctl_dimm *dimm)
> ndctl_dimm_get_health(struct ndctl_dimm *dimm)
> ndctl_dimm_get_event_flags(struct ndctl_dimm *dimm)
I understand why we need ndctl_dimm_get_event_flags() since that tells you what
events have fired. Why do we need the other 2? If the event has fired then the monitor
proceeds to call util_dimm_health_to_json.
Is that not sufficient?
About ndctl_dimm_get_flags(struct ndctl_dimm *dimm), I think the
ND_SMART_ALARM_VALID of DIMM should be confirmed before monitor start.
If the smart alarm invalid, the DIMM no need to monitor.
Also, the monitor should know the health of DIMM when it starts. When an event fires, then
compare it with current health of DIMM.
Therefore, we could know if the dimm-health-state event fired.