Re: [Devel] [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
by gengdongjiu
Hi James,
Thanks a lot for your review and comments.
>
> Hi Dongjiu Geng,
>
> On 06/01/18 16:02, Dongjiu Geng wrote:
> > The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
> > guest and user space needs a way to tell KVM this value. So we add a
> > new ioctl. Before user space specifies the Exception Syndrome Register
> > ESR(ESR), it firstly checks that whether KVM has the capability to set
> > the guest ESR, If has, will set it. Otherwise, nothing to do.
> >
> > For this ESR specifying, Only support for AArch64, not support AArch32.
>
> After this patch user-space can trigger an SError in the guest. If it wants to migrate the guest, how does the pending SError get migrated?
>
> I think we need to fix migration first. Andrew Jones suggested using
> KVM_GET/SET_VCPU_EVENTS:
> https://www.spinics.net/lists/arm-kernel/msg616846.html
>
> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover systems without the v8.2 RAS Extensions with the same API. I
> think this means a bit to read/write whether SError is pending, and another to indicate the ESR should be set/read.
> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an ESR.
For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, we only can inject a SError with ESR 0 to guest, cannot set its ESR.
About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, and consider your suggestion at the same time.
The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.
>
> user-space can then use the 'for migration' calls to make a 'new' SError pending.
>
> Now that the cpufeature bits are queued, I think this can be split up into two separate series for v4.16-rc1, one to tackle NOTIFY_SEI and
> the associated plumbing. The second for the KVM 'make SError pending' API.
Ok, thanks for your suggestion, will split it.
>
>
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
> > 5c7f657..738ae90 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
> > return -EINVAL;
> > }
> >
> > +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
> > + return -EINVAL;
> > +}
>
> Does nothing in the patch that adds the support? This is a bit odd.
> (oh, its hiding in patch 6...)
To make this patch simple and small, I add it in patch 6.
>
>
> Thanks,
>
> James
3 years
[PATCH v9 0/7] Handle guest RAS Error in KVM and kernel
by Dongjiu Geng
This series patches mainly do below things:
1. Trap guest RAS ERR* registers accesses to EL2 from Non-secure EL1,
KVM will will do a minimum simulation, these registers are simulated
to RAZ/WI in KVM.
2. Route guest synchronous External Abort to EL2. If it is also routed
to EL3 by firmware at the same time, system will trap to EL3 firmware instead
of EL2 KVM, then firmware judges whether EL2 routing is enabled, if enabled,
jump back to EL2 KVM, otherwise jump back to EL1 host kernel.
3. Enable APEI ARv8 SEI notification to parse the CPER records for SError
in the ACPI GHES driver, KVM will call handle_guest_sei() to let ACPI
driver to parse the CPER recorded for SError which happened in the guest
4. If ACPI driver parsed the CPER record failed, KVM will classify the Error
through Exception Syndrome Register and do different approaches according
to Asynchronous Error Type
5. If the guest RAS SError is not propagated and not consumed, this exception
is precise, we temporarily shut down the VM to isolate the error. For other
Asynchronous Error Type, KVM directly injects virtual SError with IMPLEMENTATION
DEFINED ESR or KVM panic if the error is fatal. For the RAS extension, guest
virtual ESR must be set, because all-zero means 'RAS error: Uncategorized' instead
of 'no valid ISS', so set this ESR to IMPLEMENTATION DEFINED by default if user space
does not specify it.
change since v8:
1. update the patch [1/7] and [2/7] to align this serie.
https://www.spinics.net/lists/arm-kernel/msg623513.html
https://www.spinics.net/lists/arm-kernel/msg623520.html
2. In kvm ,check handle_guest_sei()'s return value. If this function return true, stop
classifying errors.
3. Temporarily shut down the VM to isolate the error for recoverable error (UER)
4. update some patch's commit messages and clean some patches
Dongjiu Geng (5):
acpi: apei: Add SEI notification type support for ARMv8
KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA
arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
arm64: kvm: Set Virtual SError Exception Syndrome for guest
arm64: kvm: handle guest SError Interrupt by categorization
James Morse (1):
KVM: arm64: Save ESR_EL2 on guest SError
Xie XiuQi (1):
arm64: cpufeature: Detect CPU RAS Extentions
Documentation/virtual/kvm/api.txt | 11 ++++++
arch/arm/include/asm/kvm_host.h | 1 +
arch/arm/kvm/guest.c | 9 +++++
arch/arm64/Kconfig | 16 +++++++++
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 11 ++++++
arch/arm64/include/asm/kvm_arm.h | 2 ++
arch/arm64/include/asm/kvm_emulate.h | 17 +++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/sysreg.h | 15 ++++++++
arch/arm64/include/asm/system_misc.h | 1 +
arch/arm64/kernel/cpufeature.c | 13 +++++++
arch/arm64/kvm/guest.c | 14 ++++++++
arch/arm64/kvm/handle_exit.c | 68 +++++++++++++++++++++++++++++++++---
arch/arm64/kvm/hyp/switch.c | 25 +++++++++++--
arch/arm64/kvm/inject_fault.c | 13 ++++++-
arch/arm64/kvm/reset.c | 3 ++
arch/arm64/kvm/sys_regs.c | 10 ++++++
arch/arm64/mm/fault.c | 16 +++++++++
drivers/acpi/apei/Kconfig | 15 ++++++++
drivers/acpi/apei/ghes.c | 53 ++++++++++++++++++++++++++++
include/acpi/ghes.h | 1 +
include/uapi/linux/kvm.h | 3 ++
virt/kvm/arm/arm.c | 7 ++++
24 files changed, 320 insertions(+), 9 deletions(-)
--
1.9.1
3 years, 1 month
Re: [Devel] [PATCH v3 5/5] ACPICA: Remove calling of _STA from acpi_get_object_info
by Rafael J. Wysocki
On Sat, Jan 27, 2018 at 3:02 PM, Hans de Goede <hdegoede(a)redhat.com> wrote:
> Hi,
>
> On 26-01-18 21:42, Moore, Robert wrote:
>>
>> For ACPICA, this is a change to a published public interface. A change to
>> this will potentially affect many operating systems, so we would be very
>> hesitant to change it in the core ACPICA code.
>
>
> From the acpi_get_object_info() documentation in
> drivers/acpi/acpica/nsxfname.c:
>
> * Note: This interface is intended to be used during the initial device
> * discovery namespace traversal. Therefore, no complex methods can be
> * executed, especially those that access operation regions.
>
> Since _STA can call Opregions, IMHO it does not belong in
> acpi_get_object_info()
> and as Erik mentioned _STA can have side-effects then that seems like
> another
> reason to NOT call it from acpi_get_object_info().
>
> But I can understand that you don't want to outright change the behavior of
> acpi_get_object_info() to call _STA since it has always done this, still
> calling _STA can be a problem in some cases.
>
> Perhaps we can give acpi_get_object_info() a flags argument, or if you want
> to
> preserve the current interface ad a new acpi_get_object_info_no_sta() call
> (and
> use flags under the hood so that we still have only 1 implementation) ?
Let's first get the [1-4/5] from this series in (as the last patch
depends on them AFAICS) and then we'll see.
Thanks,
Rafael
3 years, 1 month
Re: [Devel] [PATCH v3 5/5] ACPICA: Remove calling of _STA from acpi_get_object_info
by Moore, Robert
For ACPICA, this is a change to a published public interface. A change to this will potentially affect many operating systems, so we would be very hesitant to change it in the core ACPICA code.
Bob
> -----Original Message-----
> From: Hans de Goede [mailto:hdegoede@redhat.com]
> Sent: Friday, January 26, 2018 7:03 AM
> To: Rafael J . Wysocki <rjw(a)rjwysocki.net>; Len Brown <lenb(a)kernel.org>;
> Moore, Robert <robert.moore(a)intel.com>; Schmauss, Erik
> <erik.schmauss(a)intel.com>; Bjorn Helgaas <bhelgaas(a)google.com>
> Cc: Hans de Goede <hdegoede(a)redhat.com>; linux-acpi(a)vger.kernel.org;
> devel(a)acpica.org; linux-pci(a)vger.kernel.org
> Subject: [PATCH v3 5/5] ACPICA: Remove calling of _STA from
> acpi_get_object_info
>
> As the comment above it indicates, acpi_get_object_info is intended for
> early probe usage and as such should not call any methods which may rely
> on OpRegions, before this commit it was also calling _STA, which on some
> systems does rely on OpRegions.
>
> Calling _STA before things are ready leads to errors such as these:
>
> [ 0.123579] ACPI Error: No handler for Region [ECRM]
> (00000000ba9edc4c)
> [GenericSerialBus] (20170831/evregion-166)
> [ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler
> (20170831/exfldio-299)
> [ 0.123618] ACPI Error: Method parse/execution failed
> \_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550)
>
> End 2015 support for the _SUB method was removed for exactly the same
> reason. Removing current_status from struct acpi_device_info only has a
> limit impact. Within ACPICA it is only used by 2 debug messages, both of
> which are modified to no longer print it with this commit.
>
> Outside of ACPICA, for Linux there is only one user. For which a patch
> to remove the dependency on the current_status field is available.
>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> ---
> drivers/acpi/acpica/dbdisply.c | 5 ++---
> drivers/acpi/acpica/nsdumpdv.c | 5 ++---
> drivers/acpi/acpica/nsxfname.c | 19 ++++---------------
> include/acpi/actypes.h | 2 --
> 4 files changed, 8 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/acpi/acpica/dbdisply.c
> b/drivers/acpi/acpica/dbdisply.c index 5a606eac0c22..7b5eb33fe962 100644
> --- a/drivers/acpi/acpica/dbdisply.c
> +++ b/drivers/acpi/acpica/dbdisply.c
> @@ -642,9 +642,8 @@ void acpi_db_display_object_type(char *object_arg)
> return;
> }
>
> - acpi_os_printf("ADR: %8.8X%8.8X, STA: %8.8X, Flags: %X\n",
> - ACPI_FORMAT_UINT64(info->address),
> - info->current_status, info->flags);
> + acpi_os_printf("ADR: %8.8X%8.8X, Flags: %X\n",
> + ACPI_FORMAT_UINT64(info->address), info->flags);
>
> acpi_os_printf("S1D-%2.2X S2D-%2.2X S3D-%2.2X S4D-%2.2X\n",
> info->highest_dstates[0], info->highest_dstates[1],
> diff --git a/drivers/acpi/acpica/nsdumpdv.c
> b/drivers/acpi/acpica/nsdumpdv.c index 5026594763ea..573a5f36f01a 100644
> --- a/drivers/acpi/acpica/nsdumpdv.c
> +++ b/drivers/acpi/acpica/nsdumpdv.c
> @@ -88,10 +88,9 @@ acpi_ns_dump_one_device(acpi_handle obj_handle,
> }
>
> ACPI_DEBUG_PRINT_RAW((ACPI_DB_TABLES,
> - " HID: %s, ADR: %8.8X%8.8X, Status: %X\n",
> + " HID: %s, ADR: %8.8X%8.8X\n",
> info->hardware_id.value,
> - ACPI_FORMAT_UINT64(info->address),
> - info->current_status));
> + ACPI_FORMAT_UINT64(info->address));
> ACPI_FREE(info);
> }
>
> diff --git a/drivers/acpi/acpica/nsxfname.c
> b/drivers/acpi/acpica/nsxfname.c index 106966235805..0a9c600c2599 100644
> --- a/drivers/acpi/acpica/nsxfname.c
> +++ b/drivers/acpi/acpica/nsxfname.c
> @@ -241,7 +241,7 @@ static char *acpi_ns_copy_device_id(struct
> acpi_pnp_device_id *dest,
> * namespace node and possibly by running several standard
> * control methods (Such as in the case of a device.)
> *
> - * For Device and Processor objects, run the Device _HID, _UID, _CID,
> _STA,
> + * For Device and Processor objects, run the Device _HID, _UID, _CID,
> * _CLS, _ADR, _sx_w, and _sx_d methods.
> *
> * Note: Allocates the return buffer, must be freed by the caller.
> @@ -250,8 +250,9 @@ static char *acpi_ns_copy_device_id(struct
> acpi_pnp_device_id *dest,
> * discovery namespace traversal. Therefore, no complex methods can be
> * executed, especially those that access operation regions. Therefore,
> do
> * not add any additional methods that could cause problems in this
> area.
> - * this was the fate of the _SUB method which was found to cause such
> - * problems and was removed (11/2015).
> + * Because of this reason support for the following methods has been
> removed:
> + * 1) _SUB method was removed (11/2015)
> + * 2) _STA method was removed (01/2018)
> *
>
> ************************************************************************
> ******/
>
> @@ -374,20 +375,8 @@ acpi_get_object_info(acpi_handle handle,
> * Notes: none of these methods are required, so they may or
> may
> * not be present for this device. The Info->Valid bitfield
> is used
> * to indicate which methods were found and run successfully.
> - *
> - * For _STA, if the method does not exist, then (as per the
> ACPI
> - * specification), the returned current_status flags will
> indicate
> - * that the device is present/functional/enabled. Otherwise,
> the
> - * current_status flags reflect the value returned from _STA.
> */
>
> - /* Execute the Device._STA method */
> -
> - status = acpi_ut_execute_STA(node, &info->current_status);
> - if (ACPI_SUCCESS(status)) {
> - valid |= ACPI_VALID_STA;
> - }
> -
> /* Execute the Device._ADR method */
>
> status = acpi_ut_evaluate_numeric_object(METHOD_NAME__ADR,
> node, diff --git a/include/acpi/actypes.h b/include/acpi/actypes.h index
> 4f077edb9b81..220ef8674763 100644
> --- a/include/acpi/actypes.h
> +++ b/include/acpi/actypes.h
> @@ -1191,7 +1191,6 @@ struct acpi_device_info {
> u8 flags; /* Miscellaneous info */
> u8 highest_dstates[4]; /* _sx_d values: 0xFF indicates not
> valid */
> u8 lowest_dstates[5]; /* _sx_w values: 0xFF indicates not valid */
> - u32 current_status; /* _STA value */
> u64 address; /* _ADR value */
> struct acpi_pnp_device_id hardware_id; /* _HID value */
> struct acpi_pnp_device_id unique_id; /* _UID value */
> @@ -1205,7 +1204,6 @@ struct acpi_device_info {
>
> /* Flags for Valid field above (acpi_get_object_info) */
>
> -#define ACPI_VALID_STA 0x0001
> #define ACPI_VALID_ADR 0x0002
> #define ACPI_VALID_HID 0x0004
> #define ACPI_VALID_UID 0x0008
> --
> 2.14.3
3 years, 1 month
Re: [Devel] ACPI: Do not call _STA on battery devices with unmet dependencies
by Schmauss, Erik
Hi,
> -----Original Message-----
> From: Hans de Goede [mailto:hdegoede@redhat.com]
> Sent: Saturday, January 20, 2018 4:48 AM
> To: Schmauss, Erik <erik.schmauss(a)intel.com>
> Cc: Rafael J . Wysocki <rjw(a)rjwysocki.net>; Len Brown <lenb(a)kernel.org>;
> Bjorn Helgaas <bhelgaas(a)google.com>; Moore, Robert
> <robert.moore(a)intel.com>; Zheng, Lv <lv.zheng(a)intel.com>; linux-
> acpi(a)vger.kernel.org; linux-pci(a)vger.kernel.org; devel(a)acpica.org
> Subject: Re: ACPI: Do not call _STA on battery devices with unmet dependencies
>
> Hi,
>
> On 19-01-18 22:03, Schmauss, Erik wrote:
> >
> >> -----Original Message-----
> >> From: linux-acpi-owner(a)vger.kernel.org [mailto:linux-acpi-
> >> owner(a)vger.kernel.org] On Behalf Of Bjorn Helgaas
> >> Sent: Thursday, January 18, 2018 11:11 AM
> >> To: Hans de Goede <hdegoede(a)redhat.com>
> >> Cc: Rafael J . Wysocki <rjw(a)rjwysocki.net>; Len Brown
> >> <lenb(a)kernel.org>; Bjorn Helgaas <bhelgaas(a)google.com>; Moore, Robert
> >> <robert.moore(a)intel.com>; Zheng, Lv <lv.zheng(a)intel.com>; linux-
> >> acpi(a)vger.kernel.org; linux-pci(a)vger.kernel.org; devel(a)acpica.org
> >> Subject: Re: ACPI: Do not call _STA on battery devices with unmet
> >> dependencies
> >>
> >> On Thu, Jan 18, 2018 at 05:03:55PM +0100, Hans de Goede wrote:
> >
> > Hi Hans,
> >
> >>> Hi All,
> >>>
> >>> The ACPI code already contains quite a bit of code to not bind the
> >>> ACPI-battery until all deps for an ACPI battery device have been
> >>> met, but on some devices calling _STA before all deps are met is a
> >>> problem too because the _STA method uses an i2c OpRegion there.
> >
> > Could you explain why _STA method using an I2C OpRegion is problematic?
>
> It is problematic if we call the _STA method before the handler for the OpRegion
> has been installed, this series delays / avoids calling _STA before the OpRegion
> has been installed.
Thanks for the explanation.
I'm looking at the ACPI spec and from what it said about _STA, I gathered that _STA is allowed to have side-effects to named objects within the AML namespace. So removing _STA from acpi_get_object_info seems a little dangerous... Is there a reason why you cannot remove or delay the call to acpi_get_object_info instead? Chapter 6 section 1 of the ACPI spec also states that it's possible for _HID and _ADR to read from operation regions as well. Is it common convention that these _HID and _ADR objects do not access operation regions?
Thanks,
Erik
>
> Regards,
>
> Hans
>
>
> >
> > Thanks,
> > Erik
> >
> >>>
> >>> Here is the DSDT of the device I'm seeing this on:
> >>> https://fedorapeople.org/~jwrdegoede/toshiba-click-mini-dsdt.dsl
> >>
> >> This looks like interesting info, but (a) this link isn't mentioned
> >> in the actual patches, and (b) it's conceivable that fedorapeople.org could go
> away someday.
> >> If this were a PCI series, I would suggest opening a report at
> >> bugzilla.kernel.org, attaching the DSL there, and including the link in the
> patch changelog.
> >>
> >>> This series modifies the kernel to not call _STA until all deps are
> >>> met, mirroring the binding behavior of the battery driver.
> >>>
> >>> Without this series a total of 32 ACPI errors get printend to the
> >>> console on boot, there are 4 errors per _STA call, 2 battery devices
> >>> on this system and 4 _STA calls per battery device.
> >>>
> >>> The first commit is a preparation commit for making the ACPICA
> >>> changes in the 4th commit, this commit is necessary to not break
> >>> things after the ACPICA changes.
> >>>
> >>> The second commit modifies acpi_bus_get_status to not call _STA on
> >>> battery devices until all deps are met. This fixes 2 of the 4 too
> >>> early _STA calls triggering these errors.
> >>>
> >>> The third commit makes the device instantiation code use
> >>> acpi_bus_get_status instead of acpi_bus_get_status_handle so that
> >>> the code to get the initial status also does not makes 1 too early _STA call.
> >>>
> >>> The fourth commit changes the ACPICA acpi_get_object_info function
> >>> to not call _STA. Only 1 user (which is fixed in the first commit)
> >>> cares about acpi_device_info.current_status. And the ACPICA code has
> >>> this
> >> comment:
> >>>
> >>> * Note: This interface is intended to be used during the initial
> >>> device
> >>> * discovery namespace traversal. Therefore, no complex methods can be
> >>> * executed, especially those that access operation regions.
> >>> Therefore, do
> >>> * not add any additional methods that could cause problems in this area.
> >>> * Because of this reason support for the following methods has
> >>> been
> >> removed:
> >>> * this was the fate of the _SUB method which was found to cause such
> >>> * problems and was removed (11/2015).
> >>>
> >>> The described problems with the _SUB method clearly also apply to
> >>> the _STA method, so removing it from acpi_get_object_info seems like
> >>> it is the right thing to do here. This too fixes 1 too early _STA
> >>> call, so that with all
> >>> 4 patches in place we've fixed all 4 too early _STA calls.
> >>>
> >>> Regards,
> >>>
> >>> Hans
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-acpi"
> >>> in the body of a message to majordomo(a)vger.kernel.org More majordomo
> >>> info at http://vger.kernel.org/majordomo-info.html
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-acpi"
> >> in the body of a message to majordomo(a)vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
3 years, 1 month
Re: [Devel] [PATCH v2 1/5] ACPI: export acpi_bus_get_status_handle()
by Rafael J. Wysocki
On Mon, Jan 22, 2018 at 9:51 AM, Hans de Goede <hdegoede(a)redhat.com> wrote:
> Some modular drivers need this, export it.
>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> ---
> Changes in v2:
> -This is a new patch in v2 of this patch-set
> ---
> drivers/acpi/bus.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 4d0979e02a28..db0c05997cc2 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -108,6 +108,7 @@ acpi_status acpi_bus_get_status_handle(acpi_handle handle,
> }
> return status;
> }
> +EXPORT_SYMBOL(acpi_bus_get_status_handle);
EXPORT_SYMBOL_GPL(), please.
>
> int acpi_bus_get_status(struct acpi_device *device)
> {
> --
3 years, 1 month
Re: [Devel] [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
by James Morse
Hi gengdongjiu,
On 21/01/18 02:45, gengdongjiu wrote:
> For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
> be better[1].
> But if you think panic is better until we support kernel-first, it is
> also OK to me.
I'm not convinced SError while a guest was running means only guest memory could
be affected. Mechanisms like KSM means the error could affect multiple guests.
Both firmware-fist and kernel-first will give us the address, with which we can
know which processes are affected, isolated the memory and signal affected
processes.
Until we have one of these panic() is the only way we have to contain an error,
but its an interim fix.
Not panic()ing the host for an error that should be contained to the guest is a
fudge, we don't actually know its safe (KSM, page-table etc). I want to improve
on this with {firmware, kernel}-first support (or both!), I don't want to expose
that this is happening to user-space, as once we have one of {firmware,
kernel}-first, it shouldn't happen.
>> This is inventing something new for RAS errors not claimed by firmware-first.
>> If we have kernel-first too, this will never happen. (unless your system is
>> losing the error description).
> In fact, if we have kernel-first, I think we still need to judge the
> error type by ESR, right?
The kernel-first mechanism should consider the ESR/FAR, yes, but once the error
has been claimed and handled, KVM shouldn't care about any of these values.
(maybe we'll sanity check for uncontained errors, just in case the error escaped
to the RAS code...)
My point here was exposing 'unhandled' (ignored) RAS errors to user-space
creates an ABI: someone will complain once we start handling the error, and they
no longer get a notification via this 'unhandled' interface. Code written to use
this interface becomes useless/untested.
> If the handle_guest_sei() , may be the system does not support firmware-first,
> so we judge the ESR value,
...and panic()/ignore as appropriate.
I agree not all systems will support firmware-first, (big-endian is the obvious
example), but if we get kernel-first support this ESR guessing can disappear,
I'm against exposing it to user-space in the meantime.
Thanks,
James
3 years, 1 month
Re: [Devel] ACPI: Do not call _STA on battery devices with unmet dependencies
by Schmauss, Erik
> -----Original Message-----
> From: linux-acpi-owner(a)vger.kernel.org [mailto:linux-acpi-
> owner(a)vger.kernel.org] On Behalf Of Bjorn Helgaas
> Sent: Thursday, January 18, 2018 11:11 AM
> To: Hans de Goede <hdegoede(a)redhat.com>
> Cc: Rafael J . Wysocki <rjw(a)rjwysocki.net>; Len Brown <lenb(a)kernel.org>;
> Bjorn Helgaas <bhelgaas(a)google.com>; Moore, Robert
> <robert.moore(a)intel.com>; Zheng, Lv <lv.zheng(a)intel.com>; linux-
> acpi(a)vger.kernel.org; linux-pci(a)vger.kernel.org; devel(a)acpica.org
> Subject: Re: ACPI: Do not call _STA on battery devices with unmet dependencies
>
> On Thu, Jan 18, 2018 at 05:03:55PM +0100, Hans de Goede wrote:
Hi Hans,
> > Hi All,
> >
> > The ACPI code already contains quite a bit of code to not bind the
> > ACPI-battery until all deps for an ACPI battery device have been met,
> > but on some devices calling _STA before all deps are met is a problem
> > too because the _STA method uses an i2c OpRegion there.
Could you explain why _STA method using an I2C OpRegion is problematic?
Thanks,
Erik
> >
> > Here is the DSDT of the device I'm seeing this on:
> > https://fedorapeople.org/~jwrdegoede/toshiba-click-mini-dsdt.dsl
>
> This looks like interesting info, but (a) this link isn't mentioned in the actual
> patches, and (b) it's conceivable that fedorapeople.org could go away someday.
> If this were a PCI series, I would suggest opening a report at bugzilla.kernel.org,
> attaching the DSL there, and including the link in the patch changelog.
>
> > This series modifies the kernel to not call _STA until all deps are
> > met, mirroring the binding behavior of the battery driver.
> >
> > Without this series a total of 32 ACPI errors get printend to the
> > console on boot, there are 4 errors per _STA call, 2 battery devices
> > on this system and 4 _STA calls per battery device.
> >
> > The first commit is a preparation commit for making the ACPICA changes
> > in the 4th commit, this commit is necessary to not break things after
> > the ACPICA changes.
> >
> > The second commit modifies acpi_bus_get_status to not call _STA on
> > battery devices until all deps are met. This fixes 2 of the 4 too
> > early _STA calls triggering these errors.
> >
> > The third commit makes the device instantiation code use
> > acpi_bus_get_status instead of acpi_bus_get_status_handle so that the
> > code to get the initial status also does not makes 1 too early _STA call.
> >
> > The fourth commit changes the ACPICA acpi_get_object_info function to
> > not call _STA. Only 1 user (which is fixed in the first commit) cares
> > about acpi_device_info.current_status. And the ACPICA code has this
> comment:
> >
> > * Note: This interface is intended to be used during the initial
> > device
> > * discovery namespace traversal. Therefore, no complex methods can be
> > * executed, especially those that access operation regions.
> > Therefore, do
> > * not add any additional methods that could cause problems in this area.
> > * Because of this reason support for the following methods has been
> removed:
> > * this was the fate of the _SUB method which was found to cause such
> > * problems and was removed (11/2015).
> >
> > The described problems with the _SUB method clearly also apply to the
> > _STA method, so removing it from acpi_get_object_info seems like it is
> > the right thing to do here. This too fixes 1 too early _STA call, so
> > that with all
> > 4 patches in place we've fixed all 4 too early _STA calls.
> >
> > Regards,
> >
> > Hans
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-acpi"
> > in the body of a message to majordomo(a)vger.kernel.org More majordomo
> > info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of
> a message to majordomo(a)vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
3 years, 1 month
Re: [Devel] [PATCH 1/2] ACPICA: acpi_ds_resolve_package_element: Use proper status when logging exception
by Moore, Robert
We are working on this file right now, thanks.
Bob
> -----Original Message-----
> From: Hans de Goede [mailto:hdegoede@redhat.com]
> Sent: Friday, January 12, 2018 3:37 AM
> To: Len Brown <lenb(a)kernel.org>; Moore, Robert <robert.moore(a)intel.com>;
> Zheng, Lv <lv.zheng(a)intel.com>
> Cc: Hans de Goede <hdegoede(a)redhat.com>; linux-acpi(a)vger.kernel.org;
> devel(a)acpica.org
> Subject: [PATCH 1/2] ACPICA: acpi_ds_resolve_package_element: Use proper
> status when logging exception
>
> Do not override the status which causes us to call acpi_exception with
> the status returned by the acpi_ns_externalize_name call we do to get a
> name to log.
>
> Related: https://bugzilla.kernel.org/show_bug.cgi?id=198167
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> ---
> drivers/acpi/acpica/dspkginit.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpica/dspkginit.c
> b/drivers/acpi/acpica/dspkginit.c index 6d487edfe2de..cddbbed2f344
> 100644
> --- a/drivers/acpi/acpica/dspkginit.c
> +++ b/drivers/acpi/acpica/dspkginit.c
> @@ -372,9 +372,9 @@ acpi_ds_resolve_package_element(union
> acpi_operand_object **element_ptr)
> ACPI_NS_SEARCH_PARENT | ACPI_NS_DONT_OPEN_SCOPE,
> NULL, &resolved_node);
> if (ACPI_FAILURE(status)) {
> - status = acpi_ns_externalize_name(ACPI_UINT32_MAX,
> - (char *)element->reference.
> - aml, NULL, &external_path);
> + acpi_ns_externalize_name(ACPI_UINT32_MAX,
> + (char *)element->reference.aml, NULL,
> + &external_path);
>
> ACPI_EXCEPTION((AE_INFO, status,
> "Could not find/resolve named package element: %s",
> --
> 2.14.3
3 years, 1 month
Re: [Devel] [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
by James Morse
Hi gengdongjiu,
On 16/12/17 04:47, gengdongjiu wrote:
> [...]
>>
>>> + case ESR_ELx_AET_UER: /* The error has not been propagated */
>>> + /*
>>> + * Userspace only handle the guest SError Interrupt(SEI) if the
>>> + * error has not been propagated
>>> + */
>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>> + return 0;
>>
>> We should not pass RAS notifications to user space. The kernel either handles
>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>
> For the ESR_ELx_AET_UER(Recoverable error), let us see its definition
> below, which get from [0]
[..]
> so we can see the exception is precise and PE can recover execution
> from the preferred return address of the exception,
> so let guest handling it is
> better, for example, if it is guest application RAS error, we can kill
> the guest application instead of panic whole OS; if it is guest kernel
> RAS error, guest will panic.
If the kernel takes an unhandled RAS error it should panic - we don't know where
the error is.
I understand you want to kill-off guest tasks as a result of RAS errors, but
this needs to go through the whole APEI->memory_failure()->sigbus machinery so
that the kernel knows the kernel can keep running.
This saves us signalling user-space when we don't need to. An example:
code-corruption. Linux can happily re-read affected user-space executables from
disk, there is absolutely nothing user-space can do about it.
Handling errors first in the kernel allows us to do recovery for all the
affected processes, not just the one that happens to be running right now.
> Host does not know which application of guest has error, so host can
> not handle it,
It has to work this out, otherwise the errors we can handle never get a chance.
This kernel is expected to look at the error description, (which for some reason
we aren't talking about here), e.g. the CPER records, and determine what
recovery action is necessary for this error.
For memory errors this may be re-reading from disk, or at the worst case,
unmapping from all user-space users (including KVM's stage2) and raining signals
on all affected processes.
For a memory error the important piece of information is the physical address.
Only the kernel can do anything with this, it determines who owns the affected
memory and what needs doing to recover from the error.
If you pass the notification to user-space, all it can do is signal the guest to
"stop doing whatever it is you're doing". The guest may have been able to
re-read pages from disk, or otherwise handle the error.
Has the error been handled? No: The error remains latent in the system.
> panic OS is not a good choice for the Recoverable error.
If we don't know where the error is, and we can't make progress, its the only
sane choice.
This code is never expected to run! (why are we arguing about it?) We should get
RAS errors as GHES notifications from firmware via some mechanism. If those are
NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
handling based on the CPER records. If/when we get kernel-first, that can claim
the SError. What we're left with is RAS notifications that no-one claimed
because there was no error-description found.
James
3 years, 1 month