Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
by Huang, Kai
Hi Paolo/Radim,
I'd like to start a discussion regarding IA32_SGXLEPUBKEYHASHn
handling here. I have also copied the SGX driver mailing list (which it
looks like I should have done when sending out this series, sorry) and
Sean, Haim and Haitao from Intel so we can have a better discussion.
Basically, IA32_SGXLEPUBKEYHASHn (or, more generally, SGX Launch
Control) allows us to run different Launch Enclaves (LEs) signed with
different RSA keys. An LE can only be initialized, specifically by
EINIT, when the value of IA32_SGXLEPUBKEYHASHn matches the hash of the
key used to sign it. So before calling EINIT for an LE, we have to make
sure IA32_SGXLEPUBKEYHASHn contains the matching value. One relevant
fact is that only EINIT uses IA32_SGXLEPUBKEYHASHn; after EINIT, other
ENCLS/ENCLU leaves (e.g. EGETKEY) run correctly even if the MSRs are
changed to other values.
To support KVM guests running their own LEs, KVM traps
IA32_SGXLEPUBKEYHASHn MSR writes and caches the values internally in the
vcpu, and KVM needs to write the cached values to the real MSRs before
the guest runs EINIT. The problem is that the host side also runs LEs,
probably multiple of them (it seems the SGX driver currently plans to
run a single in-kernel LE, but I am not familiar with the details, and
IMO we should not assume the host will only run one LE). Therefore, if
KVM changes the physical MSRs for a guest, the host may not be able to
run its LE, as it may not write the right values back to the MSRs. There
are two approaches to make the host and KVM guests work together:
1. Anyone who wants to run an LE is responsible for writing the correct
value to IA32_SGXLEPUBKEYHASHn.
My current patch is based on this assumption. For a KVM guest,
naturally, we write the cached values to the real MSRs when the vcpu is
scheduled in. For the host, the SGX driver should write its own values
to the MSRs when it performs EINIT for an LE.
One argument against this approach is that a KVM guest should never have
an impact on the host side, meaning the host should not have to be aware
of such MSR changes. In that case, if the host does some performance
optimization and does not actively update the MSRs, the physical MSRs
may contain incorrect values when the host runs EINIT. Instead, KVM
should be responsible for restoring the original MSRs, which brings us
to approach 2 below.
2. KVM restores the MSRs after changing them for a guest.
The simplest way for KVM to do this is: 1) save the original physical
MSRs and load the guest's MSRs before VMENTRY; 2) on VMEXIT, write the
original values back to the physical MSRs.
To me this approach is also arguable, as a KVM guest is actually just a
normal process (OK, maybe not that normal), and a KVM guest should be
treated the same as other processes that run LEs, which means approach 1
is also reasonable.
Approach 2 also has a bigger performance impact than approach 1 for KVM,
as it reads/writes IA32_SGXLEPUBKEYHASHn on every VMEXIT/VMENTRY, while
approach 1 only writes the MSRs when a vcpu is scheduled in, which is
much less frequent.
I'd like to hear all your comments and hopefully we can have some
agreement on this.
Another thing, not really related to which approach we select: whether
we choose approach 1 or approach 2, KVM still suffers the performance
cost of writing (and/or reading) the IA32_SGXLEPUBKEYHASHn MSRs, either
when a vcpu is scheduled in or on each VMEXIT/VMENTRY. Given that
IA32_SGXLEPUBKEYHASHn is only consumed by EINIT, we can optimize by
trapping EINIT from the guest and updating the MSRs only on an EINIT
VMEXIT. This works for approach 1, but for approach 2 we would have to
do something tricky during VMEXIT/VMENTRY to check whether the MSRs have
been changed by an EINIT VMEXIT, and restore the original values only if
one has happened. The guest's LE continues to run even after the
physical MSRs are changed back to the original values.
But trapping ENCLS requires either 1) KVM running ENCLS on behalf of the
guest, in which case we have to reconstruct and remap the guest's ENCLS
parameters and skip the ENCLS for the guest; or 2) using MTF to let the
guest run ENCLS again while still trapping ENCLS. Either case introduces
more complicated code that is potentially more buggy, and I don't think
we should do this just to save the time of writing a few MSRs. If we
need to turn on ENCLS VMEXIT anyway, we can optimize this then.
Thank you in advance.
Thanks,
-Kai
On 5/8/2017 5:24 PM, Kai Huang wrote:
> If SGX runtime launch control is enabled on the host
> (IA32_FEATURE_CONTROL[17] is set), KVM can support running multiple guests,
> each running an LE signed with a different RSA pubkey. KVM traps
> IA32_SGXLEPUBKEYHASHn MSR writes from the guest and keeps the values in the
> vcpu internally, and when the vcpu is scheduled in, KVM writes those values
> to the real IA32_SGXLEPUBKEYHASHn MSRs.
>
> Signed-off-by: Kai Huang <kai.huang(a)linux.intel.com>
> ---
> arch/x86/include/asm/msr-index.h | 5 ++
> arch/x86/kvm/vmx.c | 123 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 128 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3770f570bb9..70482b951b0f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -417,6 +417,11 @@
> #define MSR_IA32_TSC_ADJUST 0x0000003b
> #define MSR_IA32_BNDCFGS 0x00000d90
>
> +#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008c
> +#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008d
> +#define MSR_IA32_SGXLEPUBKEYHASH2 0x0000008e
> +#define MSR_IA32_SGXLEPUBKEYHASH3 0x0000008f
> +
> #define MSR_IA32_XSS 0x00000da0
>
> #define FEATURE_CONTROL_LOCKED (1<<0)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a16539594a99..c96332b9dd44 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -656,6 +656,9 @@ struct vcpu_vmx {
> */
> u64 msr_ia32_feature_control;
> u64 msr_ia32_feature_control_valid_bits;
> +
> + /* SGX Launch Control public key hash */
> + u64 msr_ia32_sgxlepubkeyhash[4];
> };
>
> enum segment_cache_field {
> @@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
> }
>
> +static bool cpu_sgx_lepubkeyhash_writable(void)
> +{
> + u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
> + FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
> +
> + rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +
> + return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
> +}
> +
> +static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
> +{
> + return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
> + && (!(to_vmx(vcpu)->msr_ia32_feature_control &
> + FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
> +}
> +
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
> +
> +static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
> +{
> + u64 h0, h1, h2, h3;
> +
> + /*
> + * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
> + * writable), we set guest's default value to be Intel's default
> + * hash (which is fixed value and can be hard-coded). Otherwise,
> + * guest can only use machine's IA32_SGXLEPUBKEYHASHn so set guest's
> + * default to that.
> + */
> + if (cpu_sgx_lepubkeyhash_writable()) {
> + h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
> + h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
> + h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
> + h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
> + }
> + else {
> + rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
> + rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
> + rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
> + rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
> + }
> +
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
> +}
> +
> +static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
> +{
> + wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
> + wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
> + wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
> + wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
> +}
> +
> /*
> * Switches to specified vcpu, until a matching vcpu_put(), but assumes
> * vcpu mutex is already taken.
> @@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> vmx_vcpu_pi_load(vcpu, cpu);
> vmx->host_pkru = read_pkru();
> +
> + /*
> + * Load guest's SGX LE pubkey hash if runtime launch control is
> + * enabled.
> + */
> + if (guest_cpuid_has_sgx_launch_control(vcpu) &&
> + cpu_sgx_lepubkeyhash_writable())
> + vmx_sgx_lepubkeyhash_load(vcpu);
> }
>
> static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
> @@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_FEATURE_CONTROL:
> msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
> break;
> + case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> + /*
> + * SDM 35.1 Model-Specific Registers, table 35-2.
> + * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
> + * guaranteed this will be true if guest_cpuid_has_sgx
> + * is true.)
> + */
> + if (!guest_cpuid_has_sgx(vcpu))
> + return 1;
> + msr_info->data =
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
> + MSR_IA32_SGXLEPUBKEYHASH0];
> + break;
> case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
> if (!nested_vmx_allowed(vcpu))
> return 1;
> @@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * SGX has been enabled in BIOS before using SGX.
> */
> break;
> + case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> + /*
> + * SDM 35.1 Model-Specific Registers, table 35-2.
> + * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
> + * available.
> + * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
> + * FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
> + */
> + if (!guest_cpuid_has_sgx(vcpu) ||
> + !guest_cpuid_has_sgx_launch_control(vcpu))
> + return 1;
> + /*
> + * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn,
> + * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
> + * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
> + * set to default in vmx_create_vcpu therefore guest is able
> + * to get the machine's IA32_SGXLEPUBKEYHASHn by rdmsr in
> + * guest.
> + */
> + if (!cpu_sgx_lepubkeyhash_writable())
> + return 1;
> + /*
> + * If guest's FEATURE_CONTROL[17] is not set, guest's
> + * IA32_SGXLEPUBKEYHASHn are not writeable from guest.
> + */
> + if (!vmx_sgx_lc_disabled_in_bios(vcpu) &&
> + !msr_info->host_initiated)
> + return 1;
> + to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
> + MSR_IA32_SGXLEPUBKEYHASH0] = data;
> + break;
> case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
> if (!msr_info->host_initiated)
> return 1; /* they are read-only */
> @@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> vmx->nested.vpid02 = allocate_vpid();
> }
>
> + /* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
> + if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
> + vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
> +
> vmx->nested.posted_intr_nv = -1;
> vmx->nested.current_vmptr = -1ull;
> vmx->nested.current_vmcs12 = NULL;
>
4 years, 10 months
Debugging interface
by Jethro Beekman
In order to provide the ability to debug enclaves independently of their
programming model, while supporting live attach (meaning the debugger
might not have seen enclave creation), a debugger needs the following
capabilities for a debugged process:
1) Identify enclaves
a) Given a page, identify it as an enclave page
b) Given an enclave page, find that enclave's base address
c) Given a particular enclave, find the TCSes
d) Given a particular enclave, get the attributes/miscselect
2) Read/write enclave memory
3) Find debugging symbols for enclaves
I think (1)(a) and (1)(b) can be done using /proc/[pid]/maps, but I'd
like confirmation. I think the following is true: if a page is in a
memory range that is listed as mapped from the SGX device, it is an
enclave page. For a memory range that is listed as mapped from the SGX
device, you can compute the enclave base address as (start - offset).
(2) is already supported by the driver via the ptrace interface.
(1)(c) is necessary to find the SSAs, and (1)(d) is necessary to
determine SSAFRAMESIZE and make sense of the data in the SSAs. What
would be the best way to expose this information? The best I can think
of currently is a dedicated file in /proc/[pid] where the driver exposes
this information, which was originally provided at enclave creation time.
I don't think (3) can universally be done without help from the user
program. Yes, it might be possible to learn something from ELF headers
but Linux does not generally have a requirement that user processes must
have valid ELF information in their memory mappings. Most current
userspace debug symbol loading is based on filenames. SGX does not
provide any record of where enclave memory came from. Perhaps the create
ioctl could include some user data that is later exposed again in the
same proc file mentioned above. Frameworks and toolchains could
establish a convention regarding this user data to find symbols.
--
Jethro Beekman | Fortanix
4 years, 10 months
[PATCH v2] intel_sgx: Fix max enclave size calculation
by Angie Chinchilla
The sgx_encl_size_max_64 and sgx_encl_size_max_32 calculations are
incorrect, causing failures when creating large enclaves.
2^EDX[7:0] should be used for max enclave size in 32-bit mode
2^EDX[15:8] should be used for max enclave size in 64-bit mode
Signed-off-by: Angie Chinchilla <angie.v.chinchilla(a)intel.com>
---
Changes in v2:
- 1ULL should be shifted, not 2ULL
drivers/platform/x86/intel_sgx/sgx_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/intel_sgx/sgx_main.c b/drivers/platform/x86/intel_sgx/sgx_main.c
index cf1e6ec..962768d 100644
--- a/drivers/platform/x86/intel_sgx/sgx_main.c
+++ b/drivers/platform/x86/intel_sgx/sgx_main.c
@@ -312,9 +312,9 @@ static int sgx_drv_probe(struct platform_device *pdev)
cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
if (edx & 0xFFFF) {
#ifdef CONFIG_X86_64
- sgx_encl_size_max_64 = 2ULL << (edx & 0xFF);
+ sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
#endif
- sgx_encl_size_max_32 = 2ULL << ((edx >> 8) & 0xFF);
+ sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
}
return sgx_dev_init(&pdev->dev);
--
2.7.4
4 years, 11 months
[PATCH] intel_sgx: fail if CPUID does not give correct ranges
by Jarkko Sakkinen
The driver init should not continue in a broken environment.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
drivers/platform/x86/intel_sgx/sgx_main.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/platform/x86/intel_sgx/sgx_main.c b/drivers/platform/x86/intel_sgx/sgx_main.c
index 962768d72546..8124d027e5d7 100644
--- a/drivers/platform/x86/intel_sgx/sgx_main.c
+++ b/drivers/platform/x86/intel_sgx/sgx_main.c
@@ -72,9 +72,6 @@
#define DRV_DESCRIPTION "Intel SGX Driver"
#define DRV_VERSION "0.10"
-#define ENCL_SIZE_MAX_64 (64ULL * 1024ULL * 1024ULL * 1024ULL)
-#define ENCL_SIZE_MAX_32 (2ULL * 1024ULL * 1024ULL * 1024ULL)
-
MODULE_DESCRIPTION(DRV_DESCRIPTION);
MODULE_AUTHOR("Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>");
MODULE_VERSION(DRV_VERSION);
@@ -87,8 +84,8 @@ struct workqueue_struct *sgx_add_page_wq;
#define SGX_MAX_EPC_BANKS 8
struct sgx_epc_bank sgx_epc_banks[SGX_MAX_EPC_BANKS];
int sgx_nr_epc_banks;
-u64 sgx_encl_size_max_32 = ENCL_SIZE_MAX_32;
-u64 sgx_encl_size_max_64 = ENCL_SIZE_MAX_64;
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
u64 sgx_xfrm_mask = 0x3;
u32 sgx_ssaframesize_tbl[64];
bool sgx_has_sgx2;
@@ -310,12 +307,13 @@ static int sgx_drv_probe(struct platform_device *pdev)
}
cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
- if (edx & 0xFFFF) {
+ if (!(edx & 0xFFFF))
+ return -ENODEV;
+
#ifdef CONFIG_X86_64
- sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+ sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
#endif
- sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
- }
+ sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
return sgx_dev_init(&pdev->dev);
}
--
2.11.0
4 years, 11 months
[PATCH] intel_sgx: update documentation with VM guest fallback
by Jarkko Sakkinen
I implemented the fallback mechanism for the driver last fall,
originally because I had to drop all BUG_ON() usage. The driver
gracefully falls back from a failing ENCLS in any situation.
It turns out that this can happen in a VM guest even if the driver is
working correctly. This patch adds to the documentation an explanation
of how we fall back after the host has woken up in a VM guest.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
Documentation/x86/intel_sgx.rst | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)
diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst
index 71026732af61..ee7fe9487d7b 100644
--- a/Documentation/x86/intel_sgx.rst
+++ b/Documentation/x86/intel_sgx.rst
@@ -77,29 +77,43 @@ every time when an enclave is launched. This does not scale because for
generic case because BIOS might lock down the MSRs before handover to
the OS.
+Debug enclaves
+--------------
+
+Enclave can be set as a *debug enclave* of which memory can be read or written
+by using the ENCLS(EDBGRD) and ENCLS(EDBGWR) opcodes. The Intel provided launch
+enclave provides them always a valid EINITTOKEN and therefore they are a low
+hanging fruit way to try out SGX.
+
Virtualization
+==============
+
+Launch control
--------------
The values for IA32_SGXLEPUBKEYHASHn MSRs cannot be emulated for a virtual
machine guest. It would easily seem feasible to hold virtual values for these
-MSRs, trap EINIT and use the host LE to generate a token when a guest LE is
-initialized.
+MSRs, trap ENCLS(EINIT) and use the host LE to generate a token when a guest LE
+is initialized.
-However, looking at the pseudo code of ENCLS(EINIT) from the SDM there is
-a constraint that the instruction will fail if ATTRIBUTES.EINITTOKENKEY is
-set (the documentation does not tell the reason why the constraint exists
-but it exists).
+However, looking at the pseudo code of ENCLS(EINIT) from the SDM there is a
+constraint that the instruction will fail if ATTRIBUTES.EINITTOKENKEY is set
+(the documentation does not tell the reason why the constraint exists but it
+exists).
Thus, only when the MSRs are left unlocked before handover to the OS the
setting of these MSRs can be supported for VM guests.
-Debug enclaves
---------------
+Suspend and resume
+------------------
-Enclave can be set as a *debug enclave* of which memory can be read or written
-by using the ENCLS(EDBGRD) and ENCLS(EDBGWR) opcodes. The Intel provided launch
-enclave provides them always a valid EINITTOKEN and therefore they are a low
-hanging fruit way to try out SGX.
+If the host suspends and resumes, the enclave memory for the VM guest could
+become invalid. This can make ENCLS leaf operations suddenly fail.
+
+The driver has a graceful fallback mechanism to manage this situation. If any of
+the ENCLS leaf operations fail, the driver will fallback by kicking threads out
+of the enclave, removing the TCS entries and marking enclave as invalid. After
+this no new pages can be allocated for the enclave and no entry can be done.
SGX uapi
========
--
2.11.0
4 years, 11 months
[PATCH] intel_sgx: Fix max enclave size calculation
by Angie Chinchilla
The sgx_encl_size_max_64 and sgx_encl_size_max_32 calculations are
incorrect, causing failures when creating large enclaves.
2^EDX[7:0] should be used for max enclave size in 32-bit mode
2^EDX[15:8] should be used for max enclave size in 64-bit mode
Signed-off-by: Angie Chinchilla <angie.v.chinchilla(a)intel.com>
---
drivers/platform/x86/intel_sgx/sgx_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/intel_sgx/sgx_main.c b/drivers/platform/x86/intel_sgx/sgx_main.c
index cf1e6ec..1840f81 100644
--- a/drivers/platform/x86/intel_sgx/sgx_main.c
+++ b/drivers/platform/x86/intel_sgx/sgx_main.c
@@ -312,9 +312,9 @@ static int sgx_drv_probe(struct platform_device *pdev)
cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
if (edx & 0xFFFF) {
#ifdef CONFIG_X86_64
- sgx_encl_size_max_64 = 2ULL << (edx & 0xFF);
+ sgx_encl_size_max_64 = 2ULL << ((edx >> 8) & 0xFF);
#endif
- sgx_encl_size_max_32 = 2ULL << ((edx >> 8) & 0xFF);
+ sgx_encl_size_max_32 = 2ULL << (edx & 0xFF);
}
return sgx_dev_init(&pdev->dev);
--
2.7.4
4 years, 11 months
[RFC][PATCH 00/12] add SGX EPC cgroup controller
by Sean Christopherson
Implement a cgroup controller, sgx_epc, which regulates distribution of
SGX EPC memory. EPC memory is independent from normal system memory;
it is managed by the SGX subsystem and is not accounted by the memory
controller. This approach is necessary as EPC memory must be reserved
at boot, i.e. memory cannot be converted between EPC and normal memory
while the system is running.
Much like normal system memory, EPC memory can be overcommitted via
virtual memory techniques and pages can be swapped out of the EPC
to their backing store (normal system memory allocated via shmem).
The SGX EPC subsystem is analogous to the memory subsystem, and the
SGX EPC controller is in turn analogous to the memory controller;
it implements limit and protection models for EPC memory.
"sgx_epc.high" and "sgx_epc.low" are the main mechanisms to control
EPC usage, while "sgx_epc.max" is a last line of defense mechanism.
"sgx_epc.high" is a best-effort limit of EPC usage. "sgx_epc.low"
is a best-effort protection of EPC usage. "sgx_epc.max" is a hard
limit of EPC usage.
The cgroup to which an EPC page is charged is tracked by the EPC page
itself; charges are not tracked by the owning enclave. This approach
avoids introducing an expensive lock into the EPC page allocation hot
path, and greatly simplifies the logic for handling process migration.
Once an EPC page is charged to a cgroup, it remains charged to that
cgroup until the page is released or reclaimed. Migrating the owning
process to a different cgroup doesn't migrate its current EPC charges.
To allow the SGX EPC controller to reclaim pages of a specific cgroup,
e.g. to enforce the high limit, remove struct sgx_encl's load_list and
instead track loaded EPC pages via a shared list. Tracking loaded EPC
pages via a global list enables a smooth transition to per-cgroup lists
when the EPC controller is enabled. The core SGX code can then iterate
over cgroups when isolating EPC pages for global reclaim.
A (positive) side effect of using a shared list for tracking loaded EPC
pages is that it provides fairer and more accurate LRU-based swapping.
Determining a page's LRU status is no longer dependent on the number of
TGID contexts or enclaves, i.e. the time between calls to age a given
page is purely a function of the number of EPC pages in use system wide,
as opposed to the current approach in which the time between calls may
vary depending on the number of contexts (processes) and enclaves.
Patches 1-4 implement the shared EPC load list. Patches 5-8 are minor
modifications/extensions to the core SGX EPC page reclaim code that add
and/or expose interfaces needed by the SGX EPC controller. Patches
9-12 add, enable and document the controller.
This patch set is obviously RFC at this time, as applying the cgroup
patches prior to upstreaming the base SGX subsystem is a non-starter.
That being said, if there is interest in merging the global load list
code prior to upstreaming, I'd be happy to send those out separately
for review.
Sean Christopherson (12):
intel_sgx: reuse sgx_epc_page list for loaded pages
intel_sgx: walk pages via radix then VMA tree to zap TCS
intel_sgx: swap pages using common active/lru list
intel_sgx: remove tgid_ctx, track tgid pid in encl
intel_sgx: add struct sgx_epc_lru to encapsulate lru list(s)
intel_sgx: return nr pages reclaimed by sgx_swap_pages
intel_sgx: add swap flag to allow caller to ignore LRU
intel_sgx: declare struct sgx_epc_cgroup and sgx_swap_pages param
cgroup, intel_sgx: add SGX EPC cgroup controller
intel_sgx: enable EPC cgroup controller in SGX core
intel_sgx: add stats and events interfaces to EPC cgroup controller
docs, cgroup, intel_sgx: add SGX EPC cgroup controller documentation
Documentation/cgroup-v2.txt | 205 ++++++++++++++++++-
arch/x86/include/asm/sgx.h | 7 +-
drivers/platform/x86/intel_sgx/Makefile | 2 +
drivers/platform/x86/intel_sgx/sgx.h | 45 +++--
drivers/platform/x86/intel_sgx/sgx_epc_cgroup.c | 835 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/platform/x86/intel_sgx/sgx_epc_cgroup.h | 164 +++++++++++++++
drivers/platform/x86/intel_sgx/sgx_ioctl.c | 93 ++-------
drivers/platform/x86/intel_sgx/sgx_main.c | 11 +-
drivers/platform/x86/intel_sgx/sgx_page_cache.c | 371 ++++++++++++++++++++++------------
drivers/platform/x86/intel_sgx/sgx_util.c | 58 +++---
include/linux/cgroup_subsys.h | 4 +
init/Kconfig | 12 ++
12 files changed, 1554 insertions(+), 253 deletions(-)
4 years, 11 months
[PATCH v2] intel_sgx: updated documentation about virtualization
by Jarkko Sakkinen
Updated documentation to document the constraint, which prevents
emulating the MSR values for VM guests.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
v2: fix typos
Documentation/x86/intel_sgx.rst | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst
index ee1db2ca2e39..71026732af61 100644
--- a/Documentation/x86/intel_sgx.rst
+++ b/Documentation/x86/intel_sgx.rst
@@ -77,6 +77,22 @@ every time when an enclave is launched. This does not scale because for
generic case because BIOS might lock down the MSRs before handover to
the OS.
+Virtualization
+--------------
+
+The values for IA32_SGXLEPUBKEYHASHn MSRs cannot be emulated for a virtual
+machine guest. It would easily seem feasible to hold virtual values for these
+MSRs, trap EINIT and use the host LE to generate a token when a guest LE is
+initialized.
+
+However, looking at the pseudo code of ENCLS(EINIT) from the SDM there is
+a constraint that the instruction will fail if ATTRIBUTES.EINITTOKENKEY is
+set (the documentation does not tell the reason why the constraint exists
+but it exists).
+
+Thus, only when the MSRs are left unlocked before handover to the OS the
+setting of these MSRs can be supported for VM guests.
+
Debug enclaves
--------------
--
2.11.0
4 years, 11 months
[PATCH] intel_sgx: updated documentation about virtualization
by Jarkko Sakkinen
Updated documentation to document the constraint, which prevents
emulating the MSR values for VM guests.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
Documentation/x86/intel_sgx.rst | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst
index ee1db2ca2e39..b5c018081133 100644
--- a/Documentation/x86/intel_sgx.rst
+++ b/Documentation/x86/intel_sgx.rst
@@ -77,6 +77,22 @@ every time when an enclave is launched. This does not scale because for
generic case because BIOS might lock down the MSRs before handover to
the OS.
+Virtualization
+--------------
+
+The values for IA32_SGXLEPUBKEYHASHn MSRs cannot be emulated for a virtual
+machine guest. It would easily seem feasible to hold virtual values for these
+MSRs, trap EINIT and use the host LE to generate token when guest LE is
+initialized.
+
+However, looking at the pseudo code of ENCLS(EINIT) from the SDM there is
+constraint that the instruction will fail if ATTRIBUTES.EINITTOKENKEY is
+set (the documentation does not tell the reason why the constraint exists
+but it exists).
+
+Thus, only on when the MSRs are left unlocked efore handover to the OS the
+setting of these MSRs can be supported for VM guests.
+
Debug enclaves
--------------
--
2.11.0
4 years, 11 months
[PATCH] intel_sgx: clean up EPC bank code
by Jarkko Sakkinen
Removed redundant probing code and cleaned up style issues in code for
multiple EPC banks.
Reported-by: Dmitrii Kuvaiskii <Dmitrii.Kuvaiskii(a)tu-dresden.de>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
drivers/platform/x86/intel_sgx/sgx.h | 6 +-
drivers/platform/x86/intel_sgx/sgx_main.c | 112 ++++++------------------
drivers/platform/x86/intel_sgx/sgx_page_cache.c | 13 +--
3 files changed, 37 insertions(+), 94 deletions(-)
diff --git a/drivers/platform/x86/intel_sgx/sgx.h b/drivers/platform/x86/intel_sgx/sgx.h
index d5cdf962219d..3ccbc0ffbabc 100644
--- a/drivers/platform/x86/intel_sgx/sgx.h
+++ b/drivers/platform/x86/intel_sgx/sgx.h
@@ -149,11 +149,11 @@ struct sgx_encl {
};
struct sgx_epc_bank {
+ unsigned long pa;
#ifdef CONFIG_X86_64
- void *mem;
+ unsigned long va;
#endif
- unsigned long start;
- unsigned long end;
+ unsigned long size;
};
extern struct workqueue_struct *sgx_add_page_wq;
diff --git a/drivers/platform/x86/intel_sgx/sgx_main.c b/drivers/platform/x86/intel_sgx/sgx_main.c
index 98ee05de8db8..cf1e6ec2cd7e 100644
--- a/drivers/platform/x86/intel_sgx/sgx_main.c
+++ b/drivers/platform/x86/intel_sgx/sgx_main.c
@@ -161,78 +161,6 @@ static struct miscdevice sgx_dev = {
.mode = 0666,
};
-static int sgx_init_platform(void)
-{
- unsigned int eax, ebx, ecx, edx;
- unsigned long size;
- int i;
-
- cpuid(0, &eax, &ebx, &ecx, &edx);
- if (eax < SGX_CPUID) {
- pr_err("intel_sgx: CPUID is missing the SGX leaf instruction\n");
- return -ENODEV;
- }
-
- if (!boot_cpu_has(X86_FEATURE_SGX)) {
- pr_err("intel_sgx: CPU is missing the SGX feature\n");
- return -ENODEV;
- }
-
- cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
- if (!(eax & 1)) {
- pr_err("intel_sgx: CPU does not support the SGX 1.0 instruction set\n");
- return -ENODEV;
- }
-
- if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
- cpuid_count(SGX_CPUID, 0x1, &eax, &ebx, &ecx, &edx);
- sgx_xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
- for (i = 2; i < 64; i++) {
- cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
- if ((1 << i) & sgx_xfrm_mask)
- sgx_ssaframesize_tbl[i] =
- (168 + eax + ebx + PAGE_SIZE - 1) /
- PAGE_SIZE;
- }
- }
-
- cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
- if (edx & 0xFFFF) {
-#ifdef CONFIG_X86_64
- sgx_encl_size_max_64 = 1ULL << (edx & 0xFF);
-#endif
- sgx_encl_size_max_32 = 1ULL << ((edx >> 8) & 0xFF);
- }
-
- sgx_nr_epc_banks = 0;
- do {
- cpuid_count(SGX_CPUID, sgx_nr_epc_banks + 2,
- &eax, &ebx, &ecx, &edx);
- if (eax & 0xf) {
- sgx_epc_banks[sgx_nr_epc_banks].start =
- (((u64) (ebx & 0xfffff)) << 32) +
- (u64) (eax & 0xfffff000);
- size = (((u64) (edx & 0xfffff)) << 32) +
- (u64) (ecx & 0xfffff000);
- sgx_epc_banks[sgx_nr_epc_banks].end =
- sgx_epc_banks[sgx_nr_epc_banks].start + size;
- if (!sgx_epc_banks[sgx_nr_epc_banks].start)
- return -ENODEV;
- sgx_nr_epc_banks++;
- } else {
- break;
- }
- } while (sgx_nr_epc_banks < SGX_MAX_EPC_BANKS);
-
- /* There should be at least one EPC area or something is wrong. */
- if (!sgx_nr_epc_banks) {
- WARN_ON(1);
- return 1;
- }
-
- return 0;
-}
-
static int sgx_pm_suspend(struct device *dev)
{
struct sgx_tgid_ctx *ctx;
@@ -262,6 +190,9 @@ static SIMPLE_DEV_PM_OPS(sgx_drv_pm, sgx_pm_suspend, sgx_pm_resume);
static int sgx_dev_init(struct device *dev)
{
+ unsigned int eax, ebx, ecx, edx;
+ unsigned long pa;
+ unsigned long size;
unsigned int wq_flags;
int ret;
int i;
@@ -271,28 +202,37 @@ static int sgx_dev_init(struct device *dev)
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return -ENODEV;
- ret = sgx_init_platform();
- if (ret)
- return ret;
+ for (i = 0; i < SGX_MAX_EPC_BANKS; i++) {
+ cpuid_count(SGX_CPUID, i + 2, &eax, &ebx, &ecx, &edx);
+ if (!(eax & 0xf))
+ break;
+
+ pa = ((u64)(ebx & 0xfffff) << 32) + (u64)(eax & 0xfffff000);
+ size = ((u64)(edx & 0xfffff) << 32) + (u64)(ecx & 0xfffff000);
+
+ dev_info(dev, "EPC bank 0x%lx-0x%lx\n", pa, pa + size);
+
+ sgx_epc_banks[i].pa = pa;
+ sgx_epc_banks[i].size = size;
+ }
- pr_info("intel_sgx: Number of EPCs %d\n", sgx_nr_epc_banks);
+ sgx_nr_epc_banks = i;
for (i = 0; i < sgx_nr_epc_banks; i++) {
- pr_info("intel_sgx: EPC memory range 0x%lx-0x%lx\n",
- sgx_epc_banks[i].start, sgx_epc_banks[i].end);
#ifdef CONFIG_X86_64
- sgx_epc_banks[i].mem = ioremap_cache(sgx_epc_banks[i].start,
- sgx_epc_banks[i].end - sgx_epc_banks[i].start);
- if (!sgx_epc_banks[i].mem) {
+ sgx_epc_banks[i].va = (unsigned long)
+ ioremap_cache(sgx_epc_banks[i].pa,
+ sgx_epc_banks[i].size);
+ if (!sgx_epc_banks[i].va) {
sgx_nr_epc_banks = i;
ret = -ENOMEM;
goto out_iounmap;
}
#endif
- ret = sgx_add_epc_bank(sgx_epc_banks[i].start,
- sgx_epc_banks[i].end - sgx_epc_banks[i].start);
+ ret = sgx_add_epc_bank(sgx_epc_banks[i].pa,
+ sgx_epc_banks[i].size);
if (ret) {
- sgx_nr_epc_banks = i+1;
+ sgx_nr_epc_banks = i + 1;
goto out_iounmap;
}
}
@@ -325,7 +265,7 @@ static int sgx_dev_init(struct device *dev)
out_iounmap:
#ifdef CONFIG_X86_64
for (i = 0; i < sgx_nr_epc_banks; i++)
- iounmap(sgx_epc_banks[i].mem);
+ iounmap((void *)sgx_epc_banks[i].va);
#endif
return ret;
}
@@ -388,7 +328,7 @@ static int sgx_drv_remove(struct platform_device *pdev)
destroy_workqueue(sgx_add_page_wq);
#ifdef CONFIG_X86_64
for (i = 0; i < sgx_nr_epc_banks; i++)
- iounmap(sgx_epc_banks[i].mem);
+ iounmap((void *)sgx_epc_banks[i].va);
#endif
sgx_page_cache_teardown();
diff --git a/drivers/platform/x86/intel_sgx/sgx_page_cache.c b/drivers/platform/x86/intel_sgx/sgx_page_cache.c
index 8e01427eea39..3ace84724273 100644
--- a/drivers/platform/x86/intel_sgx/sgx_page_cache.c
+++ b/drivers/platform/x86/intel_sgx/sgx_page_cache.c
@@ -546,14 +546,17 @@ void *sgx_get_page(struct sgx_epc_page *entry)
#ifdef CONFIG_X86_32
return kmap_atomic_pfn(PFN_DOWN(entry->pa));
#else
+ unsigned long offset;
int i;
for (i = 0; i < sgx_nr_epc_banks; i++) {
- if (entry->pa < sgx_epc_banks[i].end &&
- entry->pa >= sgx_epc_banks[i].start) {
- return sgx_epc_banks[i].mem +
- (entry->pa - sgx_epc_banks[i].start);
- }
+ if (entry->pa < sgx_epc_banks[i].pa)
+ continue;
+
+ offset = entry->pa - sgx_epc_banks[i].pa;
+
+ if (offset < sgx_epc_banks[i].size)
+ return (void *)(sgx_epc_banks[i].va + offset);
}
return NULL;
--
2.11.0
4 years, 11 months