path: root/arch/arm64/kvm
Age  Commit message  Author
2019-03-23  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2  (Dave Martin)

commit c88b093693ccbe41991ef2e9b1d251945e6e54ed upstream.

Due to what looks like a typo dating back to the original addition of FPEXC32_EL2 handling, KVM currently initialises this register to an architecturally invalid value.

As a result, the VECITR field (RES1) in bits [10:8] is initialised with 0, and the two reserved (RES0) bits [6:5] are initialised with 1. (In the Common VFP Subarchitecture as specified by ARMv7-A, these two bits were IMP DEF. ARMv8-A removes them.)

This patch changes the reset value from 0x70 to 0x700, which reflects the architectural constraints and is presumably what was originally intended.

Cc: <stable@vger.kernel.org> # 4.12.x-
Cc: Christoffer Dall <christoffer.dall@arm.com>
Fixes: 62a89c44954f ("arm64: KVM: 32bit handling of coprocessor traps")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
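The change is confined to the reset value in KVM's system register descriptor table; a minimal sketch of the kind of entry involved (the exact upstream hunk may differ):

    /* sys_regs.c: reset_val() writes the last field into the register
     * at vcpu reset time; 0x700 sets VECITR (RES1, bits [10:8]) and
     * leaves the RES0 bits [6:5] clear */
    { SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },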
2019-03-23  KVM: arm64: Forbid kprobing of the VHE world-switch code  (James Morse)

[ Upstream commit 7d82602909ed9c73b34ad26f05d10db4850a4f8c ]

On systems with VHE the kernel and KVM's world-switch code run at the same exception level. Code that is only used on a VHE system does not need to be annotated as __hyp_text as it can reside anywhere in the kernel text.

__hyp_text was also used to prevent kprobes from patching breakpoint instructions into this region, as this code runs at a different exception level. While this is no longer true with VHE, KVM still switches VBAR_EL1, meaning a kprobe's breakpoint executed in the world-switch code will cause a hyp-panic.

    echo "p:weasel sysreg_save_guest_state_vhe" > /sys/kernel/debug/tracing/kprobe_events
    echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable
    lkvm run -k /boot/Image --console serial -p "console=ttyS0 earlycon=uart,mmio,0x3f8"

    # lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474
      Info: Placing fdt at 0x8fe00000 - 0x8fffffff
      Info: virtio-mmio.devices=0x200@0x10000:36
      Info: virtio-mmio.devices=0x200@0x10200:37
      Info: virtio-mmio.devices=0x200@0x10400:38
    [  614.178186] Kernel panic - not syncing: HYP panic:
    [  614.178186] PS:404003c9 PC:ffff0000100d70e0 ESR:f2000004
    [  614.178186] FAR:0000000080080000 HPFAR:0000000000800800 PAR:1d00007edbadc0de
    [  614.178186] VCPU:00000000f8de32f1
    [  614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799
    [  614.178446] Call trace:
    [  614.178480]  dump_backtrace+0x0/0x148
    [  614.178567]  show_stack+0x24/0x30
    [  614.178658]  dump_stack+0x90/0xb4
    [  614.178710]  panic+0x13c/0x2d8
    [  614.178793]  hyp_panic+0xac/0xd8
    [  614.178880]  kvm_vcpu_run_vhe+0x9c/0xe0
    [  614.178958]  kvm_arch_vcpu_ioctl_run+0x454/0x798
    [  614.179038]  kvm_vcpu_ioctl+0x360/0x898
    [  614.179087]  do_vfs_ioctl+0xc4/0x858
    [  614.179174]  ksys_ioctl+0x84/0xb8
    [  614.179261]  __arm64_sys_ioctl+0x28/0x38
    [  614.179348]  el0_svc_common+0x94/0x108
    [  614.179401]  el0_svc_handler+0x38/0x78
    [  614.179487]  el0_svc+0x8/0xc
    [  614.179558] SMP: stopping secondary CPUs
    [  614.179661] Kernel Offset: disabled
    [  614.179695] CPU features: 0x003,2a80aa38
    [  614.179758] Memory Limit: none
    [  614.179858] ---[ end Kernel panic - not syncing: HYP panic ]---

Annotate the VHE world-switch functions that aren't marked __hyp_text using NOKPROBE_SYMBOL().

Signed-off-by: James Morse <james.morse@arm.com>
Fixes: 3f5c90b890ac ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
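A minimal sketch of the annotation pattern applied, using one of the functions named in the patch (body elided):

    #include <linux/kprobes.h>

    /* runs with VBAR_EL1 switched to the hyp vectors: a kprobe
     * breakpoint taken here escalates straight to a hyp-panic */
    void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
    {
        /* ... save guest system registers ... */
    }
    NOKPROBE_SYMBOL(sysreg_save_guest_state_vhe);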
2019-03-23  arm/arm64: KVM: Don't panic on failure to properly reset system registers  (Marc Zyngier)

[ Upstream commit 20589c8cc47dce5854c8bf1b44a9fc63d798d26d ]

Failing to properly reset system registers is pretty bad. But not quite as bad as bringing the whole machine down... So warn loudly, but slightly more gracefully.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
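A minimal sketch of the softened check; the 0x4242... poison value comes from the existing reset logic, and the exact form of the upstream hunk may differ:

    size_t num;

    /* catch a register whose reset function never ran, without
     * taking the whole machine down */
    for (num = 1; num < NR_SYS_REGS; num++)
        WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
             "Didn't reset __vcpu_sys_reg(%zi)\n", num);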
2019-03-23  arm/arm64: KVM: Allow a VCPU to fully reset itself  (Marc Zyngier)

[ Upstream commit 358b28f09f0ab074d781df72b8a671edb1547789 ]

The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time.

Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
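A minimal sketch of the resulting request/wake pattern on the vcpu handling PSCI CPU_ON (field names per the upstream commit, simplified):

    struct kvm_vcpu *target = kvm_mpidr_to_vcpu(kvm, cpu_id);

    /* record what the target should look like after its reset... */
    target->arch.reset_state.pc = entry_point;
    target->arch.reset_state.r0 = context_id;
    target->arch.reset_state.reset = true;

    /* ...and have the target perform the reset on its own thread */
    kvm_make_request(KVM_REQ_VCPU_RESET, target);
    kvm_vcpu_wake_up(target);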
2019-03-23  KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded  (Christoffer Dall)

[ Upstream commit e761a927bc9a7ee6ceb7c4f63d5922dbced87f0d ]

We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call

The first one is easy to reason about. The second one is implemented in a more bizarre way, as it is the vcpu that handles PSCI_ON that resets the vcpu that is being powered-on. As we need to turn the logic around and have the target vcpu reset itself, we must take some preliminary steps.

Resetting the VCPU state modifies the system register state in memory, but this may interact with vcpu_load/vcpu_put if running with preemption disabled, which in turn may lead to corrupted system register state.

Address this by disabling preemption and doing put/load if required around the reset logic.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
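A minimal sketch of the shape of the fix around the reset logic (simplified):

    bool loaded;

    preempt_disable();
    loaded = (vcpu->cpu != -1);    /* is this vcpu's state currently loaded? */
    if (loaded)
        kvm_arch_vcpu_put(vcpu);

    /* ... reset the system register and timer state in memory ... */

    if (loaded)
        kvm_arch_vcpu_load(vcpu, smp_processor_id());
    preempt_enable();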
2019-01-22  arm64/kvm: consistently handle host HCR_EL2 flags  (Mark Rutland)

[ Upstream commit 4eaed6aa2c628101246bcabc91b203bfac1193f8 ]

In KVM we define the configuration of HCR_EL2 for a VHE host in HCR_HOST_VHE_FLAGS, but we don't have a similar definition for the non-VHE host flags, and open-code HCR_RW. Further, in head.S we open-code the flags for VHE and non-VHE configurations.

In future, we're going to want to configure more flags for the host, so let's add a HCR_HOST_NVHE_FLAGS definition, and consistently use both HCR_HOST_VHE_FLAGS and HCR_HOST_NVHE_FLAGS in the kvm code and head.S.

We now use mov_q to generate the HCR_EL2 value, as we do when configuring other registers in head.S.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
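For reference, the two host configurations end up defined side by side (simplified from <asm/kvm_arm.h>):

    #define HCR_HOST_NVHE_FLAGS    (HCR_RW)
    #define HCR_HOST_VHE_FLAGS     (HCR_RW | HCR_TGE | HCR_E2H)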
2019-01-09  arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible  (Marc Zyngier)

commit c987876a80e7bcb98a839f10dca9ce7fda4feced upstream.

Contrary to the non-VHE version of the TLB invalidation helpers, the VHE code has interrupts enabled, meaning that we can take an interrupt in the middle of such a sequence, and start running something else with HCR_EL2.TGE cleared.

That's really not a good idea.

Take the heavy-handed option and disable interrupts in __tlb_switch_to_guest_vhe, restoring them in __tlb_switch_to_host_vhe. The latter also gains an ISB in order to make sure that TGE really has taken effect.

Cc: stable@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
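A minimal sketch of the guest-side change (simplified; the host-side helper mirrors it, with an isb() before local_irq_restore()):

    static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
                                                     unsigned long *flags)
    {
        u64 val;

        local_irq_save(*flags);    /* no interrupts while TGE is clear */

        val = read_sysreg(hcr_el2);
        val &= ~HCR_TGE;
        write_sysreg(val, hcr_el2);
        write_sysreg(kvm->arch.vttbr, vttbr_el2);
        isb();
    }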
2018-10-01  arm64: KVM: Sanitize PSTATE.M when being set from userspace  (Marc Zyngier)

Not all execution modes are valid for a guest, and some of them depend on what the HW actually supports. Let's verify that what userspace provides is compatible with both the VM settings and the HW capabilities.

Cc: <stable@vger.kernel.org>
Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
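A condensed sketch of the mode check added to the one-reg set path (per the upstream commit; surrounding code elided):

    switch (mode) {
    case PSR_AA32_MODE_USR:
        if (!system_supports_32bit_el0())
            return -EINVAL;
        break;
    case PSR_AA32_MODE_FIQ:
    case PSR_AA32_MODE_IRQ:
    case PSR_AA32_MODE_SVC:
    case PSR_AA32_MODE_ABT:
    case PSR_AA32_MODE_UND:
        if (!vcpu_el1_is_32bit(vcpu))
            return -EINVAL;
        break;
    case PSR_MODE_EL0t:
    case PSR_MODE_EL1t:
    case PSR_MODE_EL1h:
        if (vcpu_el1_is_32bit(vcpu))
            return -EINVAL;
        break;
    default:
        return -EINVAL;
    }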
2018-10-01  arm64: KVM: Tighten guest core register access from userspace  (Dave Martin)

We currently allow userspace to access the core register file in just about any possible way, including straddling multiple registers and doing unaligned accesses.

This is not the expected use of the ABI, and nobody is actually using it that way. Let's tighten it by explicitly checking the size and alignment for each field of the register file.

Cc: <stable@vger.kernel.org>
Fixes: 2f4a07c5f9fe ("arm64: KVM: guest one-reg interface")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[maz: rewrote Dave's initial patch to be more easily backported]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
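A minimal sketch of the kind of check introduced (helper names abbreviated; the upstream version derives the field size from a switch over every core-reg offset):

    static int validate_core_offset(const struct kvm_one_reg *reg)
    {
        u64 off = core_reg_offset_from_id(reg->id);
        int size = core_reg_size_from_offset(off);  /* field size, or < 0 */

        if (size < 0 || KVM_REG_SIZE(reg->id) != size ||
            !IS_ALIGNED(off, size / sizeof(__u32)))
            return -EINVAL;

        return 0;
    }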
2018-09-07  arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD  (Marc Zyngier)

If trapping FPSIMD in the context of an AArch32 guest, it is critical to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and not EL1.

Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1 if we're not going to trap FPSIMD, as we then corrupt the existing VFP state.

Moving the call to __activate_traps_fpsimd32 to the point where we know for sure that we are going to trap ensures that we don't set that bit spuriously.

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Cc: stable@vger.kernel.org # v4.18
Cc: Dave Martin <dave.martin@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
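A minimal sketch of the helper in question (simplified), now called only from the path where trapping is certain:

    static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
    {
        /* route AArch32 FP/SIMD accesses to EL2 rather than EL1;
         * only safe when we are actually going to trap */
        if (vcpu_el1_is_32bit(vcpu)) {
            write_sysreg(1 << 30, fpexc32_el2);    /* FPEXC.EN */
            isb();
        }
    }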
2018-08-22  Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)

KVM/arm updates for 4.19

- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrieval and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
2018-08-12  KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses  (Marc Zyngier)

In order to generate Group0 SGIs, let's add some decoding logic to access_gic_sgi(), and pass the generating group accordingly.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
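A condensed sketch of the decoding (register constants per the sysreg headers; control flow simplified):

    switch (sysreg) {
    case SYS_ICC_SGI1R_EL1:
        g1 = true;     /* Group1 SGIs */
        break;
    case SYS_ICC_ASGI1R_EL1:
    case SYS_ICC_SGI0R_EL1:
        g1 = false;    /* Group0 SGIs */
        break;
    }

    vgic_v3_dispatch_sgi(vcpu, p->regval, g1);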
2018-08-12  KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs  (Marc Zyngier)

Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usual with the GIC, nothing is simple:

- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access)
- ICC_SGI0R can only generate Group0 SGIs
- ICC_ASGI1R sees its scope refocused to generate only Group0 SGIs (as per the note at the bottom of Table 8-14)

We only support Group1 SGIs so far, so no material change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12  KVM: arm64: Remove non-existent AArch32 ICC_SGI1R encoding  (Marc Zyngier)

ICC_SGI1R is a 64bit system register, even on AArch32. It is thus pointless to have such an encoding in the 32bit cp15 array. Let's drop it.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-26  arm64: Add support for STACKLEAK gcc plugin  (Laura Abbott)

This adds support for the STACKLEAK gcc plugin to arm64 by implementing stackleak_check_alloca(), based heavily on the x86 version, and adding the two helpers used by the stackleak common code: current_top_of_stack() and on_thread_stack(). The stack erasure calls are made at syscall returns.

Additionally, this disables the plugin in hypervisor and EFI stub code, which are out of scope for the protection.

Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
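Roughly, the two helpers look like this on arm64 (a sketch based on the commit; details may differ):

    static inline unsigned long current_top_of_stack(void)
    {
        struct stack_info _info;

        /* on_accessible_stack() fills in the bounds of whichever
         * stack the task is currently running on */
        BUG_ON(!on_accessible_stack(current, current_stack_pointer,
                                    &_info));
        return _info.high;
    }

    static inline bool on_thread_stack(void)
    {
        return on_task_stack(current, current_stack_pointer, NULL);
    }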
2018-07-21  KVM: arm64: Share the parts of get/set events useful to 32bit  (James Morse)

The get/set events helpers do some work to check that reserved and padding fields are zero. This is useful on 32bit too.

Move this code into virt/kvm/arm/arm.c, and give the arch code some underscores.

This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until 32bit is wired up.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
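A minimal sketch of the shared wrapper (simplified from virt/kvm/arm/arm.c, with the underscored arch hook at the end):

    static int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
                                       struct kvm_vcpu_events *events)
    {
        int i;

        /* reserved and padding fields must be zero */
        for (i = 0; i < ARRAY_SIZE(events->reserved); i++)
            if (events->reserved[i])
                return -EINVAL;

        for (i = 0; i < ARRAY_SIZE(events->exception.pad); i++)
            if (events->exception.pad[i])
                return -EINVAL;

        return __kvm_arm_vcpu_set_events(vcpu, events);
    }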
2018-07-21  arm64: KVM: export the capability to set guest SError syndrome  (Dongjiu Geng)

For the arm64 RAS Extension, user space can inject a virtual SError with a specified ESR. User space therefore needs to know whether KVM supports injecting such an SError; this interface adds a query for this capability.

KVM checks whether the system supports the RAS Extension: if supported, KVM returns true to user space, otherwise it returns false.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
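The check itself is small; a sketch of the case added to kvm_vm_ioctl_check_extension():

    case KVM_CAP_ARM_INJECT_SERROR_ESR:
        /* a specific ESR can only be injected with the RAS Extension */
        r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
        break;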
2018-07-21  arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS  (Dongjiu Geng)

For migrating VMs, user space may need to know the exception state. For example, if on machine A KVM makes an SError pending, then when migrating to machine B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
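A hedged userspace sketch of migrating a pending SError with the new ioctls (file descriptor names illustrative):

    struct kvm_vcpu_events ev = {};

    if (ioctl(src_vcpu_fd, KVM_GET_VCPU_EVENTS, &ev) < 0)
        err(1, "KVM_GET_VCPU_EVENTS");

    /* ev.exception.serror_pending, .serror_has_esr and .serror_esr
     * now describe the source vcpu's SError state */

    if (ioctl(dst_vcpu_fd, KVM_SET_VCPU_EVENTS, &ev) < 0)
        err(1, "KVM_SET_VCPU_EVENTS");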
2018-07-21  arm64: KVM: Cleanup tpidr_el2 init on non-VHE  (Marc Zyngier)

When running on a non-VHE system, we initialize tpidr_el2 to contain the per-CPU offset required to reach per-cpu variables.

Actually, we initialize it twice: the first time as part of the EL2 initialization, by copying tpidr_el1 into its el2 counterpart, and another time by calling into __kvm_set_tpidr_el2.

It turns out that the first part is wrong, as it includes the distance between the kernel mapping and the linear mapping, while EL2 only cares about the linear mapping. This was the last vestige of the first per-cpu use of tpidr_el2 that came in with SDEI. The only caller then was hyp_panic(), and it's now using the pc-relative get_host_ctxt() stuff, instead of kimage addresses from the literal pool.

It is not a big deal, as we override it straight away, but it is slightly confusing. In order to clear said confusion, let's set this directly as part of the hyp-init code, and drop the ad-hoc HYP helper.

Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-09  arm64: KVM: Handle Set/Way CMOs as NOPs if FWB is present  (Marc Zyngier)

Set/Way handling is one of the ugliest corners of KVM. We shouldn't have to handle that, but better safe than sorry.

Thankfully, FWB fixes this for us by not requiring any maintenance (the guest is forced to use cacheable memory, no matter what it says, and the whole system is guaranteed to be cache coherent), which means we don't have to emulate S/W CMOs, and don't have to track VM ops either.

We still have to trap S/W though, if only to prevent the guest from doing something bad.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
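A minimal sketch of the resulting trap handler behaviour (simplified):

    static bool access_dcsw(struct kvm_vcpu *vcpu,
                            struct sys_reg_params *p,
                            const struct sys_reg_desc *r)
    {
        /* still trap S/W, but with FWB the system is guaranteed
         * coherent and the emulated flush can be skipped entirely */
        if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
            kvm_set_way_flush(vcpu);

        return true;
    }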
2018-07-05  kvm/arm: use PSR_AA32 definitions  (Mark Rutland)

Some code cares about the SPSR_ELx format for exceptions taken from AArch32 to inspect or manipulate the SPSR_ELx value, which is already in the SPSR_ELx format, and not in the AArch32 PSR format.

To separate these from cases where we care about the AArch32 PSR format, migrate these cases to use the PSR_AA32_* definitions rather than COMPAT_PSR_*.

There should be no functional change as a result of this patch.

Note that arm64 KVM does not support a compat KVM API, and always uses the SPSR_ELx format, even for AArch32 guests.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
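An illustrative example of the distinction (hedged; not a specific upstream hunk): code inspecting an AArch32 guest's mode bits in the SPSR_ELx format now spells it with the PSR_AA32_* names:

    /* SPSR_ELx layout, AArch32 mode encoding */
    if ((*vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK) == PSR_AA32_MODE_USR)
        /* exception came from AArch32 EL0 */;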
2018-06-21  KVM: arm64: Avoid mistaken attempts to save SVE state for vcpus  (Dave Martin)

Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") uses fpsimd_save() to save the FPSIMD state for a vcpu when scheduling the vcpu out. However, currently current's value of TIF_SVE is restored before calling fpsimd_save(), which means that fpsimd_save() may erroneously attempt to save SVE state from the vcpu.

This enables current's vector state to be polluted with guest data. current->thread.sve_state may be unallocated or not large enough, so this can also trigger a NULL dereference or buffer overrun.

Instead of this, TIF_SVE should be configured properly for the guest when calling fpsimd_save() with the vcpu context loaded.

This patch ensures this by delaying restoration of current's TIF_SVE until after the call to fpsimd_save().

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
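A minimal sketch of the reordering in kvm_arch_vcpu_put_fp() (simplified):

    if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
        /* TIF_SVE still describes the guest context: safe to save */
        fpsimd_save();
    }

    /* only now restore current's TIF_SVE */
    update_thread_flag(TIF_SVE,
                       vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);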
2018-06-21  KVM: arm64/sve: Fix SVE trap restoration for non-current tasks  (Dave Martin)

Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") attempts to restore the configuration of userspace SVE trapping via a call to fpsimd_bind_task_to_cpu(), but the logic for determining when to do this is not correct.

The patch makes the erroneous assumption that the only task that may try to enter userspace with the currently loaded FPSIMD/SVE register content is current. This may not be the case however: if some other user task T is scheduled on the CPU during the execution of the KVM run loop, and the vcpu does not try to use the registers in the meantime, then T's state may be left there intact. If T happens to be the next task to enter userspace on this CPU then the hooks for reloading the register state and configuring traps will be skipped.

(Also, current never has SVE state at this point anyway and should always have the trap enabled, as a side-effect of the ioctl() syscall needed to reach the KVM run loop in the first place.)

This patch instead restores the state of the EL0 trap from the state observed at the most recent vcpu_load(), ensuring that the trap is set correctly for the loaded context (if any).

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
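A minimal sketch of the fix: the trap state is recorded at vcpu_load() time and replayed at vcpu_put() (simplified):

    /* kvm_arch_vcpu_load_fp(): remember the EL0 SVE trap state */
    if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
        vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;

    /* kvm_arch_vcpu_put_fp(): restore exactly that state */
    if (vcpu->arch.flags & KVM_ARM64_HOST_SVE_ENABLED)
        sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_ZEN_EL0EN);
    else
        sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);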
2018-06-21  KVM: arm64: Don't mask softirq with IRQs disabled in vcpu_put()  (Dave Martin)

Commit e6b673b ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") introduces a specific helper kvm_arch_vcpu_put_fp() for saving the vcpu FPSIMD state during vcpu_put(). This function uses local_bh_disable()/_enable() to protect the FPSIMD context manipulation from interruption by softirqs.

This approach is not correct, because vcpu_put() can be invoked either from the KVM host vcpu thread (when exiting the vcpu run loop), or via a preempt notifier. In the former case, only preemption is disabled. In the latter case, the function is called from inside __schedule(), which means that IRQs are disabled.

Use of local_bh_disable()/_enable() with IRQs disabled is considered an error, resulting in lockdep splats while running VMs if lockdep is enabled.

This patch disables IRQs instead of attempting to disable softirqs, avoiding the problem of calling local_bh_enable() with IRQs disabled in the __schedule() path. This creates an additional interrupt blackout during vcpu run loop exit, but this is the rare case and the blackout latency is still less than that of __schedule().

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
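A minimal sketch of the change in kvm_arch_vcpu_put_fp() (simplified):

    unsigned long flags;

    /* valid both from run-loop exit (preemption disabled) and from
     * a preempt notifier inside __schedule() (IRQs already off) */
    local_irq_save(flags);

    /* ... save/flush the vcpu's FPSIMD state ... */

    local_irq_restore(flags);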
2018-06-12  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)

Pull KVM updates from Paolo Bonzini:
 "Small update for KVM:

  ARM:
   - lazy context-switching of FPSIMD registers on arm64
   - "split" regions for vGIC redistributor

  s390:
   - cleanups for nested
   - clock handling
   - crypto
   - storage keys
   - control register bits

  x86:
   - many bugfixes
   - implement more Hyper-V super powers
   - implement lapic_timer_advance_ns even when the LAPIC timer is emulated using the processor's VMX preemption timer
   - two security-related bugfixes at the top of the branch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
  kvm: fix typo in flag name
  kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
  KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
  KVM: x86: introduce linear_{read,write}_system
  kvm: nVMX: Enforce cpl=0 for VMX instructions
  kvm: nVMX: Add support for "VMWRITE to any supported field"
  kvm: nVMX: Restrict VMX capability MSR changes
  KVM: VMX: Optimize tscdeadline timer latency
  KVM: docs: nVMX: Remove known limitations as they do not exist now
  KVM: docs: mmu: KVM support exposing SLAT to guests
  kvm: no need to check return value of debugfs_create functions
  kvm: Make VM ioctl do valloc for some archs
  kvm: Change return type to vm_fault_t
  KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
  kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
  KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
  KVM: introduce kvm_make_vcpus_request_mask() API
  KVM: x86: hyperv: do rep check for each hypercall separately
  ...
2018-05-31  arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID  (Marc Zyngier)

Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
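A condensed sketch of the discovery path in the guest's SMCCC handler (simplified; the disabled/unknown cases fall through to "not supported"):

    case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
        switch (smccc_get_arg1(vcpu)) {
        case ARM_SMCCC_ARCH_WORKAROUND_2:
            switch (kvm_arm_have_ssbd()) {
            case KVM_SSBD_KERNEL:
                val = SMCCC_RET_SUCCESS;
                break;
            case KVM_SSBD_FORCE_ENABLE:
            case KVM_SSBD_MITIGATED:
                val = SMCCC_RET_NOT_REQUIRED;
                break;
            }
            break;
        }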
2018-05-31  arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests  (Marc Zyngier)

In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware.

Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31  arm64: KVM: Add ARCH_WORKAROUND_2 support for guests  (Marc Zyngier)

In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure.

Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-25  KVM: arm64: Invoke FPSIMD context switch trap from C  (Dave Martin)

The conversion of the FPSIMD context switch trap code to C has added some overhead to calling it, due to the need to save registers that the procedure call standard defines as caller-saved.

So, perhaps it is no longer worth invoking this trap handler quite so early.

Instead, we can invoke it from fixup_guest_exit(), with little likelihood of increasing the overhead much further.

As a convenience, this patch gives __hyp_switch_fpsimd() the same return semantics as fixup_guest_exit(). For now there is no possibility of a spurious FPSIMD trap, so the function always returns true, but this allows it to be tail-called with a single return statement.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25  KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()  (Dave Martin)

The entire tail of fixup_guest_exit() is contained in if statements of the form if (x && *exit_code == ARM_EXCEPTION_TRAP). As a result, we can check just once and bail out of the function early, allowing the remaining if conditions to be simplified.

The only awkward case is where *exit_code is changed to ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU interface access: in that case, the GICv3 trap handling code is skipped using a goto. This avoids pointlessly evaluating the static branch check for the GICv3 case, even though we can't have vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously unless we have a GICv3 and GICv2 on the host: that sounds stupid, but I haven't satisfied myself that it can't happen.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25  KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()  (Dave Martin)

In fixup_guest_exit(), there are a couple of cases where after checking what the exit code was, we assign it explicitly with the value it already had.

Assuming this is not indicative of a bug, these assignments are not needed.

This patch removes the redundant assignments, and simplifies some if-nesting that becomes trivial as a result.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25  KVM: arm64: Save host SVE context as appropriate  (Dave Martin)

This patch adds SVE context saving to the hyp FPSIMD context switch path. This means that it is no longer necessary to save the host SVE state in advance of entering the guest, when in use.

In order to avoid adding pointless complexity to the code, VHE is assumed if SVE is in use. VHE is an architectural prerequisite for SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in kernels that support both SVE and KVM.

Historically, software models exist that can expose the architecturally invalid configuration of SVE without VHE, so if this situation is detected at kvm_init() time then KVM will be disabled.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25  KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing  (Dave Martin)

This patch refactors KVM to align the host and guest FPSIMD save/restore logic with each other for arm64. This reduces the number of redundant save/restore operations that must occur, and reduces the common-case IRQ blackout time during guest exit storms by saving the host state lazily and optimising away the need to restore the host state before returning to the run loop.

Four hooks are defined in order to enable this (see the sketch after this list for the prototypes):

 * kvm_arch_vcpu_run_map_fp(): Called on PID change to map necessary bits of current to Hyp.

 * kvm_arch_vcpu_load_fp(): Set up FP/SIMD for entering the KVM run loop (parse as "vcpu_load fp").

 * kvm_arch_vcpu_ctxsync_fp(): Get FP/SIMD into a safe state for re-enabling interrupts after a guest exit back to the run loop. For arm64 specifically, this involves updating the host kernel's FPSIMD context tracking metadata so that kernel-mode NEON use will cause the vcpu's FPSIMD state to be saved back correctly into the vcpu struct. This must be done before re-enabling interrupts because kernel-mode NEON may be used by softirqs.

 * kvm_arch_vcpu_put_fp(): Save guest FP/SIMD state back to memory and dissociate from the CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it to save back FPSIMD state for a vcpu, not just current. A few helpers drive this:

 * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp): mark this CPU as having context fp (which may belong to a vcpu) currently loaded in its registers. This is the non-task equivalent of the static function fpsimd_bind_to_cpu() in fpsimd.c.

 * task_fpsimd_save(): exported to allow KVM to save the guest's FPSIMD state back to memory on exit from the run loop.

 * fpsimd_flush_state(): invalidate any context's FPSIMD state that is currently loaded. Used to disassociate the vcpu from the CPU regs on run loop exit.

These changes allow the run loop to enable interrupts (and thus softirqs that may use kernel-mode NEON) without having to save the guest's FPSIMD state eagerly.

Some new vcpu_arch fields are added to make all this work. Because host FPSIMD state can now be saved back directly into current's thread_struct as appropriate, host_cpu_context is no longer used for preserving the FPSIMD state. However, it is still needed for preserving other things such as the host's system registers. To avoid ABI churn, the redundant storage space in host_cpu_context is not removed for now.

arch/arm is not addressed by this patch and continues to use its current save/restore logic. It could provide implementations of the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
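For reference, the four hooks listed above have these prototypes (per the commit):

    int  kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
    void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
    void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
    void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);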
2018-05-25  KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags  (Dave Martin)

In struct vcpu_arch, the debug_flags field is used to store debug-related flags about the vcpu state.

Since we are about to add some more flags related to FPSIMD and SVE, it makes sense to add them to the existing flags field rather than adding new fields. Since there is only one debug_flags flag defined so far, there is plenty of free space for expansion.

In preparation for adding more flags, this patch renames the debug_flags field to simply "flags", and updates comments appropriately.

The flag definitions are also moved to <asm/kvm_host.h>, since their presence in <asm/kvm_asm.h> was for purely historical reasons: these definitions are not used from asm any more, and not very likely to be as more Hyp asm is migrated to C.

KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit 1ea66d27e7b0 ("arm64: KVM: Move away from the assembly version of the world switch"), so this patch gets rid of that too.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: fixed minor conflict]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-25  KVM: arm64: Convert lazy FPSIMD context switch trap to C  (Dave Martin)

To make the lazy FPSIMD context switch trap code easier to hack on, this patch converts it to C.

This is not amazingly efficient, but the trap should typically only be taken once per host context switch.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-04  arm64: vgic-v2: Fix proxying of cpuif access  (James Morse)

Proxying the cpuif accesses at EL2 makes use of vcpu_data_guest_to_host and co, which check the endianness, which call into vcpu_read_sys_reg... which isn't mapped at EL2 (it was inlined before, and got moved OoL with the VHE optimizations). The result is of course a nice panic. Let's add some specialized cruft to keep the broken platforms that require this hack alive.

But this code used vcpu_data_guest_to_host(), which expected us to write the value to host memory; instead, we have trapped the guest's read or write to an mmio-device, and are about to replay it using the host's readl()/writel(), which also perform swabbing based on the host endianness. This goes wrong when both host and guest are big-endian, as readl()/writel() will undo the guest's swabbing, causing the big-endian value to be written to device-memory.

What needs doing?

A big-endian guest will have pre-swabbed data before storing; undo this. If it's necessary for the host, writel() will re-swab it.

For a read, a big-endian guest expects to swab the data after the load. The host's readl() will correct for host endianness, giving us the device-memory's value in the register. For a big-endian guest, swab it as if we'd only done the load.

For a little-endian guest, nothing needs doing as readl()/writel() leave the correct device-memory value in registers.

Tested on Juno with that rarest of things: a big-endian 64K host.

Based on a patch from Marc Zyngier.

Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Fixes: bf8feb39642b ("arm64: KVM: vgic-v2: Add GICV access from HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-20  arm/arm64: KVM: Add PSCI version selection API  (Marc Zyngier)

Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API.

This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it.

Cc: stable@vger.kernel.org #4.16
Reviewed-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
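A hedged userspace sketch of pinning a guest to PSCI 0.2 via the new firmware register (vcpu_fd assumed already open):

    #include <linux/kvm.h>
    #include <linux/psci.h>

    __u64 ver = PSCI_VERSION(0, 2);
    struct kvm_one_reg reg = {
        .id   = KVM_REG_ARM_PSCI_VERSION,
        .addr = (__u64)&ver,
    };

    /* the same register can be read back with KVM_GET_ONE_REG,
     * e.g. to save/restore the version across a migration */
    if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
        err(1, "KVM_SET_ONE_REG(PSCI_VERSION)");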
2018-04-17  arm64: KVM: Demote SVE and LORegion warnings to debug only  (Marc Zyngier)

While generating a message about guests probing for SVE/LORegions is a useful debugging tool, considering it an error is slightly over the top, as this is the only way the guest can find out about the presence of the feature.

Let's turn these messages into kvm_debug so that they can only be seen if CONFIG_DYNAMIC_DEBUG is enabled, and kept quiet otherwise.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-11  arm64: Move the content of bpi.S to hyp-entry.S  (Marc Zyngier)

bpi.S was introduced as we were starting to build the Spectre v2 mitigation framework, and it was rather unclear that it would become strictly KVM specific.

Now that the picture is a lot clearer, let's move the content of that file to hyp-entry.S, where it actually belongs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11  arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening  (Shanker Donthineni)

The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of the SMC V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead of Silicon provider service ID 0xC2001700.

Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[maz: reworked errata framework integration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-28Revert "arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening"Marc Zyngier
Creates far too many conflicts with arm64/for-next/core, to be resent post -rc1. This reverts commit f9f5dc19509bbef6f5e675346f1a7d7b846bdb12. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening  (Shanker Donthineni)

The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of the SMC V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead of Silicon provider service ID 0xC2001700.

Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Allow mapping of vectors outside of the RAM region  (Marc Zyngier)

We're now ready to map our vectors in weird and wonderful locations. On enabling ARM64_HARDEN_EL2_VECTORS, a vector slot gets allocated if this hasn't been already done via ARM64_HARDEN_BRANCH_PREDICTOR and gets mapped outside of the normal RAM region, next to the idmap.

That way, being able to obtain VBAR_EL2 doesn't reveal the mapping of the rest of the hypervisor code.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Allow far branches from vector slots to the main vectors  (Marc Zyngier)

So far, the branch from the vector slots to the main vectors can at most be 4GB from the main vectors (the reach of ADRP), and this distance is known at compile time. If we were to remap the slots to an unrelated VA, things would break badly.

A way to achieve VA independence would be to load the absolute address of the vectors (__kvm_hyp_vector), either using a constant pool or a series of movs, followed by an indirect branch.

This patch implements the latter solution, using another instance of a patching callback. Note that since we have to save a register pair on the stack, we branch to the *second* instruction in the vectors in order to compensate for it. This also results in having to adjust this balance in the invalid vector entry point.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Move BP hardening vectors into .hyp.text section  (Marc Zyngier)

There is no reason why the BP hardening vectors shouldn't be part of the HYP text at compile time, rather than being mapped at runtime.

Also introduce a new config symbol that controls the compilation of bpi.S.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Move stashing of x0/x1 into the vector code itself  (Marc Zyngier)

All our useful entry points into the hypervisor start by saving x0 and x1 on the stack. Let's move those into the vectors by introducing macros that annotate whether a vector is valid or not, thus indicating whether we want to stash registers or not.

The only drawback is that we now also stash registers for el2_error, but this should never happen, and we pop them back right at the start of the handling sequence.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Move vector offsetting from hyp-init.S to kvm_get_hyp_vector  (Marc Zyngier)

We currently provide the hyp-init code with a kernel VA, and expect it to turn it into a HYP VA by itself. As we're about to provide the hypervisor with mappings that are not necessarily in the memory range, let's move the kern_hyp_va macro to kvm_get_hyp_vector.

No functional change.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Introduce EL2 VA randomisation  (Marc Zyngier)

The main idea behind randomising the EL2 VA is that we usually have a few spare bits between the most significant bit of the VA mask and the most significant bit of the linear mapping.

Those bits could be a bunch of zeroes, and could be useful to move things around a bit. Of course, the more memory you have, the less randomisation you get...

Alternatively, these bits could be the result of KASLR, in which case they are already random. But it would be nice to have a *different* randomization, just to make the job of a potential attacker a bit more difficult.

Inserting these random bits is a bit involved. We don't have a spare register (short of rewriting all the kern_hyp_va call sites), and the immediate we want to insert is too random to be used with the ORR instruction. The best option I could come up with is the following sequence:

    and  x0, x0, #va_mask
    ror  x0, x0, #first_random_bit
    add  x0, x0, #(random & 0xfff)
    add  x0, x0, #(random >> 12), lsl #12
    ror  x0, x0, #(63 - first_random_bit)

making it a fairly long sequence, but one that a decent CPU should be able to execute without breaking a sweat. It is of course NOPed out on VHE. The last 4 instructions can also be turned into NOPs if it appears that there is no free bits to use.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  arm64: KVM: Dynamically compute the HYP VA mask  (Marc Zyngier)

As we're moving towards a much more dynamic way to compute our HYP VA, let's express the mask in a slightly different way.

Instead of comparing the idmap position to the "low" VA mask, we directly compute the mask by taking into account the idmap's (VA_BIT-1) bit.

No functional change.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19  KVM: arm/arm64: Keep GICv2 HYP VAs in kvm_vgic_global_state  (Marc Zyngier)

As we're about to change the way we map devices at HYP, we need to move away from kern_hyp_va on an IO address.

One way of achieving this is to store the VAs in kvm_vgic_global_state, and use that directly from the HYP code. This requires a small change to create_hyp_io_mappings so that it can also return a HYP VA.

We take this opportunity to nuke the vctrl_base field in the emulated distributor, as it is not used anymore.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>