path: root/arch
2016-01-31  arm64: kernel: fix architected PMU registers unconditional access  [Lorenzo Pieralisi]
commit f436b2ac90a095746beb6729b8ee8ed87c9eaede upstream. The Performance Monitors extension is an optional feature of the AArch64 architecture, therefore, in order to access Performance Monitors registers safely, the kernel should detect the architected PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field before accessing them. This patch implements a guard by reading the ID_AA64DFR0_EL1 register PMUVer field to detect the architected PMU presence and prevent accessing PMU system registers if the Performance Monitors extension is not implemented in the core. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 60792ad349f3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
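A minimal C-level sketch of the guard described above (the actual fix lives in the low-level assembly paths; helper name hypothetical, field encoding per the ARMv8 ARM):

  /* ID_AA64DFR0_EL1.PMUVer is bits [11:8]; a value of 0 means no architected PMU. */
  static inline int pmu_is_implemented(void)
  {
          unsigned long dfr0;

          asm volatile("mrs %0, id_aa64dfr0_el1" : "=r" (dfr0));
          return ((dfr0 >> 8) & 0xf) != 0;
  }

Only when this returns non-zero is it safe to touch PMU system registers such as pmuserenr_el0.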
2016-01-31  arm64: KVM: Add workaround for Cortex-A57 erratum 834220  [Marc Zyngier]
commit 498cd5c32be6e32bc0f8efcad48ab094bb2bfdf3 upstream. Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults when a Stage 1 permission fault or device alignment fault should have been reported. This patch implements the workaround (which is to validate that the Stage-1 translation actually succeeds) by using code patching. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: restore bogomips information in /proc/cpuinfo  [Yang Shi]
commit 92e788b749862ebe9920360513a718e5dd4da7a9 upstream. As previously reported, some userspace applications depend on bogomips shown in /proc/cpuinfo. Although there is much less legacy impact on aarch64 than arm, it does break libvirt. This patch reverts commit 326b16db9f69 ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo"), but with some tweak due to context change and without the pr_info(). Fixes: 326b16db9f69 ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo") Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> # 3.12+ Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
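For reference, the restored /proc/cpuinfo line follows the classic BogoMIPS formula derived from loops_per_jiffy; a sketch of the seq_file output (field layout illustrative, not necessarily the exact arm64 code):

  seq_printf(m, "BogoMIPS\t: %lu.%02lu\n",
             loops_per_jiffy / (500000UL / HZ),
             loops_per_jiffy / (5000UL / HZ) % 100);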
2016-01-31  mn10300: Select CONFIG_HAVE_UID16 to fix build failure  [Guenter Roeck]
commit c86576ea114a9a881cf7328dc7181052070ca311 upstream. mn10300 builds fail with:
  fs/stat.c: In function 'cp_old_stat':
  fs/stat.c:163:2: error: 'old_uid_t' undeclared
  ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
  ipc/util.c:540:2: error: 'old_uid_t' undeclared
Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16 to fix the problem. Fixes: fbc416ff8618 ("arm64: fix building without CONFIG_UID16") Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: kernel: enforce pmuserenr_el0 initialization and restore  [Lorenzo Pieralisi]
commit 60792ad349f3c6dc5735aafefe5dc9121c79e320 upstream. The pmuserenr_el0 register value is architecturally UNKNOWN on reset. Current kernel code resets that register value iff the core pmu device is correctly probed in the kernel. On platforms with missing DT pmu nodes (or disabled perf events in the kernel), the pmu is not probed, therefore the pmuserenr_el0 register is not reset in the kernel, which means that its value retains the reset value that is architecturally UNKNOWN (system may run with eg pmuserenr_el0 == 0x1, which means that PMU counters access is available at EL0, which must be disallowed). This patch adds code that resets pmuserenr_el0 on cold boot and restores it on core resume from shutdown, so that the pmuserenr_el0 setup is always enforced in the kernel. Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: mm: ensure that the zero page is visible to the page table walker  [Will Deacon]
commit 32d6397805d00573ce1fa55f408ce2bca15b0ad3 upstream. In paging_init, we allocate the zero page, memset it to zero and then point TTBR0 to it in order to avoid speculative fetches through the identity mapping. In order to guarantee that the freshly zeroed page is indeed visible to the page table walker, we need to execute a dsb instruction prior to writing the TTBR. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: Clear out any singlestep state on a ptrace detach operation  [John Blackwood]
commit 5db4fd8c52810bd9740c1240ebf89223b171aa70 upstream. Make sure to clear out any ptrace singlestep state when a ptrace(2) PTRACE_DETACH call is made on arm64 systems. Otherwise, the previously ptraced task will die off with a SIGTRAP signal if the debugger just previously singlestepped the ptraced task. Signed-off-by: John Blackwood <john.blackwood@ccur.com> [will: added comment to justify why this is in the arch code] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
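A sketch of where the clearing naturally lands, assuming the generic user_disable_single_step() helper (the arch hook ptrace_disable() is invoked by the core ptrace code on PTRACE_DETACH):

  void ptrace_disable(struct task_struct *child)
  {
          /*
           * This would arguably belong in generic ptrace code, but the
           * single-step teardown is architecture specific, so do it here
           * to avoid a stale SIGTRAP firing after the debugger detaches.
           */
          user_disable_single_step(child);
  }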
2016-01-31  ARM/arm64: KVM: correct PTE uncachedness check  [Ard Biesheuvel]
commit 0de58f852875a0f0dcfb120bb8433e4e73c7803b upstream. Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness") modified the logic to test whether a HYP or stage-2 mapping needs flushing, from [incorrectly] interpreting the page table attributes to [incorrectly] checking whether the PFN that backs the mapping is covered by host system RAM. The PFN number is part of the output of the translation, not the input, so we have to use pte_pfn() on the contents of the PTE, not __phys_to_pfn() on the HYP virtual address or stage-2 intermediate physical address. Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness") Tested-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: KVM: Fix AArch32 to AArch64 register mapping  [Marc Zyngier]
commit c0f0963464c24e034b858441205455bf2a5d93ad upstream. When running a 32bit guest under a 64bit hypervisor, the ARMv8 architecture defines a mapping of the 32bit registers in the 64bit space. This includes banked registers that are being demultiplexed over the 64bit ones. On exceptions caused by an operation involving a 32bit register, the HW exposes the register number in the ESR_EL2 register. It was so far understood that SW had to distinguish between AArch32 and AArch64 accesses (based on the current AArch32 mode and register number). It turns out that I misinterpreted the ARM ARM, and the clue is in D1.20.1: "For some exceptions, the exception syndrome given in the ESR_ELx identifies one or more register numbers from the issued instruction that generated the exception. Where the exception is taken from an Exception level using AArch32 these register numbers give the AArch64 view of the register." Which means that the HW is already giving us the translated version, and that we shouldn't try to interpret it at all (for example, doing an MMIO operation from the IRQ mode using the LR register leads to very unexpected behaviours). The fix is thus not to perform a call to vcpu_reg32() at all from vcpu_reg(), and use whatever register number is supplied directly. The only case we need to find out about the mapping is when we actively generate a register access, which only occurs when injecting a fault in a guest. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  ARM/arm64: KVM: test properly for a PTE's uncachedness  [Ard Biesheuvel]
commit e6fab54423450d699a09ec2b899473a541f61971 upstream. The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits. For HYP mappings, the type is an index into the MAIR table (i.e., the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows:
  #define MT_S2_NORMAL 0xf
  #define MT_S2_DEVICE_nGnRE 0x1
I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
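A self-contained demonstration of why the mask-and-compare pattern is unsound with the stage-2 attribute values quoted above (illustrative userspace C, not the kernel code):

  #include <stdio.h>

  #define MT_S2_NORMAL       0xf
  #define MT_S2_DEVICE_nGnRE 0x1

  int main(void)
  {
          unsigned long attr = MT_S2_NORMAL;  /* a normal-memory mapping */

          /* flawed test: "does this PTE carry the device type?" */
          if ((attr & MT_S2_DEVICE_nGnRE) == MT_S2_DEVICE_nGnRE)
                  printf("normal memory (0x%lx) matches the device pattern\n", attr);
          else
                  printf("correctly seen as non-device\n");
          return 0;
  }

Because 0xf masked with 0x1 still equals 0x1, the normal-memory type satisfies the device-type test, which is exactly the misidentification the commit describes.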
2016-01-31  arm64: kernel: pause/unpause function graph tracer in cpu_suspend()  [Lorenzo Pieralisi]
commit de818bd4522c40ea02a81b387d2fa86f989c9623 upstream. The function graph tracer adds instrumentation that is required to trace both entry and exit of a function. In particular the function graph tracer updates the "return address" of a function in order to insert a trace callback on function exit. Kernel power management functions like cpu_suspend() are called upon power down entry with functions called "finishers" that are in turn called to trigger the power down sequence but they may not return to the kernel through the normal return path. When the core resumes from low-power it returns to the cpu_suspend() function through the cpu_resume path, which leaves the trace stack frame set up by the function tracer in an inconsistent state upon return to the kernel when tracing is enabled. This patch fixes the issue by pausing/resuming the function graph tracer on the thread executing cpu_suspend() (i.e. the function call that subsequently triggers the "suspend finishers"), so that the function graph tracer state is kept consistent across functions that enter power down states and never return by effectively disabling graph tracer while they are executing. Fixes: 819e50e25d0c ("arm64: Add ftrace support") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  arm64: cmpxchg_dbl: fix return value type  [Lorenzo Pieralisi]
commit 57a65667991aaddef730b0c910111ab76a1ff245 upstream. The current arm64 __cmpxchg_double{_mb} implementations carry out the compare exchange by first comparing the old values passed in to the values read from the pointer provided and by stashing the cumulative bitwise difference in a 64-bit register. By comparing the register content against 0, it is possible to detect if the values read differ from the old values passed in, so that the compare exchange detects whether it has to bail out or carry on completing the operation with the exchange. Given the current implementation, to detect the cmpxchg operation status, the __cmpxchg_double{_mb} functions should return the 64-bit stashed bitwise difference so that the caller can detect cmpxchg failure by comparing the return value content against 0. The current implementation declares the return value as an int, which means that the 64-bit value stashing the bitwise difference is truncated before being returned to the __cmpxchg_double{_mb} callers, which means that any bitwise difference present in the top 32 bits goes undetected, triggering false positives and subsequent kernel failures. This patch fixes the issue by declaring the arm64 __cmpxchg_double{_mb} return values as a long, so that the bitwise difference is properly propagated on failure, restoring the expected behaviour. Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
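The truncation is easy to reproduce in plain C (illustrative userspace program, assuming an LP64 target where long is 64 bits):

  #include <stdio.h>

  static int  as_int(unsigned long diff)  { return diff; }   /* old, broken: truncates */
  static long as_long(unsigned long diff) { return diff; }   /* fixed: keeps all 64 bits */

  int main(void)
  {
          /* old and new values differ only in the upper 32 bits */
          unsigned long diff = 0xdeadbeef00000000UL;

          printf("int  return sees a difference: %s\n", as_int(diff)  ? "yes" : "no (false success)");
          printf("long return sees a difference: %s\n", as_long(diff) ? "yes" : "no");
          return 0;
  }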
2016-01-31  arm64: bpf: fix mod-by-zero case  [Zi Shen Lim]
commit 14e589ff4aa3f28a5424e92b6495ecb8950080f7 upstream. Turns out in the case of modulo by zero in a BPF program: A = A % X; (X == 0) the expected behavior is to terminate with return value 0. The bug in JIT is exposed by a new test case [1]. [1] https://lkml.org/lkml/2015/11/4/499 Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Reported-by: Yang Shi <yang.shi@linaro.org> Reported-by: Xi Wang <xi.wang@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
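The required semantics, which the JIT has to reproduce, look like this in interpreter-style C (a sketch, not the arm64 JIT itself; the same rule applies to the companion div-by-zero fix below):

  /* Returns 1 if the filter must terminate (with *ret as its result), 0 otherwise. */
  static int alu_div_mod_x(unsigned int *A, unsigned int X, int is_mod, unsigned int *ret)
  {
          if (X == 0) {
                  *ret = 0;       /* terminate the whole program with return value 0 */
                  return 1;
          }
          if (is_mod)
                  *A %= X;
          else
                  *A /= X;
          return 0;
  }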
2016-01-31  arm64: bpf: fix div-by-zero case  [Zi Shen Lim]
commit 251599e1d6906621f49218d7b474ddd159e58f3b upstream. In the case of division by zero in a BPF program: A = A / X; (X == 0) the expected behavior is to terminate with return value 0. This is confirmed by the test case introduced in commit 86bf1721b226 ("test_bpf: add tests checking that JIT/interpreter sets A and X to 0."). Reported-by: Yang Shi <yang.shi@linaro.org> Tested-by: Yang Shi <yang.shi@linaro.org> CC: Xi Wang <xi.wang@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> CC: linux-arm-kernel@lists.infradead.org CC: linux-kernel@vger.kernel.org Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc/module: Handle R_PPC64_ENTRY relocations  [Ulrich Weigand]
commit a61674bdfc7c2bf909c4010699607b62b69b7bec upstream. GCC 6 will include changes to generated code with -mcmodel=large, which is used to build kernel modules on powerpc64le. This was necessary because the large model is supposed to allow arbitrary sizes and locations of the code and data sections, but the ELFv2 global entry point prolog still made the unconditional assumption that the TOC associated with any particular function can be found within 2 GB of the function entry point:
  func: addis r2,r12,(.TOC.-func)@ha
        addi  r2,r2,(.TOC.-func)@l
        .localentry func, .-func
To remove this assumption, GCC will now generate instead this global entry point prolog sequence when using -mcmodel=large:
        .quad .TOC.-func
  func: .reloc ., R_PPC64_ENTRY
        ld    r2, -8(r12)
        add   r2, r2, r12
        .localentry func, .-func
The new .reloc triggers an optimization in the linker that will replace this new prolog with the original code (see above) if the linker determines that the distance between .TOC. and func is in range after all. Since this new relocation is now present in module object files, the kernel module loader is required to handle them too. This patch adds support for the new relocation and implements the same optimization done by the GNU linker. Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
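A sketch of the range test that decides whether the large-model prolog can be rewritten (variable and helper names hypothetical; the in-kernel code additionally verifies the two prolog instructions before patching them):

  /* Is .TOC. within the +/- 2 GB window reachable by the addis/addi pair? */
  static int toc_reachable(unsigned long toc, unsigned long entry)
  {
          unsigned long delta = toc - entry;

          return delta + 0x80008000UL <= 0xffffffffUL;
  }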
2016-01-31  powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered  [Boqun Feng]
commit 81d7a3294de7e9828310bbf986a67246b13fa01e upstream. According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics"). This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for the PPC_ATOMIC_ENTRY_BARRIER definition. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc: Make value-returning atomics fully ordered  [Boqun Feng]
commit 49e9cf3f0c04bf76ffa59242254110309554861d upstream. According to memory-barriers.txt:
  > Any atomic operation that modifies some state in memory and returns
  > information about the state (old or new) implies an SMP-conditional
  > general memory barrier (smp_mb()) on each side of the actual
  > operation ...
Which means these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" cannot guarantee fully ordered atomics, according to Paul McKenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type  [Stewart Smith]
commit 98da62b716a3b24ab8e77453c9a8a954124c18cd upstream. When running on newer OPAL firmware that supports sending extra OPAL_MSG types, we would print a warning on *every* message received. This could be a problem for kernels that don't support OPAL_MSG_OCC on machines that are running real close to thermal limits and the OCC is throttling the chip. For a kernel that is paying attention to the message queue, we could get these notifications quite often. Conceivably, future message types could also come fairly often, and printing that we didn't understand them 10,000 times provides no further information than printing them once. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"  [Alistair Popple]
commit 036592fbbe753d236402a0ae68148e7c143a0f0e upstream. Commit 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion") fixed an endian bug by calling opal_handle_events() in opal_event_unmask(). However this introduced a deadlock if we find an event is active during unmasking and call opal_handle_events() again. The bad call sequence is:
  opal_interrupt()
  -> opal_handle_events()
  -> generic_handle_irq()
  -> handle_level_irq()
  -> raw_spin_lock(&desc->lock)
     handle_irq_event(desc)
     unmask_irq(desc)
  -> opal_event_unmask()
  -> opal_handle_events()
  -> generic_handle_irq()
  -> handle_level_irq()
  -> raw_spin_lock(&desc->lock)   (BOOM)
When generating multiple opal events in quick succession this would lead to the following stall warnings:
  EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
  INFO: rcu_sched detected stalls on CPUs/tasks:
  12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
  15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
  (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
  NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
  INFO: rcu_sched detected stalls on CPUs/tasks:
  12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
  15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
  (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)
This patch corrects the problem by queuing the work if an event is active during unmasking, which is similar to the pre-endian fix behaviour. Fixes: 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion") Signed-off-by: Alistair Popple <alistair@popple.id.au> Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc/opal-irqchip: Fix double endian conversion  [Alistair Popple]
commit 25642e1459ace29f6ce5a171efc8b7b59a52a2d4 upstream. The OPAL event calls return a mask of events that are active in big endian format. This is checked when unmasking the events in the irqchip by comparison with a cached value. The cached value was stored in big endian format but should've been converted to CPU endian first. This bug leads to OPAL event delivery being delayed or dropped on some systems. Symptoms may include a non-functional console. The bug is fixed by calling opal_handle_events(...) instead of duplicating code in opal_event_unmask(...). Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Reported-by: Douglas L Lehr <dllehr@us.ibm.com> Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  powerpc/tm: Check for already reclaimed tasks  [Michael Neuling]
commit 7f821fc9c77a9b01fe7b1d6e72717b33d8d64142 upstream. Currently we can hit a scenario where we'll tm_reclaim() twice. This results in a TM bad thing exception because the second reclaim occurs when not in suspend mode. The scenario in which this can happen is the following. We attempt to deliver a signal to userspace. To do this we need to obtain the stack pointer to write the signal context. To get this stack pointer we must tm_reclaim() in case we need to use the checkpointed stack pointer (see get_tm_stackpointer()). Normally we'd then return directly to userspace to deliver the signal without going through __switch_to(). Unfortunately, if at this point we get an error (such as a bad userspace stack pointer), we need to exit the process. The exit will result in a __switch_to(). __switch_to() will attempt to save the process state which results in another tm_reclaim(). This tm_reclaim() now causes a TM Bad Thing exception as this state has already been saved and the processor is no longer in TM suspend mode. Whee! This patch checks the state of the MSR to ensure we are TM suspended before we attempt the tm_reclaim(). If we've already saved the state away, we should no longer be in TM suspend mode. This has the additional advantage of checking for a potential TM Bad Thing exception. Found using syscall fuzzer. Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
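The guard amounts to checking the MSR before reclaiming; a simplified sketch (argument list and cause code illustrative, macros as in the powerpc headers):

  static void tm_reclaim_once(struct thread_struct *thr, unsigned long orig_msr)
  {
          /* If we are no longer TM suspended, the state was already reclaimed;
           * reclaiming again would raise a TM Bad Thing exception. */
          if (!MSR_TM_SUSPENDED(mfmsr()))
                  return;

          tm_reclaim(thr, orig_msr, TM_CAUSE_RESCHED);
  }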
2016-01-31  powerpc/tm: Block signal return setting invalid MSR state  [Michael Neuling]
commit d2b9d2a5ad5ef04ff978c9923d19730cb05efd55 upstream. Currently we allow both the MSR T and S bits to be set by userspace on a signal return. Unfortunately this is a reserved configuration and will cause a TM Bad Thing exception if attempted (via rfid). This patch checks for this case in both the 32 and 64 bit signals code. If both T and S are set, we mark the context as invalid. Found using a syscall fuzzer. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
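The invalid combination is both TS bits set; a sketch of the validity test applied to the user-supplied MSR (MSR_TS_MASK as in the powerpc headers, helper name hypothetical):

  static int msr_ts_is_valid(unsigned long msr)
  {
          /* TS = 0b11 (transactional and suspended at once) is reserved */
          return (msr & MSR_TS_MASK) != MSR_TS_MASK;
  }

Both the 32-bit and 64-bit signal return paths mark the context as invalid when this check fails.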
2016-01-31  net: filter: make JITs zero A for SKF_AD_ALU_XOR_X  [Rabin Vincent]
[ Upstream commit 55795ef5469290f89f04e12e662ded604909e462 ] The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data instructions since it XORs A with X while all the others replace A with some loaded value. All the BPF JITs fail to clear A if this is used as the first instruction in a filter. This was found using american fuzzy lop. Add a helper to determine if A needs to be cleared given the first instruction in a filter, and use this in the JITs. Except for ARM, the rest have only been compile-tested. Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum") Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
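The helper looks along these lines (a sketch of the idea; the exact opcode list in the real header may differ slightly): the JIT only needs to emit an extra "A = 0" when the first instruction can observe A without first writing it, which is the case for the SKF_AD_ALU_XOR_X ancillary load.

  static inline bool bpf_needs_clear_a(const struct sock_filter *first)
  {
          switch (first->code) {
          case BPF_RET | BPF_K:
          case BPF_LD | BPF_W | BPF_LEN:
                  return false;
          case BPF_LD | BPF_W | BPF_ABS:
          case BPF_LD | BPF_H | BPF_ABS:
          case BPF_LD | BPF_B | BPF_ABS:
                  if (first->k == SKF_AD_OFF + SKF_AD_ALU_XOR_X)
                          return true;
                  return false;
          default:
                  return true;
          }
  }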
2016-01-31  x86/mm: Improve switch_mm() barrier comments  [Andy Lutomirski]
commit 4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b upstream. My previous comments were still a bit confusing and there was a typo. Fix it up. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization") Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/mm: Add barriers and document switch_mm()-vs-flush synchronization  [Andy Lutomirski]
commit 71b3c126e61177eb693423f2e18a1914205b165e upstream. When switch_mm() activates a new PGD, it also sets a bit that tells other CPUs that the PGD is in use so that TLB flush IPIs will be sent. In order for that to work correctly, the bit needs to be visible prior to loading the PGD and therefore starting to fill the local TLB. Document all the barriers that make this work correctly and add a couple that were missing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/boot: Double BOOT_HEAP_SIZE to 64KB  [H.J. Lu]
commit 8c31902cffc4d716450be549c66a67a8a3dd479c upstream. When decompressing kernel image during x86 bootup, malloc memory for ELF program headers may run out of heap space, which leads to system halt. This patch doubles BOOT_HEAP_SIZE to 64KB. Tested with 32-bit kernel which failed to boot without this patch. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]  [Mario Kleiner]
commit 2f0c0b2d96b1205efb14347009748d786c2d9ba5 upstream. Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
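The quirk is a DMI table entry along these lines (field values illustrative, following the pattern of the other Apple entries in the reboot quirk table):

  {       /* Handle reboot hangs on the iMac10,1 by forcing reboot=pci. */
          .callback = set_pci_reboot,
          .ident = "Apple iMac10,1",
          .matches = {
                  DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
                  DMI_MATCH(DMI_PRODUCT_NAME, "iMac10,1"),
          },
  },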
2016-01-31  KVM: x86: correctly print #AC in traces  [Paolo Bonzini]
commit aba2f06c070f604e388cf77b1dcc7f4cf4577eb0 upstream. Poor #AC was so unimportant until a few days ago that we were not even tracing its name correctly. But now it's all over the place. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  KVM: x86: expose MSR_TSC_AUX to userspace  [Paolo Bonzini]
commit 9dbe6cf941a6fe82933aef565e4095fb10f65023 upstream. If we do not do this, it is not properly saved and restored across migration. Windows notices due to its self-protection mechanisms, and is very upset about it (blue screen of death). Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR  [Paul Mackerras]
commit c20875a3e638e4a03e099b343ec798edd1af5cc6 upstream. Currently it is possible for userspace (e.g. QEMU) to set a value for the MSR for a guest VCPU which has both of the TS bits set, which is an illegal combination. The result of this is that when we execute a hrfid (hypervisor return from interrupt doubleword) instruction to enter the guest, the CPU will take a TM Bad Thing type of program interrupt (vector 0x700). Now, if PR KVM is configured in the kernel along with HV KVM, we actually handle this without crashing the host or giving hypervisor privilege to the guest; instead what happens is that we deliver a program interrupt to the guest, with SRR0 reflecting the address of the hrfid instruction and SRR1 containing the MSR value at that point. If PR KVM is not configured in the kernel, then we try to run the host's program interrupt handler with the MMU set to the guest context, which almost certainly causes a host crash. This closes the hole by making kvmppc_set_msr_hv() check for the illegal combination and force the TS field to a safe value (00, meaning non-transactional). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
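The check reduces to sanitising the TS field before the MSR value is stored for the guest; a sketch (structure member path simplified, MSR_TS_MASK as in the powerpc headers):

  static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
  {
          /* TS = 0b11 is reserved; force the field to non-transactional (00) */
          if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
                  msr &= ~MSR_TS_MASK;

          vcpu->arch.shregs.msr = msr;
  }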
2016-01-31  KVM: svm: unconditionally intercept #DB  [Paolo Bonzini]
commit cbdb967af3d54993f5814f1cee0ed311a055377d upstream. This is needed to avoid the possibility that the guest triggers an infinite stream of #DB exceptions (CVE-2015-8104). VMX is not affected: because it does not save DR6 in the VMCS, it already intercepts #DB unconditionally. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  KVM: PPC: Book3S HV: Don't dynamically split core when already split  [Paul Mackerras]
commit f74f2e2e26199f695ca3df94f29e9ab7cb707ea4 upstream. In static micro-threading modes, the dynamic micro-threading code is supposed to be disabled, because subcores can't make independent decisions about what micro-threading mode to put the core in - there is only one micro-threading mode for the whole core. The code that implements dynamic micro-threading checks for this, except that the check was missed in one case. This means that it is possible for a subcore in static 2-way micro-threading mode to try to put the core into 4-way micro-threading mode, which usually leads to stuck CPUs, spinlock lockups, and other stalls in the host. The problem was in the can_split_piggybacked_subcores() function, which should always return false if the system is in a static micro-threading mode. This fixes the problem by making can_split_piggybacked_subcores() use subcore_config_ok() for its checks, as subcore_config_ok() includes the necessary check for the static micro-threading modes. Credit to Gautham Shenoy for working out that the reason for the hangs and stalls we were seeing was that we were trying to do dynamic 4-way micro-threading while we were in static 2-way mode. Fixes: b4deba5c41e9 Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  KVM: VMX: fix SMEP and SMAP without EPT  [Radim Krčmář]
commit 656ec4a4928a3db7d16e5cb9bce351a478cfd3d5 upstream. The comment in code had it mostly right, but we enable paging for emulated real mode regardless of EPT. Without EPT (which implies emulated real mode), secondary VCPUs won't start unless we disable SM[AE]P when the guest doesn't use paging. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/xen: don't reset vcpu_info on a cancelled suspend  [Ouyang Zhaowei (Charles)]
commit 6a1f513776b78c994045287073e55bae44ed9f8c upstream. On a cancelled suspend the vcpu_info location does not change (it's still in the per-cpu area registered by xen_vcpu_setup()). So do not call xen_hvm_init_shared_info() which would make the kernel think it's back in the shared info. With the wrong vcpu_info, events cannot be received and the domain will hang after a cancelled suspend. Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/mce: Ensure offline CPUs don't participate in rendezvous process  [Ashok Raj]
commit d90167a941f62860f35eb960e1012aa2d30e7e94 upstream. Intel's MCA implementation broadcasts MCEs to all CPUs on the node. This poses a problem for offlined CPUs which cannot participate in the rendezvous process:
  Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
  Kernel Offset: disabled
  Rebooting in 100 seconds..
More specifically, Linux does a soft offline of a CPU when writing a 0 to /sys/devices/system/cpu/cpuX/online, which doesn't prevent the #MC exception from being broadcasted to that CPU. Ensure that offline CPUs don't participate in the MCE rendezvous and clear the RIP valid status bit so that a second MCE won't cause a shutdown. Without the patch, mce_start() will increment mce_callin and wait for all CPUs. Offlined CPUs should avoid participating in the rendezvous process altogether. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/paravirt: Prevent rtc_cmos platform device init on PV guests  [David Vrabel]
commit d8c98a1d1488747625ad6044d423406e17e99b7a upstream. Adding the rtc platform device in non-privileged Xen PV guests causes an IRQ conflict because these guests do not have legacy PIC and may allocate irqs in the legacy range. In a single VCPU Xen PV guest we should have:
  /proc/interrupts:
             CPU0
    0:    4934  xen-percpu-virq   timer0
    1:       0  xen-percpu-ipi    spinlock0
    2:       0  xen-percpu-ipi    resched0
    3:       0  xen-percpu-ipi    callfunc0
    4:       0  xen-percpu-virq   debug0
    5:       0  xen-percpu-ipi    callfuncsingle0
    6:       0  xen-percpu-ipi    irqwork0
    7:     321  xen-dyn-event     xenbus
    8:      90  xen-dyn-event     hvc_console
  ...
But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work.
  genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
We can avoid this problem by realizing that unprivileged PV guests (both Xen and lguests) are not supposed to have rtc_cmos device and so adding it is not necessary. Privileged guests (i.e. Xen's dom0) do use it but they should not have irq conflicts since they allocate irqs above legacy range (above gsi_top, in fact). Instead of explicitly testing whether the guest is privileged we can extend pv_info structure to include information about guest's RTC support. Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: vkuznets@redhat.com Cc: xen-devel@lists.xenproject.org Cc: konrad.wilk@oracle.com Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/signal: Fix restart_syscall number for x32 tasks  [Dmitry V. Levin]
commit 22eab1108781eff09961ae7001704f7bd8fb1dce upstream. When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK, regs->ax is assigned to a restart_syscall number. For x32 tasks, this syscall number must have __X32_SYSCALL_BIT set, otherwise it will be an x86_64 syscall number instead of a valid x32 syscall number. This issue has been there since the introduction of x32. Reported-by: strace/tests/restart_syscall.test Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Elvira Khabirova <lineprinter0@gmail.com> Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31  x86/mpx: Fix instruction decoder condition  [Dave Hansen]
commit 8e8efe0379bd93e8219ca0fc6fa80b5dd85b09cb upstream. MPX decodes instructions in order to tell which bounds register was violated. Part of this decoding involves looking at the "REX prefix" which is a special instrucion prefix used to retrofit support for new registers in to old instructions. The X86_REX_*() macros are defined to return actual bit values: #define X86_REX_R(rex) ((rex) & 4) *not* boolean values. However, the MPX code was checking for them like they were booleans. This might have led to us mis-decoding the "REX prefix" and giving false information out to userspace about bounds violations. X86_REX_B() actually is bit 1, so this is really only broken for the X86_REX_X() case. Fix the conditionals up to tolerate the non-boolean values. Fixes: fcc7ffd67991 "x86, mpx: Decode MPX instruction to get bound violation information" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
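A self-contained illustration of the class of bug (userspace C, not the kernel's decoder): the macros return the masked bit value, so any check that expects a 0/1 boolean can misfire.

  #include <stdio.h>

  #define X86_REX_X(rex) ((rex) & 2)   /* yields 0 or 2, never 1 */

  int main(void)
  {
          unsigned char rex = 0x42;            /* REX prefix with the X bit set */

          if (X86_REX_X(rex) == 1)             /* boolean-style check: never true */
                  printf("REX.X detected\n");
          else
                  printf("REX.X missed, macro returned %d\n", X86_REX_X(rex));

          if (X86_REX_X(rex))                  /* tolerant check: works */
                  printf("REX.X detected with the non-boolean test\n");
          return 0;
  }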
2016-01-31  x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs  [Len Brown]
commit 656279a1f3b210cf48ccc572fd7c6b8e2250be77 upstream. commit f1ccd249319e allowed the cmdline "cpu_init_udelay=" to work with all values, including the default of 10000. But in setting the default of 10000, it overrode the code that sets the delay to 0 on modern processors. Also, tidy up use of INT/UINT. Fixes: f1ccd249319e "x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior" Reported-by: Shane <shrybman@teksavvy.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: dparsons@brightdsl.net Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  xen/events: Always allocate legacy interrupts on PV guests  [Boris Ostrovsky]
commit b4ff8389ed14b849354b59ce9b360bdefcdbf99c upstream. After commit 8c058b0b9c34 ("x86/irq: Probe for PIC presence before allocating descs for legacy IRQs") early_irq_init() will no longer preallocate descriptors for legacy interrupts if PIC does not exist, which is the case for Xen PV guests. Therefore we may need to allocate those descriptors ourselves. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  KVM: s390: enable SIMD only when no VCPUs were created  [David Hildenbrand]
commit 5967c17b118a2bd1dd1d554cc4eee16233e52bec upstream. We should never allow to enable/disable any facilities for the guest when other VCPUs were already created. kvm_arch_vcpu_(load|put) relies on SIMD not changing during runtime. If somebody would create and run VCPUs and then decides to enable SIMD, undefined behaviour could be possible (e.g. vector save area not being set up). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  KVM: s390: avoid memory overwrites on emergency signal injection  [David Hildenbrand]
commit b85de33a1a3433487b6a721cfdce25ec8673e622 upstream. Commit 383d0b050106 ("KVM: s390: handle pending local interrupts via bitmap") introduced a possible memory overwrite from user space. User space could pass an invalid emergency signal code (sending VCPU) and therefore exceed the bitmap. Let's take care of this case and check that the id is in the valid range. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  KVM: s390: fix wrong lookup of VCPUs by array index  [David Hildenbrand]
commit 152e9f65d66f0a3891efc3869440becc0e7ff53f upstream. For now, VCPUs were always created sequentially with incrementing VCPU ids. Therefore, the index in the VCPUs array matched the id. As sequential creation might change with cpu hotplug, let's use the correct lookup function to find a VCPU by id, not array index. Let's also use kvm_lookup_vcpu() for validation of the sending VCPU on external call injection. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  KVM: s390: SCA must not cross page boundaries  [David Hildenbrand]
commit c5c2c393468576bad6d10b2b5fefff8cd25df3f4 upstream. We seemed to have missed a few corner cases in commit f6c137ff00a4 ("KVM: s390: randomize sca address"). The SCA has a maximum size of 2112 bytes. By setting the sca_offset to some unlucky numbers, we exceed the page:
  0x7c0 (1984) -> Fits exactly
  0x7d0 (2000) -> 16 bytes out
  0x7e0 (2016) -> 32 bytes out
  0x7f0 (2032) -> 48 bytes out
One VCPU entry is 32 bytes long. For the last two cases, we actually write data to the other page:
  1. The address of the VCPU.
  2. Injection/delivery/clearing of SIGP external calls via SIGP IF.
Especially the 2. happens regularly. So this could produce two problems:
  1. The guest losing/getting external calls.
  2. Random memory overwrites in the host.
So this problem happens on every 127 + 128 created VM with 64 VCPUs. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
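The arithmetic is easy to check stand-alone (illustrative userspace C; 4 KiB pages and the 2112-byte SCA stated above):

  #include <stdio.h>

  #define PAGE_SIZE 4096u
  #define SCA_SIZE  2112u

  int main(void)
  {
          unsigned int off;

          /* the offsets listed in the commit are 16 bytes apart; show the problem window */
          for (off = 0x7c0; off <= 0x7f0; off += 0x10)
                  printf("offset 0x%x: SCA ends %d byte(s) past the page\n",
                         off, (int)(off + SCA_SIZE) - (int)PAGE_SIZE);
          return 0;
  }

Anything past zero means the last VCPU entries, and the SIGP external-call bookkeeping that lives in them, land on the following page.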
2015-12-09  s390/pci: reshuffle struct used to write debug data  [Sebastian Ott]
commit 7cc8944e13c73374b6f33b39ca24c0891c87b077 upstream. zpci_err_insn writes stale stack content to the debugfs. Ensure that the struct in zpci_err_insn is ordered in a way that we don't have uninitialized holes in it. In addition to that add the packed attribute. Fixes: 3d8258e (s390/pci: move debug messages to debugfs) Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  s390/kernel: fix ptrace peek/poke for floating point registers  [Martin Schwidefsky]
commit 55a423b6f105fa323168f15f4bb67f23b21da44e upstream. git commit 155e839a814834a3b4b31e729f4716e59d3d2dd4 "s390/kernel: dynamically allocate FP register save area" introduced a regression in regard to ptrace. If the vector register extension is not present or unused, the ptrace peek of a floating point register returns incorrect data and the ptrace poke to a floating point register overwrites the task structure starting at task->thread.fpu.fprs. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  arm64: page-align sections for DEBUG_RODATA  [Mark Rutland]
commit cb083816ab5ac3d10a9417527f07fc5962cc3808 upstream. A kernel built with DEBUG_RODATA && !CONFIG_DEBUG_ALIGN_RODATA doesn't have .text aligned to a page boundary, though fixup_executable works at page-granularity thanks to its use of create_mapping. If .text is not page-aligned, the first page it exists in may be marked non-executable, leading to failures when an attempt is made to execute code in said page. This patch upgrades ALIGN_DEBUG_RO and ALIGN_DEBUG_RO_MIN to force page alignment for DEBUG_RODATA && !CONFIG_DEBUG_ALIGN_RODATA kernels, ensuring that all sections with specific RWX permission requirements are mapped with the correct permissions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Laura Abbott <laura@labbott.name> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: da141706aea52c1a ("arm64: add better page protections to arm64") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  arm64: Fix compat register mappings  [Robin Murphy]
commit 5accd17d0eb523350c9ef754d655e379c9bb93b3 upstream. For reasons not entirely apparent, but now enshrined in history, the architectural mapping of AArch32 banked registers to AArch64 registers actually orders SP_<mode> and LR_<mode> backwards compared to the intuitive r13/r14 order, for all modes except FIQ. Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding subtle bugs with KVM and AArch32 guests. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  x86/mpx: Fix 32-bit address space calculation  [Dave Hansen]
commit f3119b830264d89d216bfb378ab65065dffa02d9 upstream. I received a bug report that running 32-bit MPX binaries on 64-bit kernels was broken. I traced it down to this little code snippet. We were switching our "number of bounds directory entries" calculation correctly. But, we didn't switch the other side of the calculation: the virtual space size. This meant that we were calculating an absurd size for bd_entry_virt_space() on 32-bit because we used the 64-bit virt_space. This was _also_ broken for 32-bit kernels running on 64-bit hardware since boot_cpu_data.x86_virt_bits=48 even when running in 32-bit mode. Correct that and properly handle all 3 possible cases:
  1. 32-bit binary on 64-bit kernel
  2. 64-bit binary on 64-bit kernel
  3. 32-bit binary on 32-bit kernel
This manifested in having bounds tables not properly unmapped. It "leaked" memory but had no functional impact otherwise. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09  x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels  [Dave Hansen]
commit 46561c3959d6307d22139c24cd0bf196162e5681 upstream. When you call get_user(foo, bar), you effectively do a copy_from_user(&foo, bar, sizeof(*bar)); note that the sizeof() is implicit. When we reach out to userspace to try to zap an entire "bounds table" we need to go read a "bounds directory entry" in order to locate the table's address. The size of a "directory entry" depends on the binary being run and is always the size of a pointer. But, when we have a 64-bit kernel and a 32-bit application, the directory entry is still only 32-bits long, but we fetch it with a 64-bit pointer which makes get_user() do a 64-bit fetch. Reading 4 extra bytes isn't harmful, unless we are at the end of the table and run off it. It might also cause the zero page to get faulted in unnecessarily even if you are not at the end. Fix it up by doing a special 32-bit get_user() via a cast when we have 32-bit userspace. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
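The idea reduces to choosing the access size from the mm rather than from the kernel's pointer type; a sketch (helper names such as is_64bit_mm() are illustrative):

  static int read_bd_entry(struct mm_struct *mm, void __user *bd_entry_ptr,
                           unsigned long *bd_entry)
  {
          if (is_64bit_mm(mm))
                  return get_user(*bd_entry, (unsigned long __user *)bd_entry_ptr);

          /* 32-bit binary: the directory entry is only 4 bytes wide */
          {
                  u32 entry_32;
                  int ret = get_user(entry_32, (u32 __user *)bd_entry_ptr);

                  *bd_entry = entry_32;
                  return ret;
          }
  }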