summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2017-12-22KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()Laurent Vivier
When we migrate a VM from a POWER8 host (XICS) to a POWER9 host (XICS-on-XIVE), we have an error: qemu-kvm: Unable to restore KVM interrupt controller state \ (0xff000000) for CPU 0: Invalid argument This is because kvmppc_xics_set_icp() checks the new state is internaly consistent, and especially: ... 1129 if (xisr == 0) { 1130 if (pending_pri != 0xff) 1131 return -EINVAL; ... On the other side, kvmppc_xive_get_icp() doesn't set neither the pending_pri value, nor the xisr value (set to 0) (and kvmppc_xive_set_icp() ignores the pending_pri value) As xisr is 0, pending_pri must be set to 0xff. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22KVM: PPC: Book3S: fix XIVE migration of pending interruptsCédric Le Goater
When restoring a pending interrupt, we are setting the Q bit to force a retrigger in xive_finish_unmask(). But we also need to force an EOI in this case to reach the same initial state : P=1, Q=0. This can be done by not setting 'old_p' for pending interrupts which will inform xive_finish_unmask() that an EOI needs to be sent. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-19powerpc/kernel: Print actual address of regs when oopsingMichael Ellerman
When we oops or otherwise call show_regs() we print the address of the regs structure. Being able to see the address is fairly useful, firstly to verify that the regs pointer is not completely bogus, and secondly it allows you to dump the regs and surrounding memory with a debugger if you have one. In the normal case the regs will be located somewhere on the stack, so printing their location discloses no further information than printing the stack pointer does already. So switch to %px and print the actual address, not the hashed value. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-15bpf, ppc64: do not reload skb pointers in non-skb contextDaniel Borkmann
The assumption of unconditionally reloading skb pointers on BPF helper calls where bpf_helper_changes_pkt_data() holds true is wrong. There can be different contexts where the helper would enforce a reload such as in case of XDP. Here, we do have a struct xdp_buff instead of struct sk_buff as context, thus this will access garbage. JITs only ever need to deal with cached skb pointer reload when ld_abs/ind was seen, therefore guard the reload behind SEEN_SKB. Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-13powerpc/perf: Fix kfree memory allocated for nest pmusAnju T Sudhakar
imc_common_cpuhp_mem_free() is the common function for all IMC (In-memory Collection counters) domains to unregister cpuhotplug callback and free memory. Since kfree of memory allocated for nest-imc (per_nest_pmu_arr) is in the common code, all domains (core/nest/thread) can do the kfree in the failure case. This could potentially create a call trace as shown below, where core(/thread/nest) imc pmu initialization fails and in the failure path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr), which is allocated by successfully registered nest units. The call trace is generated in a scenario where core-imc initialization is made to fail and a cpuhotplug is performed in a p9 system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access per_nest_pmu_arr, which is already freed by core-imc. NIP [c000000000cb6a94] mutex_lock+0x34/0x90 LR [c000000000cb6a88] mutex_lock+0x28/0x90 Call Trace: mutex_lock+0x28/0x90 (unreliable) perf_pmu_migrate_context+0x90/0x3a0 ppc_nest_imc_cpu_offline+0x190/0x1f0 cpuhp_invoke_callback+0x160/0x820 cpuhp_thread_fun+0x1bc/0x270 smpboot_thread_fn+0x250/0x290 kthread+0x1a8/0x1b0 ret_from_kernel_thread+0x5c/0x74 To address this scenario do the kfree(per_nest_pmu_arr) only in case of nest-imc initialization failure, and when there is no other nest units registered. Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13powerpc/perf/imc: Fix nest-imc cpuhotplug callback failureAnju T Sudhakar
Oops is observed during boot: Faulting instruction address: 0xc000000000248340 cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850] pc: c000000000248340: event_function_call+0x50/0x1f0 lr: c00000000024878c: perf_remove_from_context+0x3c/0x100 sp: c000000ff66fbad0 msr: 9000000000009033 dar: 7d20e2a6f92d03c0 pid = 14, comm = cpuhp/0 While registering the cpuhotplug callbacks for nest-imc, if we fail in the cpuhotplug online path for any random node in a multi node system (because the opal call to stop nest-imc counters fails for that node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who successfully returned from cpuhotplug online path. This call trace is generated since in the ppc_nest_imc_cpu_offline() path we are trying to migrate the event context, when nest-imc counters are not even initialized. Patch to add a check to ensure that nest-imc is registered before migrating the event context. Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-13powerpc/perf: Dereference BHRB entries safelyRavi Bangoria
It's theoretically possible that branch instructions recorded in BHRB (Branch History Rolling Buffer) entries have already been unmapped before they are processed by the kernel. Hence, trying to dereference such memory location will result in a crash. eg: Unable to handle kernel paging request for data at address 0xd000000019c41764 Faulting instruction address: 0xc000000000084a14 NIP [c000000000084a14] branch_target+0x4/0x70 LR [c0000000000eb828] record_and_restart+0x568/0x5c0 Call Trace: [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable) [c0000000000ec378] perf_event_interrupt+0x298/0x460 [c000000000027964] performance_monitor_exception+0x54/0x70 [c000000000009ba4] performance_monitor_common+0x114/0x120 Fix it by deferefencing the addresses safely. Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB") Cc: stable@vger.kernel.org # v3.10+ Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Use probe_kernel_read() which is clearer, tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...
2017-12-07powerpc/xmon: Don't print hashed pointers in xmonMichael Ellerman
Since commit ad67b74d2469 ("printk: hash addresses printed with %p") pointers printed with %p are hashed, ie. you don't see the actual pointer value but rather a cryptographic hash of its value. In xmon we want to see the actual pointer values, because xmon is a debugger, so replace %p with %px which prints the actual pointer value. We justify doing this in xmon because 1) xmon is a kernel crash debugger, it's only accessible via the console 2) xmon doesn't print to dmesg, so the pointers it prints are not able to be leaked that way. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-06powerpc/64s: Initialize ISAv3 MMU registers before setting partition tableNicholas Piggin
kexec can leave MMU registers set when booting into a new kernel, the PIDR (Process Identification Register) in particular. The boot sequence does not zero PIDR, so it only gets set when CPUs first switch to a userspace processes (until then it's running a kernel thread with effective PID = 0). This leaves a window where a process table entry and page tables are set up due to user processes running on other CPUs, that happen to match with a stale PID. The CPU with that PID may cause speculative accesses that address quadrant 0 (aka userspace addresses), which will result in cached translations and PWC (Page Walk Cache) for that process, on a CPU which is not in the mm_cpumask and so they will not be invalidated properly. The most common result is the kernel hanging in infinite page fault loops soon after kexec (usually in schedule_tail, which is usually the first non-speculative quadrant 0 access to a new PID) due to a stale PWC. However being a stale translation error, it could result in anything up to security and data corruption problems. Fix this by zeroing out PIDR at boot and kexec. Fixes: 7e381c0ff618 ("powerpc/mm/radix: Add mmu context handling callback for radix") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-06KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requestsSerhii Popovych
When serving multiple resize requests following could happen: CPU0 CPU1 ---- ---- kvm_vm_ioctl_resize_hpt_prepare(1); -> schedule_work() /* system_rq might be busy: delay */ kvm_vm_ioctl_resize_hpt_prepare(2); mutex_lock(); if (resize) { ... release_hpt_resize(); } ... resize_hpt_prepare_work() -> schedule_work() { mutex_unlock() /* resize->kvm could be wrong */ struct kvm *kvm = resize->kvm; mutex_lock(&kvm->lock); <<<< UAF ... } i.e. a second resize request with different order could be started by kvm_vm_ioctl_resize_hpt_prepare(), causing the previous request to be free()d when there's still an active worker thread which will try to access it. This leads to a use after free in point marked with UAF on the diagram above. To prevent this from happening, instead of unconditionally releasing a pre-existing resize structure from the prepare ioctl(), we check if the existing structure has an in-progress worker. We do that by checking if the resize->error == -EBUSY, which is safe because the resize->error field is protected by the kvm->lock. If there is an active worker, instead of releasing, we mark the structure as stale by unlinking it from kvm_struct. In the worker thread we check for a stale structure (with kvm->lock held), and in that case abort, releasing the stale structure ourself. We make the check both before and the actual allocation. Strictly, only the check afterwards is needed, the check before is an optimization: if the structure happens to become stale before the worker thread is dispatched, rather than during the allocation, it means we can avoid allocating then immediately freeing a potentially substantial amount of memory. This fixes following or similar host kernel crash message: [ 635.277361] Unable to handle kernel paging request for data at address 0x00000000 [ 635.277438] Faulting instruction address: 0xc00000000052f568 [ 635.277446] Oops: Kernel access of bad area, sig: 11 [#1] [ 635.277451] SMP NR_CPUS=2048 NUMA PowerNV [ 635.277470] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter nfsv3 nfs_acl nfs lockd grace fscache kvm_hv kvm rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ext4 ib_srp scsi_transport_srp ib_ipoib mbcache jbd2 rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ocrdma(T) ib_core ses enclosure scsi_transport_sas sg shpchp leds_powernv ibmpowernv i2c_opal i2c_core powernv_rng ipmi_powernv ipmi_devintf ipmi_msghandler ip_tables xfs libcrc32c sr_mod sd_mod cdrom lpfc nvme_fc(T) nvme_fabrics nvme_core ipr nvmet_fc(T) tg3 nvmet libata be2net crc_t10dif crct10dif_generic scsi_transport_fc ptp scsi_tgt pps_core crct10dif_common dm_mirror dm_region_hash dm_log dm_mod [ 635.278687] CPU: 40 PID: 749 Comm: kworker/40:1 Tainted: G ------------ T 3.10.0.bz1510771+ #1 [ 635.278782] Workqueue: events resize_hpt_prepare_work [kvm_hv] [ 635.278851] task: c0000007e6840000 ti: c0000007e9180000 task.ti: c0000007e9180000 [ 635.278919] NIP: c00000000052f568 LR: c0000000009ea310 CTR: c0000000009ea4f0 [ 635.278988] REGS: c0000007e91837f0 TRAP: 0300 Tainted: G ------------ T (3.10.0.bz1510771+) [ 635.279077] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002022 XER: 00000000 [ 635.279248] CFAR: c000000000009368 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 GPR00: c0000000009ea310 c0000007e9183a70 c000000001250b00 c0000007e9183b10 GPR04: 0000000000000000 0000000000000000 c0000007e9183650 0000000000000000 GPR08: c0000007ffff7b80 00000000ffffffff 0000000080000028 d00000000d2529a0 GPR12: 0000000000002200 c000000007b56800 c000000000120028 c0000007f135bb40 GPR16: 0000000000000000 c000000005c1e018 c000000005c1e018 0000000000000000 GPR20: 0000000000000001 c0000000011bf778 0000000000000001 fffffffffffffef7 GPR24: 0000000000000000 c000000f1e262e50 0000000000000002 c0000007e9180000 GPR28: c000000f1e262e4c c000000f1e262e50 0000000000000000 c0000007e9183b10 [ 635.280149] NIP [c00000000052f568] __list_add+0x38/0x110 [ 635.280197] LR [c0000000009ea310] __mutex_lock_slowpath+0xe0/0x2c0 [ 635.280253] Call Trace: [ 635.280277] [c0000007e9183af0] [c0000000009ea310] __mutex_lock_slowpath+0xe0/0x2c0 [ 635.280356] [c0000007e9183b70] [c0000000009ea554] mutex_lock+0x64/0x70 [ 635.280426] [c0000007e9183ba0] [d00000000d24da04] resize_hpt_prepare_work+0xe4/0x1c0 [kvm_hv] [ 635.280507] [c0000007e9183c40] [c000000000113c0c] process_one_work+0x1dc/0x680 [ 635.280587] [c0000007e9183ce0] [c000000000114250] worker_thread+0x1a0/0x520 [ 635.280655] [c0000007e9183d80] [c00000000012010c] kthread+0xec/0x100 [ 635.280724] [c0000007e9183e30] [c00000000000a4b8] ret_from_kernel_thread+0x5c/0xa4 [ 635.280814] Instruction dump: [ 635.280880] 7c0802a6 fba1ffe8 fbc1fff0 7cbd2b78 fbe1fff8 7c9e2378 7c7f1b78 f8010010 [ 635.281099] f821ff81 e8a50008 7fa52040 40de00b8 <e8be0000> 7fbd2840 40de008c 7fbff040 [ 635.281324] ---[ end trace b628b73449719b9d ]--- Cc: stable@vger.kernel.org # v4.10+ Fixes: b5baa6877315 ("KVM: PPC: Book3S HV: KVM-HV HPT resizing implementation") Signed-off-by: Serhii Popovych <spopovyc@redhat.com> [dwg: Replaced BUG_ON()s with WARN_ONs() and reworded commit message for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-12-06KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hptSerhii Popovych
Currently the kvm_resize_hpt structure has two fields relevant to the state of an ongoing resize: 'prepare_done', which indicates whether the worker thread has completed or not, and 'error' which indicates whether it was successful or not. Since the success/failure isn't known until completion, this is confusingly redundant. This patch consolidates the information into just the 'error' value: -EBUSY indicates the worked is still in progress, other negative values indicate (completed) failure, 0 indicates successful completion. As a bonus this reduces size of struct kvm_resize_hpt by __alignof__(struct kvm_hpt_info) and saves few bytes of code. While there correct comment in struct kvm_resize_hpt which references a non-existent semaphore (leftover from an early draft). Assert with WARN_ON() in case of HPT allocation thread work runs more than once for resize request or resize_hpt_allocate() returns -EBUSY that is treated specially. Change comparison against zero to make checkpatch.pl happy. Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Serhii Popovych <spopovyc@redhat.com> [dwg: Changed BUG_ON()s to WARN_ON()s and altered commit message for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-12-05bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program typeHendrik Brueckner
Commit 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure which exports the pt_regs structure. This is OK for multiple architectures but fail for s390 and arm64 which do not export pt_regs. Programs using them, for example, the bpf selftest fail to compile on these architectures. For s390, exporting the pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers parts of the pt_regs structure for use by user space. To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that concretes the type. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits. Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-05Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"David Gibson
This reverts commit a3b2cb30f252b21a6f962e0dd107c8b897ca65e4. That commit tried to fix problems with panic on powerpc in certain circumstances, where some output from the generic panic code was being dropped. Unfortunately, it breaks things worse in other circumstances. In particular when running a PAPR guest, it will now attempt to reboot instead of informing the hypervisor (KVM or PowerVM) that the guest has crashed. The crash notification is important to some virtualization management layers. Revert it for now until we can come up with a better solution. Fixes: a3b2cb30f252 ("powerpc: Do not call ppc_md.panic in fadump panic notifier") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04powerpc/perf: Fix oops when grouping different pmu eventsRavi Bangoria
When user tries to group imc (In-Memory Collections) event with normal event, (sometime) kernel crashes with following log: Faulting instruction address: 0x00000000 [link register ] c00000000010ce88 power_check_constraints+0x128/0x980 ... c00000000010e238 power_pmu_event_init+0x268/0x6f0 c0000000002dc60c perf_try_init_event+0xdc/0x1a0 c0000000002dce88 perf_event_alloc+0x7b8/0xac0 c0000000002e92e0 SyS_perf_event_open+0x530/0xda0 c00000000000b004 system_call+0x38/0xe0 'event_base' field of 'struct hw_perf_event' is used as flags for normal hw events and used as memory address for imc events. While grouping these two types of events, collect_events() tries to interpret imc 'event_base' as a flag, which causes a corruption resulting in a crash. Consider only those events which belongs to 'perf_hw_context' in collect_events(). Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-01Merge tag 'powerpc-4.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Two fixes for nasty kexec/kdump crashes in certain configurations. A couple of minor fixes for the new TIDR code. A fix for an oops in a CXL error handling path. Thanks to: Andrew Donnellan, Christophe Lombard, David Gibson, Mahesh Salgaonkar, Vaibhav Jain" * tag 'powerpc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Do not assign thread.tidr if already assigned powerpc: Avoid signed to unsigned conversion in set_thread_tidr() powerpc/kexec: Fix kexec/kdump in P9 guest kernels powerpc/powernv: Fix kexec crashes caused by tlbie tracing cxl: Check if vphb exists before iterating over AFU devices
2017-11-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - x86 bugfixes: APIC, nested virtualization, IOAPIC - PPC bugfix: HPT guests on a POWER9 radix host * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM: VMX: Fix vmx->nested freeing when no SMI handler KVM: VMX: Fix rflags cache during vCPU reset KVM: X86: Fix softlockup when get the current kvmclock KVM: lapic: Fixup LDR on load in x2apic KVM: lapic: Split out x2apic ldr calculation KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP KVM: x86: Fix CPUID function for word 6 (80000001_ECX) KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 KVM: x86: ioapic: Preserve read-only values in the redirection table KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq KVM: x86: ioapic: Don't fire level irq when Remote IRR set KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: inject exceptions produced by x86_decode_insn KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs KVM: x86: fix em_fxstor() sleeping while in atomic KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry ...
2017-11-29mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITEDan Williams
In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29powerpc: Do not assign thread.tidr if already assignedVaibhav Jain
If set_thread_tidr() is called twice for same task_struct then it will allocate a new tidr value to it leaving the previous value still dangling in the vas_thread_ida table. To fix this the patch changes set_thread_tidr() to check if a tidr value is already assigned to the task_struct and if yes then returns zero. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [mpe: Modify to return 0 in the success case, not the TID value] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-29powerpc: Avoid signed to unsigned conversion in set_thread_tidr()Vaibhav Jain
There is an unsafe signed to unsigned conversion in set_thread_tidr() that may cause an error value to be assigned to SPRN_TIDR register and used as thread-id. The issue happens as assign_thread_tidr() returns an int and thread.tidr is an unsigned-long. So a negative error code returned from assign_thread_tidr() will fail the error check and gets assigned as tidr as a large positive value. To fix this the patch assigns the return value of assign_thread_tidr() to a temporary int and assigns it to thread.tidr iff its '> 0'. The patch shouldn't impact the calling convention of set_thread_tidr() i.e all -ve return-values are error codes and a return value of '0' indicates success. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-27Merge tag 'kvm-ppc-fixes-4.15-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master PPC KVM fixes for 4.15 One commit here, that fixes a couple of bugs relating to the patch series that enables HPT guests to run on a radix host on POWER9 systems. This patch series went upstream in the 4.15 merge window, so no stable backport is required.
2017-11-27KVM: Let KVM_SET_SIGNAL_MASK work as advertisedJan H. Schönherr
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask". This, however, is only true, when the calling task has a signal handler registered for a signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning or the whole process is terminated. Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid short-circuiting of signals. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - The final conversion of timer wheel timers to timer_setup(). A few manual conversions and a large coccinelle assisted sweep and the removal of the old initialization mechanisms and the related code. - Remove the now unused VSYSCALL update code - Fix permissions of /proc/timer_list. I still need to get rid of that file completely - Rename a misnomed clocksource function and remove a stale declaration * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) m68k/macboing: Fix missed timer callback assignment treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts timer: Remove redundant __setup_timer*() macros timer: Pass function down to initialization routines timer: Remove unused data arguments from macros timer: Switch callback prototype to take struct timer_list * argument timer: Pass timer_list pointer to callbacks unconditionally Coccinelle: Remove setup_timer.cocci timer: Remove setup_*timer() interface timer: Remove init_timer() interface treewide: setup_timer() -> timer_setup() (2 field) treewide: setup_timer() -> timer_setup() treewide: init_timer() -> setup_timer() treewide: Switch DEFINE_TIMER callbacks to struct timer_list * s390: cmm: Convert timers to use timer_setup() lightnvm: Convert timers to use timer_setup() drivers/net: cris: Convert timers to use timer_setup() drm/vc4: Convert timers to use timer_setup() block/laptop_mode: Convert timers to use timer_setup() net/atm/mpc: Avoid open-coded assignment of timer callback function ...
2017-11-24Merge tag 'powerpc-4.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A small batch of fixes, about 50% tagged for stable and the rest for recently merged code. There's one more fix for the >128T handling on hash. Once a process had requested a single mmap above 128T we would then always search above 128T. The correct behaviour is to consider the hint address in isolation for each mmap request. Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly identify P9 DD2.1 but in code that is currently not used by default. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan, Sukadev Bhattiprolu" * tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features powerpc/perf: Fix IMC_MAX_PMU macro powerpc/perf: Fix pmu_count to count only nest imc pmus powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id() powerpc/vas: Export chip_to_vas_id() powerpc/64s/slice: Use addr limit when computing slice mask
2017-11-24powerpc/kexec: Fix kexec/kdump in P9 guest kernelsMichael Ellerman
The code that cleans up the IAMR/AMOR before kexec'ing failed to remember that when we're running as a guest AMOR is not writable, it's hypervisor privileged. They symptom is that the kexec stops before entering purgatory and nothing else is seen on the console. If you examine the state of the system all threads will be in the 0x700 program check handler. Fix it by making the write to AMOR dependent on HV mode. Fixes: 1e2a516e89fc ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR") Cc: stable@vger.kernel.org # v4.10+ Reported-by: Yilin Zhang <yilzhang@redhat.com> Debugged-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-23powerpc/powernv: Fix kexec crashes caused by tlbie tracingMahesh Salgaonkar
Rebooting into a new kernel with kexec fails in trace_tlbie() which is called from native_hpte_clear(). This happens if the running kernel has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always execute few RCU checks regardless of whether tracing is on or off. We are already in the last phase of kexec sequence in real mode with HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and causes kexec to fail. Fix this by not calling trace_tlbie() from native_hpte_clear(). mpe: It's not safe to call trace points at this point in the kexec path, even if we could avoid the RCU checks/warnings. The only solution is to not call them. Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-23KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hostsPaul Mackerras
This fixes two errors that prevent a guest using the HPT MMU from successfully migrating to a POWER9 host in radix MMU mode, or resizing its HPT when running on a radix host. The first bug was that commit 8dc6cca556e4 ("KVM: PPC: Book3S HV: Don't rely on host's page size information", 2017-09-11) missed two uses of hpte_base_page_size(), one in the HPT rehashing code and one in kvm_htab_write() (which is used on the destination side in migrating a HPT guest). Instead we use kvmppc_hpte_base_page_shift(). Having the shift count means that we can use left and right shifts instead of multiplication and division in a few places. Along the way, this adds a check in kvm_htab_write() to ensure that the page size encoding in the incoming HPTEs is recognized, and if not return an EINVAL error to userspace. The second bug was that kvm_htab_write was performing some but not all of the functions of kvmhv_setup_mmu(), resulting in the destination VM being left in radix mode as far as the hardware is concerned. The simplest fix for now is make kvm_htab_write() call kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does. In future it would be better to refactor the code more extensively to remove the duplication. Fixes: 8dc6cca556e4 ("KVM: PPC: Book3S HV: Don't rely on host's page size information") Fixes: 7a84084c6054 ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9") Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-22powerpc/64s: Fix Power9 DD2.1 logic in DT CPU featuresMichael Ellerman
I got the logic wrong in the DT CPU features code when I added the Power9 DD2.1 feature. We should be setting the bit if we detect a DD2.1, not clearing it if we detect a DD2.0. This code isn't actually exercised at the moment so nothing is actually broken. Fixes: 3ffa9d9e2a7c ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc/perf: Fix IMC_MAX_PMU macroMadhavan Srinivasan
IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds nest pmu information. Current value for the macro is 32 based on the initial number of nest pmu units supported by the nest microcode. But going forward, microcode could support more nest units. Instead of static storage, patch to fix the code to dynamically allocate an array based on the number of nest imc units found in the device tree. Fixes:8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc/perf: Fix pmu_count to count only nest imc pmusMadhavan Srinivasan
"pmu_count" in opal_imc_counters_probe() is intended to hold the number of successful nest imc pmu registerations. But current code also counts other imc units like core_imc and thread_imc. Patch add a check to count only nest imc pmus. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWXChristophe Leroy
On powerpc32, patch_instruction() is called by apply_feature_fixups() which is called from early_init() There is the following note in front of early_init(): * Note that the kernel may be running at an address which is different * from the address that it was linked at, so we must use RELOC/PTRRELOC * to access static data (including strings). -- paulus Therefore, slab_is_available() cannot be called yet, and text_poke_area must be addressed with PTRRELOC() Fixes: 95902e6c8864 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()Michael Ellerman
init_imc_pmu() uses topology_physical_package_id() to detect the node id of the processor it is on to get local memory, but that's wrong, and can lead to crashes. Fix it to use cpu_to_node(). Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Cc: stable@vger.kernel.org # v4.14+ Reported-By: Rob Lippert <rlippert@google.com> Tested-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-21treewide: setup_timer() -> timer_setup() (2 field)Kees Cook
This converts all remaining setup_timer() calls that use a nested field to reach a struct timer_list. Coccinelle does not have an easy way to match multiple fields, so a new script is needed to change the matches of "&_E->_timer" into "&_E->_field1._timer" in all the rules. spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup-2fields.cocci @fix_address_of depends@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _field1; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_field1._timer, NULL, _E); +timer_setup(&_E->_field1._timer, NULL, 0); | -setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E); +timer_setup(&_E->_field1._timer, NULL, 0); | -setup_timer(&_E._field1._timer, NULL, &_E); +timer_setup(&_E._field1._timer, NULL, 0); | -setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._field1._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _field1; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_field1._timer, _callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, &_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | _E->_field1._timer@_stl.function = _callback; | _E->_field1._timer@_stl.function = &_callback; | _E->_field1._timer@_stl.function = (_cast_func)_callback; | _E->_field1._timer@_stl.function = (_cast_func)&_callback; | _E._field1._timer@_stl.function = _callback; | _E._field1._timer@_stl.function = &_callback; | _E._field1._timer@_stl.function = (_cast_func)_callback; | _E._field1._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _field1._timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _field1._timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _field1._timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_field1._timer, _callback, 0); +setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E); | -timer_setup(&_E._field1._timer, _callback, 0); +setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_field1._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_field1._timer | -(_cast_data)&_E +&_E._field1._timer | -_E +&_E->_field1._timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _field1; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_field1._timer, _callback, 0); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, 0L); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, 0UL); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0L); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0UL); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0L); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0UL); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0); +timer_setup(_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0L); +timer_setup(_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0UL); +timer_setup(_field1._timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21treewide: setup_timer() -> timer_setup()Kees Cook
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21powerpc/vas: Export chip_to_vas_id()Sukadev Bhattiprolu
Export the symbol chip_to_vas_id() to fix a build failure when CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m. Fixes: d4ef61b5e895 ("powerpc/vas, nx-842: Define and use chip_to_vas_id()") Reported-by: Haren Myneni <hbabu@us.ibm.com> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-20powerpc/64s/slice: Use addr limit when computing slice maskAneesh Kumar K.V
While computing slice mask for the free area we need make sure we only search in the addr limit applicable for this mmap. We update the slb_addr_limit after we request for a mmap above 128TB. But the following mmap request with hint addr below 128TB should still limit its search to below 128TB. ie. we should not use slb_addr_limit to compute slice mask in this case. Instead, we should derive high addr limit based on the mmap hint addr value. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - a bit more MM - procfs updates - dynamic-debug fixes - lib/ updates - checkpatch - epoll - nilfs2 - signals - rapidio - PID management cleanup and optimization - kcov updates - sysvipc updates - quite a few misc things all over the place * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) EXPERT Kconfig menu: fix broken EXPERT menu include/asm-generic/topology.h: remove unused parent_node() macro arch/tile/include/asm/topology.h: remove unused parent_node() macro arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro arch/sh/include/asm/topology.h: remove unused parent_node() macro arch/ia64/include/asm/topology.h: remove unused parent_node() macro drivers/pcmcia/sa1111_badge4.c: avoid unused function warning mm: add infrastructure for get_user_pages_fast() benchmarking sysvipc: make get_maxid O(1) again sysvipc: properly name ipc_addid() limit parameter sysvipc: duplicate lock comments wrt ipc_addid() sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE initramfs: use time64_t timestamps drivers/watchdog: make use of devm_register_reboot_notifier() kernel/reboot.c: add devm_register_reboot_notifier() kcov: update documentation Makefile: support flag -fsanitizer-coverage=trace-cmp kcov: support comparison operands collection kcov: remove pointless current != NULL check kernel/panic.c: add TAINT_AUX ...
2017-11-17pid: replace pid bitmap implementation with IDR APIGargi Sharma
Patch series "Replacing PID bitmap implementation with IDR API", v4. This series replaces kernel bitmap implementation of PID allocation with IDR API. These patches are written to simplify the kernel by replacing custom code with calls to generic code. The following are the stats for pid and pid_namespace object files before and after the replacement. There is a noteworthy change between the IDR and bitmap implementation. Before text data bss dec hex filename 8447 3894 64 12405 3075 kernel/pid.o After text data bss dec hex filename 3397 304 0 3701 e75 kernel/pid.o Before text data bss dec hex filename 5692 1842 192 7726 1e2e kernel/pid_namespace.o After text data bss dec hex filename 2854 216 16 3086 c0e kernel/pid_namespace.o The following are the stats for ps, pstree and calling readdir on /proc for 10,000 processes. ps: With IDR API With bitmap real 0m1.479s 0m2.319s user 0m0.070s 0m0.060s sys 0m0.289s 0m0.516s pstree: With IDR API With bitmap real 0m1.024s 0m1.794s user 0m0.348s 0m0.612s sys 0m0.184s 0m0.264s proc: With IDR API With bitmap real 0m0.059s 0m0.074s user 0m0.000s 0m0.004s sys 0m0.016s 0m0.016s This patch (of 2): Replace the current bitmap implementation for Process ID allocation. Functions that are no longer required, for example, free_pidmap(), alloc_pidmap(), etc. are removed. The rest of the functions are modified to use the IDR API. The change was made to make the PID allocation less complex by replacing custom code with calls to generic API. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl] Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17Merge tag 'locks-v4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking update from Jeff Layton: "A couple of fixes for a patch that went into v4.14, and the bug report just came in a few days ago.. It passes my (minimal) testing, and has been in linux-next for a few days now. I also would like to get my address changed in MAINTAINERS to clear that hurdle" * tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall fcntl: don't leak fd reference when fixup_compat_flock fails MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
2017-11-17Merge branch 'misc.compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat and uaccess updates from Al Viro: - {get,put}_compat_sigset() series - assorted compat ioctl stuff - more set_fs() elimination - a few more timespec64 conversions - several removals of pointless access_ok() in places where it was followed only by non-__ variants of primitives * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits) coredump: call do_unlinkat directly instead of sys_unlink fs: expose do_unlinkat for built-in callers ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs() ipmi: get rid of pointless access_ok() pi433: sanitize ioctl cxlflash: get rid of pointless access_ok() mtdchar: get rid of pointless access_ok() r128: switch compat ioctls to drm_ioctl_kernel() selection: get rid of field-by-field copyin VT_RESIZEX: get rid of field-by-field copyin i2c compat ioctls: move to ->compat_ioctl() sched_rr_get_interval(): move compat to native, get rid of set_fs() mips: switch to {get,put}_compat_sigset() sparc: switch to {get,put}_compat_sigset() s390: switch to {get,put}_compat_sigset() ppc: switch to {get,put}_compat_sigset() parisc: switch to {get,put}_compat_sigset() get_compat_sigset() get rid of {get,put}_compat_itimerspec() io_getevents: Use timespec64 to represent timeouts ...
2017-11-16Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.15 Common: - Python 3 support in kvm_stat - Accounting of slabs to kmemcg ARM: - Optimized arch timer handling for KVM/ARM - Improvements to the VGIC ITS code and introduction of an ITS reset ioctl - Unification of the 32-bit fault injection logic - More exact external abort matching logic PPC: - Support for running hashed page table (HPT) MMU mode on a host that is using the radix MMU mode; single threaded mode on POWER 9 is added as a pre-requisite - Resolution of merge conflicts with the last second 4.14 HPT fixes - Fixes and cleanups s390: - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes x86: - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and after-reset state - Refined dependencies for VMX features - Fixes for nested SMI injection - A lot of cleanups" * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits) KVM: s390: provide a capability for AIS state migration KVM: s390: clear_io_irq() requests are not expected for adapter interrupts KVM: s390: abstract conversion between isc and enum irq_types KVM: s390: vsie: use common code functions for pinning KVM: s390: SIE considerations for AP Queue virtualization KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup KVM: PPC: Book3S HV: Cosmetic post-merge cleanups KVM: arm/arm64: fix the incompatible matching for external abort KVM: arm/arm64: Unify 32bit fault injection KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared KVM: arm/arm64: vgic-its: New helper functions to free the caches KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device arm/arm64: KVM: Load the timer state when enabling the timer KVM: arm/arm64: Rework kvm_timer_should_fire KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit KVM: arm/arm64: Move phys_timer_emulate function KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps ...
2017-11-16Merge tag 'powerpc-4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "A bit of a small release, I suspect in part due to me travelling for KS. But my backlog of patches to review is smaller than usual, so I think in part folks just didn't send as much this cycle. Non-highlights: - Five fixes for the >128T address space handling, both to fix bugs in our implementation and to bring the semantics exactly into line with x86. Highlights: - Support for a new OPAL call on bare metal machines which gives us a true NMI (ie. is not masked by MSR[EE]=0) for debugging etc. - Support for Power9 DD2 in the CXL driver. - Improvements to machine check handling so that uncorrectable errors can be reported into the generic memory_failure() machinery. - Some fixes and improvements for VPHN, which is used under PowerVM to notify the Linux partition of topology changes. - Plumbing to enable TM (transactional memory) without suspend on some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND). - Support for emulating vector loads form cache-inhibited memory, on some Power9 revisions. - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we believe it has never had any users. - A major rework of the API drivers use when initiating and waiting for long running operations performed by OPAL firmware, and changes to the powernv_flash driver to use the new API. - Several fixes for the handling of FP/VMX/VSX while processes are using transactional memory. - Optimisations of TLB range flushes when using the radix MMU on Power9. - Improvements to the VAS facility used to access coprocessors on Power9, and related improvements to the way the NX crypto driver handles requests. - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit. Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A. Kennington III" * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits) powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature powerpc/64s: Fix masking of SRR1 bits on instruction fault powerpc/64s: mm_context.addr_limit is only used on hash powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary powerpc/64s/hash: Fix fork() with 512TB process address space powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Fix 512T hint detection to use >= 128T powerpc: Fix DABR match on hash based systems powerpc/signal: Properly handle return value from uprobe_deny_signal() powerpc/fadump: use kstrtoint to handle sysfs store powerpc/lib: Implement UACCESS_FLUSHCACHE API powerpc/lib: Implement PMEM API powerpc/powernv/npu: Don't explicitly flush nmmu tlb powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm() powerpc/powernv/idle: Round up latency and residency values powerpc/kprobes: refactor kprobe_lookup_name for safer string operations powerpc/kprobes: Blacklist emulate_update_regs() from kprobes powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace powerpc/kprobes: Disable preemption before invoking probe handler for optprobes ...
2017-11-15Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.15. Core: - Atomic object lifetime fixes - Atomic iterator improvements - Sparse/smatch fixes - Legacy kms ioctls to be interruptible - EDID override improvements - fb/gem helper cleanups - Simple outreachy patches - Documentation improvements - Fix dma-buf rcu races - DRM mode object leasing for improving VR use cases. - vgaarb improvements for non-x86 platforms. New driver: - tve200: Faraday Technology TVE200 block. This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in the StorLink SL3516 (later Cortina Systems CS3516) as well as the Grain Media GM8180. New bridges: - SiI9234 support New panels: - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba LT089AC19000, Innolux AT043TN24 i915: - Remove Coffeelake from alpha support - Cannonlake workarounds - Infoframe refactoring for DisplayPort - VBT updates - DisplayPort vswing/emph/buffer translation refactoring - CCS fixes - Restore GPU clock boost on missed vblanks - Scatter list updates for userptr allocations - Gen9+ transition watermarks - Display IPC (Isochronous Priority Control) - Private PAT management - GVT: improved error handling and pci config sanitizing - Execlist refactoring - Transparent Huge Page support - User defined priorities support - HuC/GuC firmware refactoring - DP MST fixes - eDP power sequencing fixes - Use RCU instead of stop_machine - PSR state tracking support - Eviction fixes - BDW DP aux channel timeout fixes - LSPCON fixes - Cannonlake PLL fixes amdgpu: - Per VM BO support - Powerplay cleanups - CI powerplay support - PASID mgr for kfd - SR-IOV fixes - initial GPU reset for vega10 - Prime mmap support - TTM updates - Clock query interface for Raven - Fence to handle ioctl - UVD encode ring support on Polaris - Transparent huge page DMA support - Compute LRU pipe tweaks - BO flag to allow buffers to opt out of implicit sync - CTX priority setting API - VRAM lost infrastructure plumbing qxl: - fix flicker since atomic rework amdkfd: - Further improvements from internal AMD tree - Usermode events - Drop radeon support nouveau: - Pascal temperature sensor support - Improved BAR2 handling - MMU rework to support Pascal MMU exynos: - Improved HDMI/mixer support - HDMI audio interface support tegra: - Prep work for tegra186 - Cleanup/fixes msm: - Preemption support for a5xx - Display fixes for 8x96 (snapdragon 820) - Async cursor plane fixes - FW loading rework - GPU debugging improvements vc4: - Prep for DSI panels - fix T-format tiling scanout - New madvise ioctl Rockchip: - LVDS support omapdrm: - omap4 HDMI CEC support etnaviv: - GPU performance counters groundwork sun4i: - refactor driver load + TCON backend - HDMI improvements - A31 support - Misc fixes udl: - Probe/EDID read fixes. tilcdc: - Misc fixes. pl111: - Support more variants adv7511: - Improve EDID handling. - HDMI CEC support sii8620: - Add remote control support" * tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits) drm/rockchip: analogix_dp: Use mutex rather than spinlock drm/mode_object: fix documentation for object lookups. drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU drm/i915: Move init_clock_gating() back to where it was drm/i915: Prune the reservation shared fence array drm/i915: Idle the GPU before shinking everything drm/i915: Lock llist_del_first() vs llist_del_all() drm/i915: Calculate ironlake intermediate watermarks correctly, v2. drm/i915: Disable lazy PPGTT page table optimization for vGPU drm/i915/execlists: Remove the priority "optimisation" drm/i915: Filter out spurious execlists context-switch interrupts drm/amdgpu: use irq-safe lock for kiq->ring_lock drm/amdgpu: bypass lru touch for KIQ ring submission drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories() drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs() drm/amd/powerplay: initialize a variable before using it drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug drm/rockchip: add CONFIG_OF dependency for lvds ...
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-15mm: remove cold parameter from free_hot_cold_page*Mel Gorman
Most callers users of free_hot_cold_page claim the pages being released are cache hot. The exception is the page reclaim paths where it is likely that enough pages will be freed in the near future that the per-cpu lists are going to be recycled and the cache hotness information is lost. As no one really cares about the hotness of pages being released to the allocator, just ditch the parameter. The APIs are renamed to indicate that it's no longer about hot/cold pages. It should also be less confusing as there are subtle differences between them. __free_pages drops a reference and frees a page when the refcount reaches zero. free_hot_cold_page handled pages whose refcount was already zero which is non-obvious from the name. free_unref_page should be more obvious. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. [mgorman@techsingularity.net: add pages to head, not tail] Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACKLevin, Alexander (Sasha Levin)
Convert all allocations that used a NOTRACK flag to stop using it. Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: account pud page tablesKirill A. Shutemov
On a machine with 5-level paging support a process can allocate significant amount of memory and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PUD page tables. We don't account PUD page tables, only PMD and PTE. We already addressed the same issue for PMD page tables, see commit dc6c9a35b66b ("mm: account pmd page tables to the process"). Introduction of 5-level paging brings the same issue for PUD page tables. The patch expands accounting to PUD level. [kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/] Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com [heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting] Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15Merge tag 'pci-v4.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - detach driver before tearing down procfs/sysfs (Alex Williamson) - disable PCIe services during shutdown (Sinan Kaya) - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel) - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas) - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas) - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn Helgaas) - report non-fatal AER errors only to the affected endpoint (Gabriele Paoloni) - distribute bus numbers, MMIO, and I/O space among hotplug bridges to allow more devices to be hot-added (Mika Westerberg) - fix pciehp races during initialization and surprise link down (Mika Westerberg) - handle surprise-removed devices in PME handling (Qiang) - support resizable BARs for large graphics devices (Christian König) - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo Sironi) - create SR-IOV virtfn/physfn sysfs links before attaching driver (Stuart Hayes) - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen) - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy) - avoid slot reset if bridge itself is broken (Jan Glauber) - clean up pci_reset_function() path (Jan H. Schönherr) - make pci_map_rom() fail if the option ROM is invalid (Changbin Du) - convert timers to timer_setup() (Kees Cook) - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap) - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal) - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master() declarations (Bjorn Helgaas) - fix endpoint framework overflows and BUG()s (Dan Carpenter) - fix endpoint framework issues (Kishon Vijay Abraham I) - avoid broken Cavium CN8xxx bus reset behavior (David Daney) - extend Cavium ACS capability quirks (Vadim Lomovtsev) - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel) - turn off dra7xx clocks cleanly on shutdown (Keerthy) - fix Faraday probe error path (Wei Yongjun) - support HiSilicon STB SoC PCIe host controller (Jianguo Sun) - fix Hyper-V interrupt affinity issue (Dexuan Cui) - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly Kuznetsov) - support multiple MSI on iProc (Sandor Bodo-Merle) - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou Zhiqiang) - fix Layerscape default error response (Minghuan Lian) - support MSI on Tango host controller (Marc Gonzalez) - support Tegra186 PCIe host controller (Manikanta Maddireddy) - use generic accessors on Tegra when possible (Thierry Reding) - support V3 Semiconductor PCI host controller (Linus Walleij) * tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits) PCI/ASPM: Add L1 Substates definitions PCI/ASPM: Reformat ASPM register definitions PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe() PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up() PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up() PCI: Fix kernel-doc build warning PCI: Fail pci_map_rom() if the option ROM is invalid PCI: Move pci_map_rom() error path PCI: Move PCI_QUIRKS to the PCI bus menu alpha/PCI: Make pdev_save_srm_config() static PCI: Remove unused declarations PCI: Remove redundant pci_dev, pci_bus, resource declarations PCI: Remove redundant pcibios_set_master() declarations PCI/PME: Handle invalid data when reading Root Status PCI: hv: Use effective affinity mask PCI: pciehp: Do not clear Presence Detect Changed during initialization PCI: pciehp: Fix race condition handling surprise link down PCI: Distribute available resources to hotplug-capable bridges ...
2017-11-15Merge tag 'modules-for-v4.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Summary of modules changes for the 4.15 merge window: - treewide module_param_call() cleanup, fix up set/get function prototype mismatches, from Kees Cook - minor code cleanups" * tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Do not paper over type mismatches in module_param_call() treewide: Fix function prototypes for module_param_call() module: Prepare to convert all module_param_call() prototypes kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15Merge branch 'for-linus' of ↵Linus Torvalds
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual rocket-science from trivial tree for 4.15" * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: MAINTAINERS: relinquish kconfig MAINTAINERS: Update my email address treewide: Fix typos in Kconfig kfifo: Fix comments init/Kconfig: Fix module signing document location misc: ibmasm: Return error on error path HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback" MAINTAINERS: Correct path to uDraw PS3 driver tracing: Fix doc mistakes in trace sample tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig mm/huge_memory.c: fixup grammar in comment lib/xz: Add fall-through comments to a switch statement