summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-02-16rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modulesPaul E. McKenney
commit 156baec39732f025dc778e00da95fc10d6e45885 upstream. Use of init_rcu_head() and destroy_rcu_head() from modules results in the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y: ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to be used by GPL-licensed kernel modules. Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16ftrace: Remove incorrect setting of glob search fieldSteven Rostedt (VMware)
commit 7b6586562708d2b3a04fe49f217ddbadbbbb0546 upstream. __unregister_ftrace_function_probe() will incorrectly parse the glob filter because it resets the search variable that was setup by filter_parse_regex(). Al Viro reported this: After that call of filter_parse_regex() we could have func_g.search not equal to glob only if glob started with '!' or '*'. In the former case we would've buggered off with -EINVAL (not = 1). In the latter we would've set func_g.search equal to glob + 1, calculated the length of that thing in func_g.len and proceeded to reset func_g.search back to glob. Suppose the glob is e.g. *foo*. We end up with func_g.type = MATCH_MIDDLE_ONLY; func_g.len = 3; func_g.search = "*foo"; Feeding that to ftrace_match_record() will not do anything sane - we will be looking for names containing "*foo" (->len is ignored for that one). Link: http://lkml.kernel.org/r/20180127031706.GE13338@ZenIV.linux.org.uk Fixes: 3ba009297149f ("ftrace: Introduce ftrace_glob structure") Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16genirq: Make legacy autoprobing work againThomas Gleixner
commit 1beaeacdc88b537703d04d5536235d0bbb36db93 upstream. Meelis reported the following warning on a quad P3 HP NetServer museum piece: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100 EIP: __irq_startup+0x80/0x100 irq_startup+0x7e/0x170 probe_irq_on+0x128/0x2b0 parport_irq_probe.constprop.18+0x8d/0x1af [parport_pc] parport_pc_probe_port+0xf11/0x1260 [parport_pc] parport_pc_init+0x78a/0xf10 [parport_pc] parport_parse_param.constprop.16+0xf0/0xf0 [parport_pc] do_one_initcall+0x45/0x1e0 This is caused by the rewrite of the irq activation/startup sequence which missed to convert a callsite in the irq legacy auto probing code. To fix this irq_activate_and_startup() needs to gain a return value so the pending logic can work proper. Fixes: c942cee46bba ("genirq: Separate activation and startup") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Meelis Roos <mroos@linux.ee> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801301935410.1797@nanos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16kernel/relay.c: revert "kernel/relay.c: fix potential memory leak"Eric Biggers
commit a1be1f3931bfe0a42b46fef77a04593c2b136e7f upstream. This reverts commit ba62bafe942b ("kernel/relay.c: fix potential memory leak"). This commit introduced a double free bug, because 'chan' is already freed by the line: kref_put(&chan->kref, relay_destroy_channel); This bug was found by syzkaller, using the BLKTRACESETUP ioctl. Link: http://lkml.kernel.org/r/20180127004759.101823-1-ebiggers3@gmail.com Fixes: ba62bafe942b ("kernel/relay.c: fix potential memory leak") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16kernel/async.c: revert "async: simplify lowest_in_progress()"Rasmus Villemoes
commit 4f7e988e63e336827f4150de48163bed05d653bd upstream. This reverts commit 92266d6ef60c ("async: simplify lowest_in_progress()") which was simply wrong: In the case where domain is NULL, we now use the wrong offsetof() in the list_first_entry macro, so we don't actually fetch the ->cookie value, but rather the eight bytes located sizeof(struct list_head) further into the struct async_entry. On 64 bit, that's the data member, while on 32 bit, that's a u64 built from func and data in some order. I think the bug happens to be harmless in practice: It obviously only affects callers which pass a NULL domain, and AFAICT the only such caller is async_synchronize_full() -> async_synchronize_full_domain(NULL) -> async_synchronize_cookie_domain(ASYNC_COOKIE_MAX, NULL) and the ASYNC_COOKIE_MAX means that in practice we end up waiting for the async_global_pending list to be empty - but it would break if somebody happened to pass (void*)-1 as the data element to async_schedule, and of course also if somebody ever does a async_synchronize_cookie_domain(, NULL) with a "finite" cookie value. Maybe the "harmless in practice" means this isn't -stable material. But I'm not completely confident my quick git grep'ing is enough, and there might be affected code in one of the earlier kernels that has since been removed, so I'll leave the decision to the stable guys. Link: http://lkml.kernel.org/r/20171128104938.3921-1-linux@rasmusvillemoes.dk Fixes: 92266d6ef60c "async: simplify lowest_in_progress()" Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Adam Wallis <awallis@codeaurora.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16sched/rt: Up the root domain ref count when passing it around via IPIsSteven Rostedt (VMware)
commit 364f56653708ba8bcdefd4f0da2a42904baa8eeb upstream. When issuing an IPI RT push, where an IPI is sent to each CPU that has more than one RT task scheduled on it, it references the root domain's rto_mask, that contains all the CPUs within the root domain that has more than one RT task in the runable state. The problem is, after the IPIs are initiated, the rq->lock is released. This means that the root domain that is associated to the run queue could be freed while the IPIs are going around. Add a sched_get_rd() and a sched_put_rd() that will increment and decrement the root domain's ref count respectively. This way when initiating the IPIs, the scheduler will up the root domain's ref count before releasing the rq->lock, ensuring that the root domain does not go away until the IPI round is complete. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()Steven Rostedt (VMware)
commit ad0f1d9d65938aec72a698116cd73a980916895e upstream. When the rto_push_irq_work_func() is called, it looks at the RT overloaded bitmask in the root domain via the runqueue (rq->rd). The problem is that during CPU up and down, nothing here stops rq->rd from changing between taking the rq->rd->rto_lock and releasing it. That means the lock that is released is not the same lock that was taken. Instead of using this_rq()->rd to get the root domain, as the irq work is part of the root domain, we can simply get the root domain from the irq work that is passed to the routine: container_of(work, struct root_domain, rto_push_work) This keeps the root domain consistent. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-07module/retpoline: Warn about missing retpoline in moduleAndi Kleen
commit caf7501a1b4ec964190f31f9c3f163de252273b8 There's a risk that a kernel which has full retpoline mitigations becomes vulnerable when a module gets loaded that hasn't been compiled with the right compiler or the right option. To enable detection of that mismatch at module load time, add a module info string "retpoline" at build time when the module was compiled with retpoline support. This only covers compiled C source, but assembler source or prebuilt object files are not checked. If a retpoline enabled kernel detects a non retpoline protected module at load time, print a warning and report it in the sysfs vulnerability file. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: jeyu@kernel.org Cc: arjan@linux.intel.com Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-28Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for a ~10 years old problem which causes high resolution timers to stop after a CPU unplug/plug cycle due to a stale flag in the per CPU hrtimer base struct. Paul McKenney was hunting this for about a year, but the heisenbug nature made it resistant against debug attempts for quite some time" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Reset hrtimer cpu base proper on CPU hotplug
2018-01-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single bug fix to prevent a subtle deadlock in the scheduler core code vs cpu hotplug" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix cpu.max vs. cpuhotplug deadlock
2018-01-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Four patches which all address lock inversions and deadlocks in the perf core code and the Intel debug store" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix perf,x86,cpuhp deadlock perf/core: Fix ctx::mutex deadlock perf/core: Fix another perf,trace,cpuhp lock inversion perf/core: Fix lock inversion between perf,trace,cpuhp
2018-01-28Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two final locking fixes for 4.15: - Repair the OWNER_DIED logic in the futex code which got wreckaged with the recent fix for a subtle race condition. - Prevent the hard lockup detector from triggering when dumping all held locks in the system" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks() futex: Fix OWNER_DEAD fixup
2018-01-27hrtimer: Reset hrtimer cpu base proper on CPU hotplugThomas Gleixner
The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents that a long delayed hrtimer interrupt causes a continous retriggering of interrupts which prevent the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation. If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions. Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in. Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag. Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
2018-01-25perf/core: Fix ctx::mutex deadlockPeter Zijlstra
Lockdep noticed the following 3-way lockup scenario: sys_perf_event_open() perf_event_alloc() perf_try_init_event() #0 ctx = perf_event_ctx_lock_nested(1) perf_swevent_init() swevent_hlist_get() #1 mutex_lock(&pmus_lock) perf_event_init_cpu() #1 mutex_lock(&pmus_lock) #2 mutex_lock(&ctx->mutex) sys_perf_event_open() mutex_lock_double() #2 mutex_lock() #0 mutex_lock_nested() And while we need that perf_event_ctx_lock_nested() for HW PMUs such that they can iterate the sibling list, trying to match it to the available counters, the software PMUs need do no such thing. Exclude them. In particular the swevent triggers the above invertion, while the tpevent PMU triggers a more elaborate one through their event_mutex. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25perf/core: Fix another perf,trace,cpuhp lock inversionPeter Zijlstra
Lockdep noticed the following 3-way lockup race: perf_trace_init() #0 mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() tp_event->class->reg() := tracepoint_probe_register #1 mutex_lock(&tracepoints_mutex) trace_point_add_func() #2 static_key_enable() #2 do_cpu_up() perf_event_init_cpu() #3 mutex_lock(&pmus_lock) #4 mutex_lock(&ctx->mutex) perf_ioctl() #4 ctx = perf_event_ctx_lock() _perf_iotcl() ftrace_profile_set_filter() #0 mutex_lock(&event_mutex) Fudge it for now by noting that the tracepoint state does not depend on the event <-> context relation. Ugly though :/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25perf/core: Fix lock inversion between perf,trace,cpuhpPeter Zijlstra
Lockdep gifted us with noticing the following 4-way lockup scenario: perf_trace_init() #0 mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() tp_event->class->reg() := tracepoint_probe_register #1 mutex_lock(&tracepoints_mutex) trace_point_add_func() #2 static_key_enable() #2 do_cpu_up() perf_event_init_cpu() #3 mutex_lock(&pmus_lock) #4 mutex_lock(&ctx->mutex) perf_event_task_disable() mutex_lock(&current->perf_event_mutex) #4 ctx = perf_event_ctx_lock() #5 perf_event_for_each_child() do_exit() task_work_run() __fput() perf_release() perf_event_release_kernel() #4 mutex_lock(&ctx->mutex) #5 mutex_lock(&event->child_mutex) free_event() _free_event() event->destroy() := perf_trace_destroy #0 mutex_lock(&event_mutex); Fix that by moving the free_event() out from under the locks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24sched/core: Fix cpu.max vs. cpuhotplug deadlockPeter Zijlstra
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion: tg_set_cfs_bandwidth() get_online_cpus() cpus_read_lock() cfs_bandwidth_usage_inc() static_key_slow_inc() cpus_read_lock() Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()Tejun Heo
debug_show_all_locks() iterates all tasks and print held locks whole holding tasklist_lock. This can take a while on a slow console device and may end up triggering NMI hardlockup detector if someone else ends up waiting for tasklist_lock. Touch the NMI watchdog while printing the held locks to avoid spuriously triggering the hardlockup detector. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24futex: Fix OWNER_DEAD fixupPeter Zijlstra
Both Geert and DaveJ reported that the recent futex commit: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") introduced a problem with setting OWNER_DEAD. We set the bit on an uninitialized variable and then entirely optimize it away as a dead-store. Move the setting of the bit to where it is more useful. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Link: http://lkml.kernel.org/r/20180122103947.GD2228@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-23tracing: Update stack trace skipping for ORC unwinderSteven Rostedt (VMware)
With the addition of ORC unwinder and FRAME POINTER unwinder, the stack trace skipping requirements have changed. I went through the tracing stack trace dumps with ORC and with frame pointers and recalculated the proper values. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23ftrace, orc, x86: Handle ftrace dynamically allocated trampolinesSteven Rostedt (VMware)
The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-21Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for the new matrix allocator to prevent vector exhaustion by certain network drivers which allocate gazillions of unused vectors which cannot be put into reservation mode due to MSI and the lack of MSI entry masking. The fix/workaround is to spread the vectors across CPUs by searching the supplied target CPU mask for the CPU with the smallest number of allocated vectors" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq/matrix: Spread interrupts on allocation
2018-01-19Merge tag 'trace-v4.15-rc4-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more small fixes - The conversion of enums into their actual numbers to display in the event format file had an off-by-one bug, that could cause an enum not to be converted, and break user space parsing tools. - A fix to a previous fix to bring back the context recursion checks. The interrupt case checks for NMI, IRQ and softirq, but the softirq returned the same number regardless if it was set or not, although the logic would force it to be set if it were hit" * tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix converting enum's from the map in trace_event_eval_update() ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
2018-01-19Merge branch 'for-4.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "cgroup.threads should be delegatable (ie. a container should be able to write to it from inside) but was missing the flag. The change is very low risk" * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: make cgroup.threads delegatable
2018-01-19Merge branch 'for-4.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixlet from Tejun Heo: "One patch to add touch_nmi_watchdog() while dumping workqueue debug messages to avoid triggering the lockup detector spuriously. The change is very low risk" * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: avoid hard lockups in show_workqueue_state()
2018-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix BPF divides by zero, from Eric Dumazet and Alexei Starovoitov. 2) Reject stores into bpf context via st and xadd, from Daniel Borkmann. 3) Fix a memory leak in TUN, from Cong Wang. 4) Disable RX aggregation on a specific troublesome configuration of r8152 in a Dell TB16b dock. 5) Fix sw_ctx leak in tls, from Sabrina Dubroca. 6) Fix program replacement in cls_bpf, from Daniel Borkmann. 7) Fix uninitialized station_info structures in cfg80211, from Johannes Berg. 8) Fix miscalculation of transport header offset field in flow dissector, from Eric Dumazet. 9) Fix LPM tree leak on failure in mlxsw driver, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) ibmvnic: Fix IPv6 packet descriptors ibmvnic: Fix IP offload control buffer ipv6: don't let tb6_root node share routes with other node ip6_gre: init dev->mtu and dev->hard_header_len correctly mlxsw: spectrum_router: Free LPM tree upon failure flow_dissector: properly cap thoff field fm10k: mark PM functions as __maybe_unused cfg80211: fix station info handling bugs netlink: reset extack earlier in netlink_rcv_skb can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once bpf: mark dst unknown on inconsistent {s, u}bounds adjustments bpf: fix cls_bpf on filter replace Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500 tls: reset crypto_info when do_tls_setsockopt_tx fails tls: return -EBUSY if crypto_info is already set tls: fix sw_ctx leak net/tls: Only attach to sockets in ESTABLISHED state net: fs_enet: do not call phy_stop() in interrupts r8152: disable RX aggregation on Dell TB16 dock ...
2018-01-18tracing: Fix converting enum's from the map in trace_event_eval_update()Steven Rostedt (VMware)
Since enums do not get converted by the TRACE_EVENT macro into their values, the event format displaces the enum name and not the value. This breaks tools like perf and trace-cmd that need to interpret the raw binary data. To solve this, an enum map was created to convert these enums into their actual numbers on boot up. This is done by TRACE_EVENTS() adding a TRACE_DEFINE_ENUM() macro. Some enums were not being converted. This was caused by an optization that had a bug in it. All calls get checked against this enum map to see if it should be converted or not, and it compares the call's system to the system that the enum map was created under. If they match, then they call is processed. To cut down on the number of iterations needed to find the maps with a matching system, since calls and maps are grouped by system, when a match is made, the index into the map array is saved, so that the next call, if it belongs to the same system as the previous call, could start right at that array index and not have to scan all the previous arrays. The problem was, the saved index was used as the variable to know if this is a call in a new system or not. If the index was zero, it was assumed that the call is in a new system and would keep incrementing the saved index until it found a matching system. The issue arises when the first matching system was at index zero. The next map, if it belonged to the same system, would then think it was the first match and increment the index to one. If the next call belong to the same system, it would begin its search of the maps off by one, and miss the first enum that should be converted. This left a single enum not converted properly. Also add a comment to describe exactly what that index was for. It took me a bit too long to figure out what I was thinking when debugging this issue. Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com Cc: stable@vger.kernel.org Fixes: 0c564a538aa93 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values") Reported-by: Chuck Lever <chuck.lever@oracle.com> Teste-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18ring-buffer: Fix duplicate results in mapping context to bits in recursive lockSteven Rostedt (VMware)
In bringing back the context checks, the code checks first if its normal (non-interrupt) context, and then for NMI then IRQ then softirq. The final check is redundant. Since the if branch is only hit if the context is one of NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if it is SOFTIRQ. The current code returns the same result even if its not a SOFTIRQ. Which is confusing. pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ Is redundant as RB_CTX_SOFTIRQ *is* 2! Fixes: a0e3a18f4baf ("ring-buffer: Bring back context level recursive checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-01-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a divide by zero due to wrong if (src_reg == 0) check in 64-bit mode. Properly handle this in interpreter and mask it also generically in verifier to guard against similar checks in JITs, from Eric and Alexei. 2) Fix a bug in arm64 JIT when tail calls are involved and progs have different stack sizes, from Daniel. 3) Reject stores into BPF context that are not expected BPF_STX | BPF_MEM variant, from Daniel. 4) Mark dst reg as unknown on {s,u}bounds adjustments when the src reg has derived bounds from dead branches, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18irq/matrix: Spread interrupts on allocationThomas Gleixner
Keith reported an issue with vector space exhaustion on a server machine which is caused by the i40e driver allocating 168 MSI interrupts when the driver is initialized, even when most of these interrupts are not used at all. The x86 vector allocation code tries to avoid the immediate allocation with the reservation mode, but the card uses MSI and does not support MSI entry masking, which prevents reservation mode and requires immediate vector allocation. The matrix allocator is a bit naive and prefers the first CPU in the cpumask which describes the possible target CPUs for an allocation. That results in allocating all 168 vectors on CPU0 which later causes vector space exhaustion when the NVMe driver tries to allocate managed interrupts on each CPU for the per CPU queues. Avoid this by finding the CPU which has the lowest vector allocation count to spread out the non managed interrupt accross the possible target CPUs. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Keith Busch <keith.busch@intel.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801171557330.1777@nanos
2018-01-17bpf: mark dst unknown on inconsistent {s, u}bounds adjustmentsDaniel Borkmann
syzkaller generated a BPF proglet and triggered a warning with the following: 0: (b7) r0 = 0 1: (d5) if r0 s<= 0x0 goto pc+0 R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0 2: (1f) r0 -= r1 R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0 verifier internal error: known but bad sbounds What happens is that in the first insn, r0's min/max value are both 0 due to the immediate assignment, later in the jsle test the bounds are updated for the min value in the false path, meaning, they yield smin_val = 1, smax_val = 0, and when ctx pointer is subtracted from r0, verifier bails out with the internal error and throwing a WARN since smin_val != smax_val for the known constant. For min_val > max_val scenario it means that reg_set_min_max() and reg_set_min_max_inv() (which both refine existing bounds) demonstrated that such branch cannot be taken at runtime. In above scenario for the case where it will be taken, the existing [0, 0] bounds are kept intact. Meaning, the rejection is not due to a verifier internal error, and therefore the WARN() is not necessary either. We could just reject such cases in adjust_{ptr,scalar}_min_max_vals() when either known scalars have smin_val != smax_val or umin_val != umax_val or any scalar reg with bounds smin_val > smax_val or umin_val > umax_val. However, there may be a small risk of breakage of buggy programs, so handle this more gracefully and in adjust_{ptr,scalar}_min_max_vals() just taint the dst reg as unknown scalar when we see ops with such kind of src reg. Reported-by: syzbot+6d362cadd45dc0a12ba4@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-17Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "A delayacct statistics correctness fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: delayacct: Account blkio completion on the correct task
2018-01-17Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two futex fixes: a input parameters robustness fix, and futex race fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Prevent overflow by strengthen input validation futex: Avoid violating the 10th rule of futex
2018-01-17Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A one-liner fix which prevents deferrable timers becoming stale when the system does not switch into NOHZ mode" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Unconditionally check deferrable base
2018-01-16bpf: reject stores into ctx via st and xaddDaniel Borkmann
Alexei found that verifier does not reject stores into context via BPF_ST instead of BPF_STX. And while looking at it, we also should not allow XADD variant of BPF_STX. The context rewriter is only assuming either BPF_LDX_MEM- or BPF_STX_MEM-type operations, thus reject anything other than that so that assumptions in the rewriter properly hold. Add test cases as well for BPF selftests. Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers. 2) Memory leak in key_notify_policy(), from Steffen Klassert. 3) Fix overflow with bpf arrays, from Daniel Borkmann. 4) Fix RDMA regression with mlx5 due to mlx5 no longer using pci_irq_get_affinity(), from Saeed Mahameed. 5) Missing RCU read locking in nl80211_send_iface() when it calls ieee80211_bss_get_ie(), from Dominik Brodowski. 6) cfg80211 should check dev_set_name()'s return value, from Johannes Berg. 7) Missing module license tag in 9p protocol, from Stephen Hemminger. 8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike Maloney. 9) Fix endless loop in netlink extack code, from David Ahern. 10) TLS socket layer sets inverted error codes, resulting in an endless loop. From Robert Hering. 11) Revert openvswitch erspan tunnel support, it's mis-designed and we need to kill it before it goes into a real release. From William Tu. 12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) net, sched: fix panic when updating miniq {b,q}stats qed: Fix potential use-after-free in qed_spq_post() nfp: use the correct index for link speed table lan78xx: Fix failure in USB Full Speed sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf sctp: reinit stream if stream outcnt has been change by sinit in sendmsg ibmvnic: Fix pending MAC address changes netlink: extack: avoid parenthesized string constant warning ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY net: Allow neigh contructor functions ability to modify the primary_key sh_eth: fix dumping ARSTR Revert "openvswitch: Add erspan tunnel support." net/tls: Fix inverted error codes to avoid endless loop ipv6: ip6_make_skb() needs to clear cork.base.dst sctp: avoid compiler warning on implicit fallthru net: ipv4: Make "ip route get" match iif lo rules again. netlink: extack needs to be reset each time through loop tipc: fix a memory leak in tipc_nl_node_get_link() ipv6: fix udpv6 sendmsg crash caused by too small MTU ...
2018-01-16Merge tag 'trace-v4.15-rc4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Bring back context level recursive protection in ring buffer. The simpler counter protection failed, due to a path when tracing with trace_clock_global() as it could not be reentrant and depended on the ring buffer recursive protection to keep that from happening. - Prevent branch profiling when FORTIFY_SOURCE is enabled. It causes 50 - 60 MB in warning messages. Branch profiling should never be run on production systems, so there's no reason that it needs to be enabled with FORTIFY_SOURCE. * tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y ring-buffer: Bring back context level recursive checks
2018-01-16delayacct: Account blkio completion on the correct taskJosh Snyder
Before commit: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") delayacct_blkio_end() was called after context-switching into the task which completed I/O. This resulted in double counting: the task would account a delay both waiting for I/O and for time spent in the runqueue. With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up(). In ttwu, we have not yet context-switched. This is more correct, in that the delay accounting ends when the I/O is complete. But delayacct_blkio_end() relies on 'get_current()', and we have not yet context-switched into the task whose I/O completed. This results in the wrong task having its delay accounting statistics updated. Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(), so that it can update the statistics of the correct task. Signed-off-by: Josh Snyder <joshs@netflix.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: <stable@vger.kernel.org> Cc: Brendan Gregg <bgregg@netflix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-block@vger.kernel.org Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-15tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=yRandy Dunlap
I regularly get 50 MB - 60 MB files during kernel randconfig builds. These large files mostly contain (many repeats of; e.g., 124,594): In file included from ../include/linux/string.h:6:0, from ../include/linux/uuid.h:20, from ../include/linux/mod_devicetable.h:13, from ../scripts/mod/devicetable-offsets.c:3: ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default] ______f = { \ ^ ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if' ^ ../include/linux/string.h:425:2: note: in expansion of macro 'if' if (p_size == (size_t)-1 && q_size == (size_t)-1) ^ This only happens when CONFIG_FORTIFY_SOURCE=y and CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if FORTIFY_SOURCE=y. Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-15ring-buffer: Bring back context level recursive checksSteven Rostedt (VMware)
Commit 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler") replaced the context level recursion checks with a simple counter. This would prevent the ring buffer code from recursively calling itself more than the max number of contexts that exist (Normal, softirq, irq, nmi). But this change caused a lockup in a specific case, which was during suspend and resume using a global clock. Adding a stack dump to see where this occurred, the issue was in the trace global clock itself: trace_buffer_lock_reserve+0x1c/0x50 __trace_graph_entry+0x2d/0x90 trace_graph_entry+0xe8/0x200 prepare_ftrace_return+0x69/0xc0 ftrace_graph_caller+0x78/0xa8 queued_spin_lock_slowpath+0x5/0x1d0 trace_clock_global+0xb0/0xc0 ring_buffer_lock_reserve+0xf9/0x390 The function graph tracer traced queued_spin_lock_slowpath that was called by trace_clock_global. This pointed out that the trace_clock_global() is not reentrant, as it takes a spin lock. It depended on the ring buffer recursive lock from letting that happen. By removing the context detection and adding just a max number of allowable recursions, it allowed the trace_clock_global() to be entered again and try to retake the spinlock it already held, causing a deadlock. Fixes: 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler") Reported-by: David Weinehall <david.weinehall@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-14timers: Unconditionally check deferrable baseThomas Gleixner
When the timer base is checked for expired timers then the deferrable base must be checked as well. This was missed when making the deferrable base independent of base::nohz_active. Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Cc: rt@linutronix.de
2018-01-14bpf: fix 32-bit divide by zeroAlexei Starovoitov
due to some JITs doing if (src_reg == 0) check in 64-bit mode for div/mod operations mask upper 32-bits of src register before doing the check Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT") Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.") Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-14futex: Prevent overflow by strengthen input validationLi Jinyue
UBSAN reports signed integer overflow in kernel/futex.c: UBSAN: Undefined behaviour in kernel/futex.c:2041:18 signed integer overflow: 0 - -2147483648 cannot be represented in type 'int' Add a sanity check to catch negative values of nr_wake and nr_requeue. Signed-off-by: Li Jinyue <lijinyue@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: dvhart@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
2018-01-14futex: Avoid violating the 10th rule of futexPeter Zijlstra
Julia reported futex state corruption in the following scenario: waiter waker stealer (prio > waiter) futex(WAIT_REQUEUE_PI, uaddr, uaddr2, timeout=[N ms]) futex_wait_requeue_pi() futex_wait_queue_me() freezable_schedule() <scheduled out> futex(LOCK_PI, uaddr2) futex(CMP_REQUEUE_PI, uaddr, uaddr2, 1, 0) /* requeues waiter to uaddr2 */ futex(UNLOCK_PI, uaddr2) wake_futex_pi() cmp_futex_value_locked(uaddr2, waiter) wake_up_q() <woken by waker> <hrtimer_wakeup() fires, clears sleeper->task> futex(LOCK_PI, uaddr2) __rt_mutex_start_proxy_lock() try_to_take_rt_mutex() /* steals lock */ rt_mutex_set_owner(lock, stealer) <preempted> <scheduled in> rt_mutex_wait_proxy_lock() __rt_mutex_slowlock() try_to_take_rt_mutex() /* fails, lock held by stealer */ if (timeout && !timeout->task) return -ETIMEDOUT; fixup_owner() /* lock wasn't acquired, so, fixup_pi_state_owner skipped */ return -ETIMEDOUT; /* At this point, we've returned -ETIMEDOUT to userspace, but the * futex word shows waiter to be the owner, and the pi_mutex has * stealer as the owner */ futex_lock(LOCK_PI, uaddr2) -> bails with EDEADLK, futex word says we're owner. And suggested that what commit: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") removes from fixup_owner() looks to be just what is needed. And indeed it is -- I completely missed that requeue_pi could also result in this case. So we need to restore that, except that subsequent patches, like commit: 16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock") changed all the locking rules. Even without that, the sequence: - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) { - locked = 1; - goto out; - } - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock); - owner = rt_mutex_owner(&q->pi_state->pi_mutex); - if (!owner) - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex); - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock); - ret = fixup_pi_state_owner(uaddr, q, owner); already suggests there were races; otherwise we'd never have to look at next_owner. So instead of doing 3 consecutive wait_lock sections with who knows what races, we do it all in a single section. Additionally, the usage of pi_state->owner in fixup_owner() was only safe because only the rt_mutex owner would modify it, which this additional case wrecks. Luckily the values can only change away and not to the value we're testing, this means we can do a speculative test and double check once we have the wait_lock. Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") Reported-by: Julia Cartwright <julia@ni.com> Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Julia Cartwright <julia@ni.com> Tested-by: Gratian Crisan <gratian.crisan@ni.com> Cc: Darren Hart <dvhart@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
2018-01-14bpf: fix divides by zeroEric Dumazet
Divides by zero are not nice, lets avoid them if possible. Also do_div() seems not needed when dealing with 32bit operands, but this seems a minor detail. Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixlets from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: tools/objtool/Makefile: don't assume sync-check.sh is executable kdump: write correct address of mem_section into vmcoreinfo kmemleak: allow to coexist with fault injection MAINTAINERS, nilfs2: change project home URLs
2018-01-13kdump: write correct address of mem_section into vmcoreinfoKirill A. Shutemov
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is. But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer". We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted. Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the situation correctly for both cases. Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-12workqueue: avoid hard lockups in show_workqueue_state()Sergey Senozhatsky
show_workqueue_state() can print out a lot of messages while being in atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console device is slow it may end up triggering NMI hard lockup watchdog. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v4.5+
2018-01-12Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A Kconfig fix, a build fix and a membarrier bug fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: membarrier: Disable preemption when calling smp_call_function_many() sched/isolation: Make CONFIG_CPU_ISOLATION=y depend on SMP or COMPILE_TEST ia64, sched/cputime: Fix build error if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
2018-01-12Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "No functional effects intended: removes leftovers from recent lockdep and refcounts work" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry locking/lockdep: Remove cross-release leftovers locking/Documentation: Remove stale crossrelease_fullstack parameter