summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-28Linux 4.9.91v4.9.91Greg Kroah-Hartman
2018-03-28bpf, x64: increase number of passesDaniel Borkmann
commit 6007b080d2e2adb7af22bf29165f0594ea12b34c upstream. In Cilium some of the main programs we run today are hitting 9 passes on x64's JIT compiler, and we've had cases already where we surpassed the limit where the JIT then punts the program to the interpreter instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON or insertion failures due to the prog array owner being JITed but the program to insert not (both must have the same JITed/non-JITed property). One concrete case the program image shrunk from 12,767 bytes down to 10,288 bytes where the image converged after 16 steps. I've measured that this took 340us in the JIT until it converges on my i7-6600U. Thus, increase the original limit we had from day one where the JIT covered cBPF only back then before we run into the case (as similar with the complexity limit) where we trip over this and hit program rejections. Also add a cond_resched() into the compilation loop, the JIT process runs without any locks and may sleep anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28bpf: skip unnecessary capability checkChenbo Feng
commit 0fa4fe85f4724fff89b09741c437cbee9cf8b008 upstream. The current check statement in BPF syscall will do a capability check for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This code path will trigger unnecessary security hooks on capability checking and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN access. This can be resolved by simply switch the order of the statement and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is allowed. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28kbuild: disable clang's default use of -fmerge-all-constantsDaniel Borkmann
commit 87e0d4f0f37fb0c8c4aeeac46fff5e957738df79 upstream. Prasad reported that he has seen crashes in BPF subsystem with netd on Android with arm64 in the form of (note, the taint is unrelated): [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001 [ 4134.820925] Mem abort info: [ 4134.901283] Exception class = DABT (current EL), IL = 32 bits [ 4135.016736] SET = 0, FnV = 0 [ 4135.119820] EA = 0, S1PTW = 0 [ 4135.201431] Data abort info: [ 4135.301388] ISV = 0, ISS = 0x00000021 [ 4135.359599] CM = 0, WnR = 0 [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000 [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000 [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP [ 4135.674610] Modules linked in: [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1 [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000 [ 4135.731599] PC is at bpf_prog_add+0x20/0x68 [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145 [ 4135.769062] sp : ffffff801d4e3ce0 [...] [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000) [ 4136.273746] Call trace: [...] [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006 [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584 [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68 [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88 Android's netd was basically pinning the uid cookie BPF map in BPF fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it again resulting in above panic. Issue is that the map was wrongly identified as a prog! Above kernel was compiled with clang 4.0, and it turns out that clang decided to merge the bpf_prog_iops and bpf_map_iops into a single memory location, such that the two i_ops could then not be distinguished anymore. Reason for this miscompilation is that clang has the more aggressive -fmerge-all-constants enabled by default. In fact, clang source code has a comment about it in lib/AST/ExprConstant.cpp on why it is okay to do so: Pointers with different bases cannot represent the same object. (Note that clang defaults to -fmerge-all-constants, which can lead to inconsistent results for comparisons involving the address of a constant; this generally doesn't matter in practice.) The issue never appeared with gcc however, since gcc does not enable -fmerge-all-constants by default and even *explicitly* states in it's option description that using this flag results in non-conforming behavior, quote from man gcc: Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior. There are also various clang bug reports open on that matter [1], where clang developers acknowledge the non-conforming behavior, and refer to disabling it with -fno-merge-all-constants. But even if this gets fixed in clang today, there are already users out there that triggered this. Thus, fix this issue by explicitly adding -fno-merge-all-constants to the kernel's Makefile to generically disable this optimization, since potentially other places in the kernel could subtly break as well. Note, there is also a flag called -fmerge-constants (not supported by clang), which is more conservative and only applies to strings and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In gcc's code, the two flags -fmerge-{all-,}constants share the same variable internally, so when disabling it via -fno-merge-all-constants, then we really don't merge any const data (e.g. strings), and text size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o). $ gcc -fverbose-asm -O2 foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S -> foo.S doesn't list -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled Thus, as a workaround we need to set both -fno-merge-all-constants *and* -fmerge-constants in the Makefile in order for text size to stay as is. [1] https://bugs.llvm.org/show_bug.cgi?id=18538 Reported-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chenbo Feng <fengc@google.com> Cc: Richard Smith <richard-llvm@metafoo.co.uk> Cc: Chandler Carruth <chandlerc@gmail.com> Cc: linux-kernel@vger.kernel.org Tested-by: Prasad Sodagudi <psodagud@codeaurora.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28selftests: x86: sysret_ss_attrs doesn't build on a PIE buildShuah Khan
commit 3346a6a4e5ba8c040360f753b26938cec31a4bdc upstream. sysret_ss_attrs fails to compile leading x86 test run to fail on systems configured to build using PIE by default. Add -no-pie fix it. Relocation might still fail if relocated above 4G. For now this change fixes the build and runs x86 tests. tools/testing/selftests/x86$ make gcc -m64 -o .../tools/testing/selftests/x86/single_step_syscall_64 -O2 -g -std=gnu99 -pthread -Wall single_step_syscall.c -lrt -ldl gcc -m64 -o .../tools/testing/selftests/x86/sysret_ss_attrs_64 -O2 -g -std=gnu99 -pthread -Wall sysret_ss_attrs.c thunks.S -lrt -ldl /usr/bin/ld: /tmp/ccS6pvIh.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Nonrepresentable section on output collect2: error: ld returned 1 exit status Makefile:49: recipe for target '.../tools/testing/selftests/x86/sysret_ss_attrs_64' failed make: *** [.../tools/testing/selftests/x86/sysret_ss_attrs_64] Error 1 Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'Dave Hansen
commit 91c49c2deb96ffc3c461eaae70219d89224076b7 upstream. 'si_pkey' is now #defined to be the name of the new siginfo field that protection keys uses. Rename it not to conflict. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171111001231.DFFC8285@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28signal/testing: Don't look for __SI_FAULT in userspaceEric W. Biederman
commit d12fe87e62d773e81e0cb3a123c5a480a10d7d91 upstream. Fix the debug print statements in these tests where they reference si_codes and in particular __SI_FAULT. __SI_FAULT is a kernel internal value and should never be seen by userspace. While I am in there also fix si_code_str. si_codes are an enumeration there are not a bitmap so == and not & is the apropriate operation to test for an si_code. Cc: Dave Hansen <dave.hansen@linux.intel.com> Fixes: 5f23f6d082a9 ("x86/pkeys: Add self-tests") Fixes: e754aedc26ef ("x86/mpx, selftests: Add MPX self test") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28selftests/x86/protection_keys: Fix syscall NR redefinition warningsAndy Lutomirski
commit 693cb5580fdb026922363aa103add64b3ecd572e upstream. On new enough glibc, the pkey syscalls numbers are available. Check first before defining them to avoid warnings like: protection_keys.c:198:0: warning: "SYS_pkey_alloc" redefined Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1fbef53a9e6befb7165ff855fc1a7d4788a191d6.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28selftests, x86, protection_keys: fix wrong offset in siginfoDave Hansen
commit 2195bff041486eb7fcceaf058acaedcd057efbdc upstream. The siginfo contains a bunch of information about the fault. For protection keys, it tells us which protection key's permissions were violated. The wrong offset in here leads to reading garbage and thus failures in the tests. We should probably eventually move this over to using the kernel's headers defining the siginfo instead of a hard-coded offset. But, for now, just do the simplest fix. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28staging: lustre: ptlrpc: kfree used instead of kvfreeNadav Amit
commit c3eec59659cf25916647d2178c541302bb4822ad upstream. rq_reqbuf is allocated using kvmalloc() but released in one occasion using kfree() instead of kvfree(). The issue was found using grep based on a similar bug. Fixes: d7e09d0397e8 ("add Lustre file system client support") Fixes: ee0ec1946ec2 ("lustre: ptlrpc: Replace uses of OBD_{ALLOC,FREE}_LARGE") Cc: Peng Tao <bergwolf@gmail.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: James Simmons <jsimmons@infradead.org> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28iio: ABI: Fix name of timestamp sysfs fileLinus Walleij
commit b9a3589332c2a25fb7edad25a26fcaada3209126 upstream. The name of the file is "current_timetamp_clock" not "timestamp_clock". Fixes: bc2b7dab629a ("iio:core: timestamping clock selection support") Cc: Gregor Boirie <gregor.boirie@parrot.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake ↵Kan Liang
servers commit 320b0651f32b830add6497fcdcfdcb6ae8c7b8a0 upstream. The number of CHAs is miscalculated on multi-domain PCI Skylake server systems, resulting in an uncore driver initialization error. Gary Kroening explains: "For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box()." To fix this bug, query the number of CHAs from the CAPID6 register: it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Reported-by: Kroening, Gary <gary.kroening@hpe.com> Tested-by: Kroening, Gary <gary.kroening@hpe.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: abanman@hpe.com Cc: dimitri.sivanich@hpe.com Cc: hpa@zytor.com Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()Dan Carpenter
commit e5ea9b54a055619160bbfe527ebb7d7191823d66 upstream. We intended to clear the lowest 6 bits but because of a type bug we clear the high 32 bits as well. Andi says that periods are rarely more than U32_MAX so this bug probably doesn't have a huge runtime impact. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28perf stat: Fix CVS output format for non-supported countersIlya Pronin
commit 40c21898ba5372c14ef71717040529794a91ccc2 upstream. When printing stats in CSV mode, 'perf stat' appends extra separators when a counter is not supported: <not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,, Which causes a failure when parsing fields. The numbers of separators should be the same for each line, no matter if the counter is or not supported. Signed-off-by: Ilya Pronin <ipronin@twitter.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28perf/x86/intel/uncore: Fix Skylake UPI event formatKan Liang
commit 317660940fd9dddd3201c2f92e25c27902c753fa upstream. There is no event extension (bit 21) for SKX UPI, so use 'event' instead of 'event_ext'. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/entry/64: Don't use IST entry for #BP stackAndy Lutomirski
commit d8ba61ba58c88d5207c1ba2f7d9a2280e7d03be9 upstream. There's nothing IST-worthy about #BP/int3. We don't allow kprobes in the small handful of places in the kernel that run at CPL0 with an invalid stack, and 32-bit kernels have used normal interrupt gates for #BP forever. Furthermore, we don't allow kprobes in places that have usergs while in kernel mode, so "paranoid" is also unnecessary. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/boot/64: Verify alignment of the LOAD segmentH.J. Lu
commit c55b8550fa57ba4f5e507be406ff9fc2845713e8 upstream. Since the x86-64 kernel must be aligned to 2MB, refuse to boot the kernel if the alignment of the LOAD segment isn't a multiple of 2MB. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/build/64: Force the linker to use 2MB page sizeH.J. Lu
commit e3d03598e8ae7d195af5d3d049596dec336f569f upstream. Binutils 2.31 will enable -z separate-code by default for x86 to avoid mixing code pages with data to improve cache performance as well as security. To reduce x86-64 executable and shared object sizes, the maximum page size is reduced from 2MB to 4KB. But x86-64 kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to linker to force 2MB page size regardless of the default page size used by linker. Tested with Linux kernel 4.15.6 on x86-64. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOp4_%3D_8twdpTyAP2DhONOCeaTOsniJLoppzhoNptL8xzA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28kvm/x86: fix icebp instruction handlingLinus Torvalds
commit 32d43cd391bacb5f0814c2624399a5dad3501d09 upstream. The undocumented 'icebp' instruction (aka 'int1') works pretty much like 'int3' in the absense of in-circuit probing equipment (except, obviously, that it raises #DB instead of raising #BP), and is used by some validation test-suites as such. But Andy Lutomirski noticed that his test suite acted differently in kvm than on bare hardware. The reason is that kvm used an inexact test for the icebp instruction: it just assumed that an all-zero VM exit qualification value meant that the VM exit was due to icebp. That is not unlike the guess that do_debug() does for the actual exception handling case, but it's purely a heuristic, not an absolute rule. do_debug() does it because it wants to ascribe _some_ reasons to the #DB that happened, and an empty %dr6 value means that 'icebp' is the most likely casue and we have no better information. But kvm can just do it right, because unlike the do_debug() case, kvm actually sees the real reason for the #DB in the VM-exit interruption information field. So instead of relying on an inexact heuristic, just use the actual VM exit information that says "it was 'icebp'". Right now the 'icebp' instruction isn't technically documented by Intel, but that will hopefully change. The special "privileged software exception" information _is_ actually mentioned in the Intel SDM, even though the cause of it isn't enumerated. Reported-by: Andy Lutomirski <luto@kernel.org> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28selftests/x86/ptrace_syscall: Fix for yet more glibc interferenceAndy Lutomirski
commit 4b0b37d4cc54b21a6ecad7271cbc850555869c62 upstream. glibc keeps getting cleverer, and my version now turns raise() into more than one syscall. Since the test relies on ptrace seeing an exact set of syscalls, this breaks the test. Replace raise(SIGSTOP) with syscall(SYS_tgkill, ...) to force glibc to get out of our way. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/bc80338b453afa187bc5f895bd8e2c8d6e264da2.1521300271.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28tty: vt: fix up tabstops properlyLinus Torvalds
commit f1869a890cdedb92a3fab969db5d0fd982850273 upstream. Tabs on a console with long lines do not wrap properly, so correctly account for the line length when computing the tab placement location. Reported-by: James Holderness <j4_james@hotmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28can: cc770: Fix use after free in cc770_tx_interrupt()Andri Yngvason
commit 9ffd7503944ec7c0ef41c3245d1306c221aef2be upstream. This fixes use after free introduced by the last cc770 patch. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> Fixes: 746201235b3f ("can: cc770: Fix queue stall & dropped RTR reply") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28can: cc770: Fix queue stall & dropped RTR replyAndri Yngvason
commit 746201235b3f876792099079f4c6fea941d76183 upstream. While waiting for the TX object to send an RTR, an external message with a matching id can overwrite the TX data. In this case we must call the rx routine and then try transmitting the message that was overwritten again. The queue was being stalled because the RX event did not generate an interrupt to wake up the queue again and the TX event did not happen because the TXRQST flag is reset by the chip when new data is received. According to the CC770 datasheet the id of a message object should not be changed while the MSGVAL bit is set. This has been fixed by resetting the MSGVAL bit before modifying the object in the transmit function and setting it after. It is not enough to set & reset CPUUPD. It is important to keep the MSGVAL bit reset while the message object is being modified. Otherwise, during RTR transmission, a frame with matching id could trigger an rx-interrupt, which would cause a race condition between the interrupt routine and the transmit function. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> Tested-by: Richard Weinberger <richard@nod.at> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28can: cc770: Fix stalls on rt-linux, remove redundant IRQ ackAndri Yngvason
commit f4353daf4905c0099fd25fa742e2ffd4a4bab26a upstream. This has been reported to cause stalls on rt-linux. Suggested-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28can: ifi: Check core revision upon probeMarek Vasut
commit 591d65d5b15496af8d05e252bc1da611c66c0b79 upstream. Older versions of the core are not compatible with the driver due to various intrusive fixes of the core. Read out the VER register, check the core revision bitfield and verify if the core in use is new enough (rev 2.1 or newer) to work correctly with this driver. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Heiko Schocher <hs@denx.de> Cc: Markus Marb <markus@marb.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28can: ifi: Repair the error handlingMarek Vasut
commit 880dd464b4304583c557c4e5f5ecebfd55d232b1 upstream. The new version of the IFI CANFD core has significantly less complex error state indication logic. In particular, the warning/error state bits are no longer all over the place, but are all present in the STATUS register. Moreover, there is a new IRQ register bit indicating transition between error states (active/warning/passive/busoff). This patch makes use of this bit to weed out the obscure selective INTERRUPT register clearing, which was used to carry over the error state indication into the poll function. While at it, this patch fixes the handling of the ACTIVE state, since the hardware provides indication of the core being in ACTIVE state and that in turn fixes the state transition indication toward userspace. Finally, register reads in the poll function are moved to the matching subfunctions since those are also no longer needed in the poll function. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Heiko Schocher <hs@denx.de> Cc: Markus Marb <markus@marb.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28staging: ncpfs: memory corruption in ncp_read_kernel()Dan Carpenter
commit 4c41aa24baa4ed338241d05494f2c595c885af8f upstream. If the server is malicious then *bytes_read could be larger than the size of the "target" buffer. It would lead to memory corruption when we do the memcpy(). Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0Jagdish Gediya
commit 6b00c35138b404be98b85f4a703be594cbed501c upstream. Due to missing information in Hardware manual, current implementation doesn't read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0. Add support to read ECCSTAT0 and ECCSTAT1 registers during ecccheck for IFC 2.0. Fixes: 656441478ed5 ("mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com> Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0Jagdish Gediya
commit 843c3a59997f18060848b8632607dd04781b52d1 upstream. Number of ECC status registers i.e. (ECCSTATx) has been increased in IFC version 2.0.0 due to increase in SRAM size. This is causing eccstat array to over flow. So, replace eccstat array with u32 variable to make it fail-safe and independent of number of ECC status registers or SRAM size. Fixes: bccb06c353af ("mtd: nand: ifc: update bufnum mask for ver >= 2.0.0") Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com> Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mtd: nand: fsl_ifc: Fix nand waitfunc return valueJagdish Gediya
commit fa8e6d58c5bc260f4369c6699683d69695daed0a upstream. As per the IFC hardware manual, Most significant 2 bytes in nand_fsr register are the outcome of NAND READ STATUS command. So status value need to be shifted and aligned as per the nand framework requirement. Fixes: 82771882d960 ("NAND Machine support for Integrated Flash Controller") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com> Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mtdchar: fix usage of mtd_ooblayout_ecc()OuYang ZhiZhong
commit 6de564939e14327148e31ddcf769e34105176447 upstream. Section was not properly computed. The value of OOB region definition is always ECC section 0 information in the OOB area, but we want to get all the ECC bytes information, so we should call mtd_ooblayout_ecc(mtd, section++, &oobregion) until it returns -ERANGE. Fixes: c2b78452a9db ("mtd: use mtd_ooblayout_xxx() helpers where appropriate") Cc: <stable@vger.kernel.org> Signed-off-by: OuYang ZhiZhong <ouyzz@yealink.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28tracing: probeevent: Fix to support minus offset from symbolMasami Hiramatsu
commit c5d343b6b7badd1f5fe0873eff2e8d63a193e732 upstream. In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28rtlwifi: rtl8723be: Fix loss of signalLarry Finger
commit 78dc897b7ee67205423dbbc6b56be49fb18d15b5 upstream. In commit c713fb071edc ("rtlwifi: rtl8821ae: Fix connection lost problem correctly") a problem in rtl8821ae that caused loss of signal was fixed. That same problem has now been reported for rtl8723be. Accordingly, the ASPM L1 latency has been increased from 0 to 7 to fix the instability. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@vger.kernel.org> Tested-by: James Cameron <quozl@laptop.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28brcmfmac: fix P2P_DEVICE ethernet address generationArend Van Spriel
commit 455f3e76cfc0d893585a5f358b9ddbe9c1e1e53b upstream. The firmware has a requirement that the P2P_DEVICE address should be different from the address of the primary interface. When not specified by user-space, the driver generates the MAC address for the P2P_DEVICE interface using the MAC address of the primary interface and setting the locally administered bit. However, the MAC address of the primary interface may already have that bit set causing the creation of the P2P_DEVICE interface to fail with -EBUSY. Fix this by using a random address instead to determine the P2P_DEVICE address. Cc: stable@vger.kernel.org # 3.10.y Reported-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28libnvdimm, {btt, blk}: do integrity setup before add_disk()Vishal Verma
commit 3ffb0ba9b567a8efb9a04ed3d1ec15ff333ada22 upstream. Prior to 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") we needed to temporarily add a zero-capacity disk before registering for blk-integrity. But adding a zero-capacity disk caused the partition table scanning to bail early, and this resulted in partitions not coming up after a probe of the BTT or blk namespaces. We can now register for integrity before the disk has been added, and this fixes the rescan problems. Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28ACPI / watchdog: Fix off-by-one error at resource assignmentTakashi Iwai
commit b1abf6fc49829d89660c961fafe3f90f3d843c55 upstream. The resource allocation in WDAT watchdog has off-one-by error, it sets one byte more than the actual end address. This may eventually lead to unexpected resource conflicts. Fixes: 058dfc767008 (ACPI / watchdog: Add support for WDAT hardware watchdog) Cc: 4.9+ <stable@vger.kernel.org> # 4.9+ Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28acpi, numa: fix pxm to online numa node associationsDan Williams
commit dc9e0a9347e932e3fd3cd03e7ff241022ed6ea8a upstream. Commit 99759869faf1 "acpi: Add acpi_map_pxm_to_online_node()" added support for mapping a given proximity to its nearest, by SLIT distance, online node. However, it sometimes returns unexpected results due to the fact that it switches from comparing the PXM node to the last node that was closer than the current max. for_each_online_node(n) { dist = node_distance(node, n); if (dist < min_dist) { min_dist = dist; node = n; <---- from this point we're using the wrong node for node_distance() Fixes: 99759869faf1 ("acpi: Add acpi_map_pxm_to_online_node()") Cc: <stable@vger.kernel.org> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28drm: udl: Properly check framebuffer mmap offsetsGreg Kroah-Hartman
commit 3b82a4db8eaccce735dffd50b4d4e1578099b8e8 upstream. The memmap options sent to the udl framebuffer driver were not being checked for all sets of possible crazy values. Fix this up by properly bounding the allowed values. Reported-by: Eyal Itkin <eyalit@checkpoint.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180321154553.GA18454@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28drm/radeon: Don't turn off DP sink when disconnectedMichel Dänzer
commit 2681bc79eeb640562c932007bfebbbdc55bf6a7d upstream. Turning off the sink in this case causes various issues, because userspace expects it to stay on until it turns it off explicitly. Instead, turn the sink off and back on when a display is connected again. This dance seems necessary for link training to work correctly. Bugzilla: https://bugs.freedesktop.org/105308 Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28drm/vmwgfx: Fix a destoy-while-held mutex problem.Thomas Hellstrom
commit 73a88250b70954a8f27c2444e1c2411bba3c29d9 upstream. When validating legacy surfaces, the backup bo might be destroyed at surface validate time. However, the kms resource validation code may have the bo reserved, so we will destroy a locked mutex. While there shouldn't be any other users of that mutex when it is destroyed, it causes a lock leak and thus throws a lockdep error. Fix this by having the kms resource validation code hold a reference to the bo while we have it reserved. We do this by introducing a validation context which might come in handy when the kms code is extended to validate multiple resources or buffers. Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()Kirill A. Shutemov
commit b3cd54b257ad95d344d121dc563d943ca39b0921 upstream. shmem_unused_huge_shrink() gets called from reclaim path. Waiting for page lock may lead to deadlock there. There was a bug report that may be attributed to this: http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.net Replace lock_page() with trylock_page() and skip the page if we failed to lock it. We will get to the page on the next scan. We can test for the PageTransHuge() outside the page lock as we only need protection against splitting the page under us. Holding pin oni the page is enough for this. Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel.com Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Eric Wheeler <linux-mm@lists.ewheeler.net> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mm/thp: do not wait for lock_page() in deferred_split_scan()Kirill A. Shutemov
commit fa41b900c30b45fab03783724932dc30cd46a6be upstream. deferred_split_scan() gets called from reclaim path. Waiting for page lock may lead to deadlock there. Replace lock_page() with trylock_page() and skip the page if we failed to lock it. We will get to the page on the next scan. Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel.com Fixes: 9a982250f773 ("thp: introduce deferred_split_huge_page()") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mm/khugepaged.c: convert VM_BUG_ON() to collapse failKirill A. Shutemov
commit fece2029a9e65b9a990831afe2a2b83290cbbe26 upstream. khugepaged is not yet able to convert PTE-mapped huge pages back to PMD mapped. We do not collapse such pages. See check khugepaged_scan_pmd(). But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate() somebody managed to instantiate THP in the range and then split the PMD back to PTEs we would have a problem -- VM_BUG_ON_PAGE(PageCompound(page)) will get triggered. It's possible since we drop mmap_sem during collapse to re-take for write. Replace the VM_BUG_ON() with graceful collapse fail. Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28x86/mm: implement free pmd/pte page interfacesToshi Kani
commit 28ee90fe6048fa7b7ceaeb8831c0e4e454a4cf89 upstream. Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which clear a given pud/pmd entry and free up lower level page table(s). The address range associated with the pud/pmd entry must have been purged by INVLPG. Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: Lei Li <lious.lilei@hisilicon.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mm/vmalloc: add interfaces to free unmapped page tableToshi Kani
commit b6bdb7517c3d3f41f20e5c2948d6bc3f8897394e upstream. On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create pud/pmd mappings. A kernel panic was observed on arm64 systems with Cortex-A75 in the following steps as described by Hanjun Guo. 1. ioremap a 4K size, valid page table will build, 2. iounmap it, pte0 will set to 0; 3. ioremap the same address with 2M size, pgd/pmd is unchanged, then set the a new value for pmd; 4. pte0 is leaked; 5. CPU may meet exception because the old pmd is still in TLB, which will lead to kernel panic. This panic is not reproducible on x86. INVLPG, called from iounmap, purges all levels of entries associated with purged address on x86. x86 still has memory leak. The patch changes the ioremap path to free unmapped page table(s) since doing so in the unmap path has the following issues: - The iounmap() path is shared with vunmap(). Since vmap() only supports pte mappings, making vunmap() to free a pte page is an overhead for regular vmap users as they do not need a pte page freed up. - Checking if all entries in a pte page are cleared in the unmap path is racy, and serializing this check is expensive. - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges. Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB purge. Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which clear a given pud/pmd entry and free up a page for the lower level entries. This patch implements their stub functions on x86 and arm64, which work as workaround. [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub] Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings") Reported-by: Lei Li <lious.lilei@hisilicon.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Wang Xuefeng <wxf.wang@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28nfsd: remove blocked locks on client teardownJeff Layton
commit 68ef3bc3166468678d5e1fdd216628c35bd1186f upstream. We had some reports of panics in nfsd4_lm_notify, and that showed a nfs4_lockowner that had outlived its so_client. Ensure that we walk any leftover lockowners after tearing down all of the stateids, and remove any blocked locks that they hold. With this change, we also don't need to walk the nbl_lru on nfsd_net shutdown, as that will happen naturally when we tear down the clients. Fixes: 76d348fadff5 (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks) Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 versionHans de Goede
commit d418ff56b8f2d2b296daafa8da151fe27689b757 upstream. When commit 9c7be59fc519af ("libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs") was added it inherited the ATA_HORKAGE_NO_NCQ_TRIM quirk from the existing "Crucial_CT*MX100*" entry, but that entry sets model_rev to "MU01", where as the entry adding the NOLPM quirk sets it to NULL. This means that after this commit we no apply the NO_NCQ_TRIM quirk to all "Crucial_CT512MX100*" SSDs even if they have the fixed "MU02" firmware. This commit splits the "Crucial_CT512MX100*" quirk into 2 quirks, one for the "MU01" firmware and one for all other firmware versions, so that we once again only apply the NO_NCQ_TRIM quirk to the "MU01" firmware version. Fixes: 9c7be59fc519af ("libata: Apply NOLPM quirk to ... MX100 512GB SSDs") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versionsHans de Goede
commit 3bf7b5d6d017c27e0d3b160aafb35a8e7cfeda1f upstream. Commit b17e5729a630 ("libata: disable LPM for Crucial BX100 SSD 500GB drive"), introduced a ATA_HORKAGE_NOLPM quirk for Crucial BX100 500GB SSDs but limited this to the MU02 firmware version, according to: http://www.crucial.com/usa/en/support-ssd-firmware MU02 is the last version, so there are no newer possibly fixed versions and if the MU02 version has broken LPM then the MU01 almost certainly also has broken LPM, so this commit changes the quirk to apply to all firmware versions. Fixes: b17e5729a630 ("libata: disable LPM for Crucial BX100 SSD 500GB...") Cc: stable@vger.kernel.org Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDsHans de Goede
commit 62ac3f7305470e3f52f159de448bc1a771717e88 upstream. There have been reports of the Crucial M500 480GB model not working with LPM set to min_power / med_power_with_dipm level. It has not been tested with medium_power, but that typically has no measurable power-savings. Note the reporters Crucial_CT480M500SSD3 has a firmware version of MU03 and there is a MU05 update available, but that update does not mention any LPM fixes in its changelog, so the quirk matches all firmware versions. In my experience the LPM problems with (older) Crucial SSDs seem to be limited to higher capacity versions of the SSDs (different firmware?), so this commit adds a NOLPM quirk for the 480 and 960GB versions of the M500, to avoid LPM causing issues with these SSDs. Cc: stable@vger.kernel.org Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28libata: Enable queued TRIM for Samsung SSD 860Ju Hyung Park
commit ca6bfcb2f6d9deab3924bf901e73622a94900473 upstream. Samsung explicitly states that queued TRIM is supported for Linux with 860 PRO and 860 EVO. Make the previous blacklist to cover only 840 and 850 series. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>