summaryrefslogtreecommitdiff
path: root/kernel/trace/trace_kprobe.c
AgeCommit message (Collapse)Author
2018-03-28tracing: probeevent: Fix to support minus offset from symbolMasami Hiramatsu
commit c5d343b6b7badd1f5fe0873eff2e8d63a193e732 upstream. In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-12tracing/kprobes: Allow to create probe with a module name starting with a digitSabrina Dubroca
commit 9e52b32567126fe146f198971364f68d3bc5233f upstream. Always try to parse an address, since kstrtoul() will safely fail when given a symbol as input. If that fails (which will be the case for a symbol), try to parse a symbol instead. This allows creating a probe such as: p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0 Which is necessary for this command to work: perf probe -m 8021q -a vlan_gro_receive Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25tracing/kprobes: Enforce kprobes teardown after testingThomas Gleixner
commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream. Enabling the tracer selftest triggers occasionally the warning in text_poke(), which warns when the to be modified page is not marked reserved. The reason is that the tracer selftest installs kprobes on functions marked __init for testing. These probes are removed after the tests, but that removal schedules the delayed kprobes_optimizer work, which will do the actual text poke. If the work is executed after the init text is freed, then the warning triggers. The bug can be reproduced reliably when the work delay is increased. Flush the optimizer work and wait for the optimizing/unoptimizing lists to become empty before returning from the kprobes tracer selftest. That ensures that all operations which were queued due to the probes removal have completed. Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 6274de498 ("kprobes: Support delayed unoptimizing") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-23ftrace: kprobe: uprobe: Add x8/x16/x32/x64 for hexadecimal typesMasami Hiramatsu
Add x8/x16/x32/x64 for hexadecimal type casting to kprobe/uprobe event tracer. These type casts can be used for integer arguments for explicitly showing them in hexadecimal digits in formatted text. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naohiro Aota <naohiro.aota@hgst.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/147151067029.12957.11591314629326414783.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-20tracing: expose current->comm to [ku]probe eventsOmar Sandoval
ftrace is very quick to give up on saving the task command line (see `trace_save_cmdline()`). The workaround for events which really care about the command line is to explicitly assign it as part of the entry. However, this doesn't work for kprobe events, as there's no straightforward way to get access to current->comm. Add a kprobe/uprobe event variable $comm which provides exactly that. Link: http://lkml.kernel.org/r/f59b472033b943a370f5f48d0af37698f409108f.1465435894.git.osandov@fb.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-07perf: split perf_trace_buf_prepare into alloc and update partsAlexei Starovoitov
split allows to move expensive update of 'struct trace_entry' to later phase. Repurpose unused 1st argument of perf_tp_event() to indicate event type. While splitting use temp variable 'rctx' instead of '*rctx' to avoid unnecessary loads done by the compiler due to -fno-strict-aliasing Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-22kernel/...: convert pr_warning to pr_warnJoe Perches
Use the more common logging method with the eventual goal of removing pr_warning altogether. Miscellanea: - Realign arguments - Coalesce formats - Add missing space between a few coalesced formats Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-09kprobes: Optimize hot path by using percpu counter to collect 'nhit' statisticsMartin KaFai Lau
When doing ebpf+kprobe on some hot TCP functions (e.g. tcp_rcv_established), the kprobe_dispatcher() function shows up in 'perf report'. In kprobe_dispatcher(), there is a lot of cache bouncing on 'tk->nhit++'. 'tk->nhit' and 'tk->tp.flags' also share the same cacheline. perf report (cycles:pp): 8.30% ipv4_dst_check 4.74% copy_user_enhanced_fast_string 3.93% dst_release 2.80% tcp_v4_rcv 2.31% queued_spin_lock_slowpath 2.30% _raw_spin_lock 1.88% mlx4_en_process_rx_cq 1.84% eth_get_headlen 1.81% ip_rcv_finish ~~~~ 1.71% kprobe_dispatcher ~~~~ 1.55% mlx4_en_xmit 1.09% __probe_kernel_read perf report after patch: 9.15% ipv4_dst_check 5.00% copy_user_enhanced_fast_string 4.12% dst_release 2.96% tcp_v4_rcv 2.50% _raw_spin_lock 2.39% queued_spin_lock_slowpath 2.11% eth_get_headlen 2.03% mlx4_en_process_rx_cq 1.69% mlx4_en_xmit 1.19% ip_rcv_finish 1.12% __probe_kernel_read 1.02% ehci_hcd_cleanup Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Kernel Team <kernel-team@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454531308-2441898-1-git-send-email-kafai@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-28lib: introduce strncpy_from_unsafe()Alexei Starovoitov
generalize FETCH_FUNC_NAME(memory, string) into strncpy_from_unsafe() and fix sparse warnings that were present in original implementation. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()Steven Rostedt (Red Hat)
The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The ftrace_trigger_soft_disabled() tests if a trace_event is soft disabled (called but not traced), and returns true if it is. It has nothing to do with function tracing and should be renamed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13tracing: Rename ftrace_event_name() to trace_event_name()Steven Rostedt (Red Hat)
The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. ftrace_event_name() returns the name of an event tracepoint, has nothing to do with function tracing. Rename it to trace_event_name(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13tracing: Rename ftrace_event_{call,class} to trace_event_{call,class}Steven Rostedt (Red Hat)
The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The structures ftrace_event_call and ftrace_event_class have nothing to do with the function hooks, and are really trace_event structures. Rename ftrace_event_* to trace_event_*. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13tracing: Rename ftrace_event_file to trace_event_fileSteven Rostedt (Red Hat)
The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The structure ftrace_event_file is really about trace events and not "ftrace". Rename it to trace_event_file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13tracing: Rename (un)register_ftrace_event() to (un)register_trace_event()Steven Rostedt (Red Hat)
The name "ftrace" really refers to the function hook infrastructure. It is not about the trace_events. The functions (un)register_ftrace_event() is really about trace_events, and the name should be register_trace_event() instead. Also renamed ftrace_event_reg() to trace_event_reg() for the same reason. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-14Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
2015-04-14Merge tag 'trace-v4.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" })" * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Add enum_map file to show enums that have been mapped writeback: Export enums used by tracepoint to user space v4l: Export enums used by tracepoints to user space SUNRPC: Export enums in tracepoints to user space mm: tracing: Export enums in tracepoints to user space irq/tracing: Export enums in tracepoints to user space f2fs: Export the enums in the tracepoints to userspace net/9p/tracing: Export enums in tracepoints to userspace x86/tlb/trace: Export enums in used by tlb_flush tracepoint tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM() tracing: Allow for modules to convert their enums to values tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation tracing: Give system name a pointer brcmsmac: Move each system tracepoints to their own header iwlwifi: Move each system tracepoints to their own header mac80211: Move message tracepoints to their own header tracing: Add TRACE_SYSTEM_VAR to xhci-hcd tracing: Add TRACE_SYSTEM_VAR to kvm-s390 tracing: Add TRACE_SYSTEM_VAR to intel-sst ...
2015-04-14Merge tag 'trace-4.1-tracefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs from Steven Rostedt: "This adds the new tracefs file system. This has been in linux-next for more than one release, as I had it ready for the 4.0 merge window, but a last minute thing that needed to go into Linux first had to be done. That was that perf hard coded the file system number when reading /sys/kernel/debugfs/tracing directory making sure that the path had the debugfs mount # before it would parse the tracing file. This broke other use cases of perf, and the check is removed. Now when mounting /sys/kernel/debug, tracefs is automatically mounted in /sys/kernel/debug/tracing such that old tools will still see that path as expected. But now system admins can mount tracefs directly and not need to mount debugfs, which can expose security issues. A new directory is created when tracefs is configured such that system admins can now mount it separately (/sys/kernel/tracing)" * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have mkdir and rmdir be part of tracefs tracefs: Add directory /sys/kernel/tracing tracing: Automatically mount tracefs on debugfs/tracing tracing: Convert the tracing facility over to use tracefs tracefs: Add new tracefs file system tracing: Create cmdline tracer options on tracing fs init tracing: Only create tracer options files if directory exists debugfs: Provide a file creation function that also takes an initial size
2015-04-02tracing, perf: Implement BPF programs attached to kprobesAlexei Starovoitov
BPF programs, attached to kprobes, provide a safe way to execute user-defined BPF byte-code programs without being able to crash or hang the kernel in any way. The BPF engine makes sure that such programs have a finite execution time and that they cannot break out of their sandbox. The user interface is to attach to a kprobe via the perf syscall: struct perf_event_attr attr = { .type = PERF_TYPE_TRACEPOINT, .config = event_id, ... }; event_fd = perf_event_open(&attr,...); ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); 'prog_fd' is a file descriptor associated with BPF program previously loaded. 'event_id' is an ID of the kprobe created. Closing 'event_fd': close(event_fd); ... automatically detaches BPF program from it. BPF programs can call in-kernel helper functions to: - lookup/update/delete elements in maps - probe_read - wraper of probe_kernel_read() used to access any kernel data structures BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is architecture dependent) and return 0 to ignore the event and 1 to store kprobe event into the ring buffer. Note, kprobes are a fundamentally _not_ a stable kernel ABI, so BPF programs attached to kprobes must be recompiled for every kernel version and user must supply correct LINUX_VERSION_CODE in attr.kern_version during bpf_prog_load() call. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02tracing: Add kprobe flagAlexei Starovoitov
add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of tracepoints, since bpf programs can only be attached to kprobe type of PERF_TYPE_TRACEPOINT perf events. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25trace: Don't use __weak in header filesStephen Rothwell
The commit that added a check for this to checkpatch says: "Using weak declarations can have unintended link defects. The __weak on the declaration causes non-weak definitions to become weak." In this case, when a PowerPC kernel is built with CONFIG_KPROBE_EVENT but not CONFIG_UPROBE_EVENT, it generates the following warning: WARNING: 1 bad relocations c0000000014f2190 R_PPC64_ADDR64 uprobes_fetch_type_table This is fixed by passing the fetch_table arrays to traceprobe_parse_probe_arg() which also means that they can never be NULL. Link: http://lkml.kernel.org/r/20150312165834.4482cb48@canb.auug.org.au Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-12Merge tag 'trace-v3.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The updates included in this pull request for ftrace are: o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths" * tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Do not wake up a splice waiter when page is not full tracing: Fix unmapping loop in tracing_mark_write tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT() tracing: Add TRACE_EVENT_FN example tracing: Add TRACE_EVENT_CONDITION sample tracing: Update the TRACE_EVENT fields available in the sample code tracing: Separate out initializing top level dir from instances tracing: Make tracing_init_dentry_tr() static trace: Use 64-bit timekeeping tracing: Add array printing helper tracing: Remove newline from trace_printk warning banner tracing: Use IS_ERR() check for return value of tracing_init_dentry() tracing: Remove unneeded includes of debugfs.h and fs.h tracing: Remove taking of trace_types_lock in pipe files tracing: Add ref count to tracer for when they are being read by pipe
2015-02-03tracing: Convert the tracing facility over to use tracefsSteven Rostedt (Red Hat)
debugfs was fine for the tracing facility as a quick way to get an interface. Now that tracing has matured, it should separate itself from debugfs such that it can be mounted separately without needing to mount all of debugfs with it. That is, users resist using tracing because it requires mounting debugfs. Having tracing have its own file system lets users get the features of tracing without needing to bring in the rest of the kernel's debug infrastructure. Another reason for tracefs is that debubfs does not support mkdir. Currently, to create instances, one does a mkdir in the tracing/instance directory. This is implemented via a hack that forces debugfs to do something it is not intended on doing. By converting over to tracefs, this hack can be removed and mkdir can be properly implemented. This patch does not address this yet, but it lays the ground work for that to be done. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-22tracing: Use IS_ERR() check for return value of tracing_init_dentry()Steven Rostedt (Red Hat)
tracing_init_dentry() will soon return NULL as a valid pointer for the top level tracing directroy. NULL can not be used as an error value. Instead, switch to ERR_PTR() and check the return status with IS_ERR(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-14perf: Avoid horrible stack usagePeter Zijlstra (Intel)
Both Linus (most recent) and Steve (a while ago) reported that perf related callbacks have massive stack bloat. The problem is that software events need a pt_regs in order to properly report the event location and unwind stack. And because we could not assume one was present we allocated one on stack and filled it with minimal bits required for operation. Now, pt_regs is quite large, so this is undesirable. Furthermore it turns out that most sites actually have a pt_regs pointer available, making this even more onerous, as the stack space is pointless waste. This patch addresses the problem by observing that software events have well defined nesting semantics, therefore we can use static per-cpu storage instead of on-stack. Linus made the further observation that all but the scheduler callers of perf_sw_event() have a pt_regs available, so we change the regular perf_sw_event() to require a valid pt_regs (where it used to be optional) and add perf_sw_event_sched() for the scheduler. We have a scheduler specific call instead of a more generic _noregs() like construct because we can assume non-recursion from the scheduler and thereby simplify the code further (_noregs would have to put the recursion context call inline in order to assertain which __perf_regs element to use). One last note on the implementation of perf_trace_buf_prepare(); we allow .regs = NULL for those cases where we already have a pt_regs pointer available and do not need another. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-19kprobes/tracing: Use trace_seq_has_overflowed() for overflow checksSteven Rostedt (Red Hat)
Instead of checking the return value of trace_seq_printf() and friends for overflowing of the buffer, use the trace_seq_has_overflowed() helper function. This cleans up the code quite a bit and also takes us a step closer to changing the return values of trace_seq_printf() and friends to void. Link: http://lkml.kernel.org/r/20141114011411.181812785@goodmis.org Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-14trace: Replace single-character seq_puts with seq_putcRasmus Villemoes
Printing a single character to a seqfile might as well be done with seq_putc instead of seq_puts; this avoids a strlen() call and a memory access. It also shaves another few bytes off the generated code. Link: http://lkml.kernel.org/r/1415479332-25944-4-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: Replace seq_printf by simpler equivalentsRasmus Villemoes
Using seq_printf to print a simple string or a single character is a lot more expensive than it needs to be, since seq_puts and seq_putc exist. These patches do seq_printf(m, s) -> seq_puts(m, s) seq_printf(m, "%s", s) -> seq_puts(m, s) seq_printf(m, "%c", c) -> seq_putc(m, c) Subsequent patches will simplify further. Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-12Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second round of perf updates: - wide reaching kprobes sanitization and robustization, with the hope of fixing all 'probe this function crashes the kernel' bugs, by Masami Hiramatsu. - uprobes updates from Oleg Nesterov: tmpfs support, corner case fixes and robustization work. - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo et al: * Add support to accumulate hist periods (Namhyung Kim) * various fixes, refactorings and enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) perf: Differentiate exec() and non-exec() comm events perf: Fix perf_event_comm() vs. exec() assumption uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates perf/documentation: Add description for conditional branch filter perf/x86: Add conditional branch filtering support perf/tool: Add conditional branch filter 'cond' to perf record perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND' uprobes: Teach copy_insn() to support tmpfs uprobes: Shift ->readpage check from __copy_insn() to uprobe_register() perf/x86: Use common PMU interrupt disabled code perf/ARM: Use common PMU interrupt disabled code perf: Disable sampled events if no PMU interrupt perf: Fix use after free in perf_remove_from_context() perf tools: Fix 'make help' message error perf record: Fix poll return value propagation perf tools: Move elide bool into perf_hpp_fmt struct perf tools: Remove elide setup for SORT_MODE__MEMORY mode perf tools: Fix "==" into "=" in ui_browser__warning assignment perf tools: Allow overriding sysfs and proc finding with env var perf tools: Consider header files outside perf directory in tags target ...
2014-06-06tracing/kprobes: Avoid self tests if tracing is disabled on boot upYoshihiro YUNOMAE
If tracing is disabled on boot up, the kernel should not execute tracing self tests. The kernel should check whether tracing is disabled or not before executing any of the tracing self tests. Link: http://lkml.kernel.org/p/20140605223520.32311.56097.stgit@yunodevel Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-24kprobes, ftrace: Use NOKPROBE_SYMBOL macro in ftraceMasami Hiramatsu
Use NOKPROBE_SYMBOL macro to protect functions from kprobes instead of __kprobes annotation in ftrace. This applies nokprobe_inline annotation for some cases, because NOKPROBE_SYMBOL() will inhibit inlining by referring the symbol address. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140417081828.26341.55152.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, ftrace: Allow probing on some functionsMasami Hiramatsu
There is no need to prohibit probing on the functions used for preparation and uprobe only fetch functions. Those are safely probed because those are not invoked from kprobe's breakpoint/fault/debug handlers. So there is no chance to cause recursive exceptions. Following functions are now removed from the kprobes blacklist: update_bitfield_fetch_param free_bitfield_fetch_param kprobe_register FETCH_FUNC_NAME(stack, type) in trace_uprobe.c FETCH_FUNC_NAME(memory, type) in trace_uprobe.c FETCH_FUNC_NAME(memory, string) in trace_uprobe.c FETCH_FUNC_NAME(memory, string_size) in trace_uprobe.c FETCH_FUNC_NAME(file_offset, type) in trace_uprobe.c Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140417081800.26341.56504.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-08tracepoint: Use struct pointer instead of name hash for reg/unreg tracepointsMathieu Desnoyers
Register/unregister tracepoint probes with struct tracepoint pointer rather than tracepoint name. This change, which vastly simplifies tracepoint.c, has been proposed by Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux size. From this point on, the tracers need to pass a struct tracepoint pointer to probe register/unregister. A probe can now only be connected to a tracepoint that exists. Moreover, tracers are responsible for unregistering the probe before the module containing its associated tracepoint is unloaded. text data bss dec hex filename 10443444 4282528 10391552 25117524 17f4354 vmlinux.orig 10434930 4282848 10391552 25109330 17f2352 vmlinux Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com CC: Ingo Molnar <mingo@kernel.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frank Ch. Eigler <fche@redhat.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> [ SDR - fixed return val in void func in tracepoint_module_going() ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-20tracing/uprobes: Support ftrace_event_file base multibufferzhangwei(Jovi)
Support multi-buffer on uprobe-based dynamic events by using ftrace_event_file. This patch is based kprobe-based dynamic events multibuffer support work initially, commited by Masami(commit 41a7dd420c), but revised as below: Oleg changed the kprobe-based multibuffer design from array-pointers of ftrace_event_file into simple list, so this patch also change to the list design. rcu_read_lock/unlock added into uprobe_trace_func/uretprobe_trace_func, to synchronize with ftrace_event_file list add and delete. Even though we allow multi-uprobes instances now, but TP_FLAG_PROFILE/TP_FLAG_TRACE are still mutually exclusive in probe_event_enable currently, this means we cannot allow one user is using uprobe-tracer, and another user is using perf-probe on same uprobe concurrently. (Perhaps this will be fix in future, kprobe don't have this limitation now) Link: http://lkml.kernel.org/r/1389946120-19610-4-git-send-email-namhyung@kernel.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-09tracing: Consolidate event trigger codeSteven Rostedt (Red Hat)
The event trigger code that checks for callback triggers before and after recording of an event has lots of flags checks. This code is duplicated throughout the ftrace events, kprobes and system calls. They all do the exact same checks against the event flags. Added helper functions ftrace_trigger_soft_disabled(), event_trigger_unlock_commit() and event_trigger_unlock_commit_regs() that consolidated the code and these are used instead. Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-06tracing/kprobes: Add trace event trigger invocationsTom Zanussi
Add code to the kprobe/kretprobe event functions that will invoke any event triggers associated with a probe's ftrace_event_file. The code to do this is very similar to the invocation code already used to invoke the triggers associated with static events and essentially replaces the existing soft-disable checks with a superset that preserves the original behavior but adds the bits needed to support event triggers. Link: http://lkml.kernel.org/r/f2d49f157b608070045fdb26c9564d5a05a5a7d0.1389036657.git.tom.zanussi@linux.intel.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-02tracing/uprobes: Add @+file_offset fetch methodNamhyung Kim
Enable to fetch data from a file offset. Currently it only supports fetching from same binary uprobe set. It'll translate the file offset to a proper virtual address in the process. The syntax is "@+OFFSET" as it does similar to normal memory fetching (@ADDR) which does no address translation. Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/probes: Implement 'memory' fetch method for uprobesNamhyung Kim
Use separate method to fetch from memory. Move existing functions to trace_kprobe.c and make them static. Also add new memory fetch implementation for uprobes. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/probes: Move 'symbol' fetch method to kprobesNamhyung Kim
Move existing functions to trace_kprobe.c and add NULL entries to the uprobes fetch type table. I don't make them static since some generic routines like update/free_XXX_fetch_param() require pointers to the functions. Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/probes: Implement 'stack' fetch method for uprobesNamhyung Kim
Use separate method to fetch from stack. Move existing functions to trace_kprobe.c and make them static. Also add new stack fetch implementation for uprobes. Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/probes: Split [ku]probes_fetch_type_tableNamhyung Kim
Use separate fetch_type_table for kprobes and uprobes. It currently shares all fetch methods but some of them will be implemented differently later. This is not to break build if [ku]probes is configured alone (like !CONFIG_KPROBE_EVENT and CONFIG_UPROBE_EVENT). So I added '__weak' to the table declaration so that it can be safely omitted when it configured out. Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/probes: Integrate duplicate set_print_fmt()Namhyung Kim
The set_print_fmt() functions are implemented almost same for [ku]probes. Move it to a common place and get rid of the duplication. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/kprobes: Move common functions to trace_probe.hNamhyung Kim
The __get_data_size() and store_trace_args() will be used by uprobes too. Move them to a common location. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2014-01-02tracing/kprobes: Factor out struct trace_probeNamhyung Kim
There are functions that can be shared to both of kprobes and uprobes. Separate common data structure to struct trace_probe and use it from the shared functions. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2013-11-05tracing: Update event filters for multibufferTom Zanussi
The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31tracing/kprobes: Fail to unregister if probe event files are in useSteven Rostedt (Red Hat)
When a probe is being removed, it cleans up the event files that correspond to the probe. But there is a race between writing to one of these files and deleting the probe. This is especially true for the "enable" file. CPU 0 CPU 1 ----- ----- fd = open("enable",O_WRONLY); probes_open() release_all_trace_probes() unregister_trace_probe() if (trace_probe_is_enabled(tp)) return -EBUSY write(fd, "1", 1) __ftrace_set_clr_event() call->class->reg() (kprobe_register) enable_trace_probe(tp) __unregister_trace_probe(tp); list_del(&tp->list) unregister_probe_event(tp) <-- fails! free_trace_probe(tp) write(fd, "0", 1) __ftrace_set_clr_event() call->class->unreg (kprobe_register) disable_trace_probe(tp) <-- BOOM! A test program was written that used two threads to simulate the above scenario adding a nanosleep() interval to change the timings and after several thousand runs, it was able to trigger this bug and crash: BUG: unable to handle kernel paging request at 00000005000000f9 IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7 PGD 7808a067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Dumping ftrace buffer: --------------------------------- Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000 RIP: 0010:[<ffffffff810dee70>] [<ffffffff810dee70>] probes_open+0x3b/0xa7 RSP: 0018:ffff880076e53c38 EFLAGS: 00010203 RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003 RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000 RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000 R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35 R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450 FS: 00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0 Stack: ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35 ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0 ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440 Call Trace: [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30 [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b [<ffffffff81130f78>] do_dentry_open+0x162/0x226 [<ffffffff81131186>] finish_open+0x46/0x54 [<ffffffff8113f30b>] do_last+0x7f6/0x996 [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44 [<ffffffff8113f6dd>] path_openat+0x232/0x496 [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a [<ffffffff81131f4e>] do_sys_open+0x70/0x102 [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197 [<ffffffff81131ffe>] SyS_open+0x1e/0x20 [<ffffffff81522742>] system_call_fastpath+0x16/0x1b Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7 RIP [<ffffffff810dee70>] probes_open+0x3b/0xa7 RSP <ffff880076e53c38> CR2: 00000005000000f9 ---[ end trace 35f17d68fc569897 ]--- The unregister_trace_probe() must be done first, and if it fails it must fail the removal of the kprobe. Several changes have already been made by Oleg Nesterov and Masami Hiramatsu to allow moving the unregister_probe_event() before the removal of the probe and exit the function if it fails. This prevents the tp structure from being used after it is freed. Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/kprobe: Wait for disabling all running kprobe handlersMasami Hiramatsu
Wait for disabling all running kprobe handlers when a kprobe event is disabled, since the caller, trace_remove_event_call() supposes that a removing event is disabled completely by disabling the event. With this change, ftrace can ensure that there is no running event handlers after disabling it. Link: http://lkml.kernel.org/r/20130709093526.20138.93100.stgit@mhiramat-M0-7522 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()Oleg Nesterov
Every perf_trace_buf_prepare() caller does WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is almost the same. Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes the meaning of _ONCE, but I think this is fine. - 4947014 2932448 10104832 17984294 1126b26 vmlinux + 4948422 2932448 10104832 17985702 11270a6 vmlinux on my build. Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()Oleg Nesterov
kprobe_perf_func() and kretprobe_perf_func() pass addr=ip to perf_trace_buf_submit() for no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR, we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP. Link: http://lkml.kernel.org/r/20130620173811.GA13161@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01tracing/kprobes: Turn trace_probe->files into list_headOleg Nesterov
I think that "ftrace_event_file *trace_probe[]" complicates the code for no reason, turn it into list_head to simplify the code. enable_trace_probe() no longer needs synchronize_sched(). This needs the extra sizeof(list_head) memory for every attached ftrace_event_file, hopefully not a problem in this case. Link: http://lkml.kernel.org/r/20130620173814.GA13165@redhat.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01tracing/kprobes: Kill probe_enable_lockOleg Nesterov
enable_trace_probe() and disable_trace_probe() should not worry about serialization, the caller (perf_trace_init or __ftrace_set_clr_event) holds event_mutex. They are also called by kprobe_trace_self_tests_init(), but this __init function can't race with itself or trace_events.c And note that this code depended on event_mutex even before 41a7dd420c which introduced probe_enable_lock. In fact it assumes that the caller kprobe_register() can never race with itself. Otherwise, say, tp->flags manipulations are racy. Link: http://lkml.kernel.org/r/20130620173809.GA13158@redhat.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>