summaryrefslogtreecommitdiff
path: root/include/linux/netfilter.h
AgeCommit message (Collapse)Author
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28netfilter: convert hook list to an arrayAaron Conole
This converts the storage and layout of netfilter hook entries from a linked list to an array. After this commit, hook entries will be stored adjacent in memory. The next pointer is no longer required. The ops pointers are stored at the end of the array as they are only used in the register/unregister path and in the legacy br_netfilter code. nf_unregister_net_hooks() is slower than needed as it just calls nf_unregister_net_hook in a loop (i.e. at least n synchronize_net() calls), this will be addressed in followup patch. Test setup: - ixgbe 10gbit - netperf UDP_STREAM, 64 byte packets - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter): empty mangle and raw prerouting, mangle and filter input hooks: 353.9 this patch: 364.2 Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17netfilter: remove old pre-netns era hook apiFlorian Westphal
no more users in the tree, remove this. The old api is racy wrt. module removal, all users have been converted to the netns-aware api. The old api pretended we still have global hooks but that has not been true for a long time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: decouple nf_hook_entry and nf_hook_opsAaron Conole
During nfhook traversal we only need a very small subset of nf_hook_ops members. We need: - next element - hook function to call - hook function priv argument Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops. nf_hook_entry struct is now 32 bytes on x86_64. A followup patch will turn the run-time list into an array that only stores hook functions plus their priv arguments, eliminating the ->next element. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: introduce accessor functions for hook entriesAaron Conole
This allows easier future refactoring. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: remove hook_entries field from nf_hook_statePablo Neira Ayuso
This field is only useful for nf_queue, so store it in the nf_queue_entry structure instead, away from the core path. Pass hook_head to nf_hook_slow(). Since we always have a valid entry on the first iteration in nf_iterate(), we can use 'do { ... } while (entry)' loop instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03netfilter: kill NF_HOOK_THRESH() and state->treshPablo Neira Ayuso
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh") introduced br_nf_hook_thresh(). Replace NF_HOOK_THRESH() by br_nf_hook_thresh from br_nf_forward_finish(), so we have no more callers for this macro. As a result, state->thresh and explicit thresh parameter in the hook state structure is not required anymore. And we can get rid of skip-hook-under-thresh loop in nf_iterate() in the core path that is only used by br_netfilter to search for the filter hook. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25Merge branch 'master' of ↵Pablo Neira Ayuso
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Conflicts: net/netfilter/core.c net/netfilter/nf_tables_netdev.c Resolve two conflicts before pull request for David's net-next tree: 1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant ip_hdr assignment") from the net tree and commit ddc8b6027ad0 ("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()"). 2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and Aaron Conole's patches to replace list_head with single linked list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: replace list_head with single linked listAaron Conole
The netfilter hook list never uses the prev pointer, and so can be trimmed to be a simple singly-linked list. In addition to having a more light weight structure for hook traversal, struct net becomes 5568 bytes (down from 6400) and struct net_device becomes 2176 bytes (down from 2240). Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: call nf_hook_state_init with rcu_read_lock heldFlorian Westphal
This makes things simpler because we can store the head of the list in the nf_state structure without worrying about concurrent add/delete of hook elements from the list. A future commit will make use of this to implement a simpler linked-list. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-19net: Add _nf_(un)register_hooks symbolsMahesh Bandewar
Add _nf_register_hooks() and _nf_unregister_hooks() calls which allow caller to hold RTNL mutex. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02netfilter: don't call hooks unless neededFlorian Westphal
With the previous patches in place, a netns nf_hook_list might be empty, even if e.g. init_net performs filtering. Thus change nf_hook_thresh to check the hook_list as well before initializing hook_state and calling nf_hook_slow(). We still make use of static keys; if no netfilter modules are loaded list is guaranteed to be empty. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: turn NF_HOOK into an inline functionArnd Bergmann
A recent change to the dst_output handling caused a new warning when the call to NF_HOOK() is the only used of a local variable passed as 'dev', and CONFIG_NETFILTER is disabled: net/ipv6/ip6_output.c: In function 'ip6_output': net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable] The reason for this is that the NF_HOOK macro in this case does not reference the variable at all, and the call to dev_net(dev) got removed from the ip6_output function. To avoid that warning now and in the future, this changes the macro into an equivalent inline function, which tells the compiler that the variable is passed correctly but still unused. The dn_forward function apparently had the same problem in the past and added a local workaround that no longer works with the inline function. In order to avoid a regression, we have to also remove the #ifdef from decnet in the same patch. Fixes: ede2059dbaf9 ("dst: Pass net into dst->output") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: remove hook owner refcountingFlorian Westphal
since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05netfilter: ctnetlink: add const qualifier to nfnl_hook.get_ctKen-ichirou MATSUZAWA
get_ct as is and will not update its skb argument, and users of nfnl_ct_hook is currently only nfqueue, we can add const qualifier. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
2015-10-05netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack infoKen-ichirou MATSUZAWA
The idea of this series of patch is to attach conntrack information to nflog like nfqueue has already done. nfqueue conntrack info attaching basis is generic, rename those names to generic one, glue. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-04netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.cPablo Neira Ayuso
The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-29netfilter: Push struct net down into nf_afinfo.rerouteEric W. Biederman
The network namespace is needed when routing a packet. Stop making nf_afinfo.reroute guess which network namespace is the proper namespace to route the packet in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18netfilter: Pass priv instead of nf_hook_ops to netfilter hooksEric W. Biederman
Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-17netfilter: Pass net into okfnEric W. Biederman
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass struct net into the netfilter hooksEric W. Biederman
Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass net to nf_hook_threshEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Store net in nf_hook_stateEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Remove !CONFIG_NETFITLER definition of nf_hook_threshEric W. Biederman
The !CONFIG_NETFILTER definition of nf_hook_thresh calls okfn when the CONFIG_NETFITLER defintion does not, making it buggy. As the !CONFIG_NETFILTER defintion of nf_hook_thresh is not used remove it rather than fix it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02netfilter: nf_conntrack: make nf_ct_zone_dflt built-inDaniel Borkmann
Fengguang reported, that some randconfig generated the following linker issue with nf_ct_zone_dflt object involved: [...] CC init/version.o LD init/built-in.o net/built-in.o: In function `ipv4_conntrack_defrag': nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt' net/built-in.o: In function `ipv6_defrag': nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt' make: *** [vmlinux] Error 1 Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general. Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side. Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23netfilter: rename local nf_hook_list to hook_listPablo Neira Ayuso
085db2c04557 ("netfilter: Per network namespace netfilter hooks.") introduced a new nf_hook_list that is global, so let's avoid this overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-15netfilter: move tee_active to coreFlorian Westphal
This prepares for a TEE like expression in nftables. We want to ensure only one duplicate is sent, so both will use the same percpu variable to detect duplication. The other use case is detection of recursive call to xtables, but since we don't want dependency from nft to xtables core its put into core.c instead of the x_tables core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: Per network namespace netfilter hooks.Eric W. Biederman
- Add a new set of functions for registering and unregistering per network namespace hooks. - Modify the old global namespace hook functions to use the per network namespace hooks in their implementation, so their remains a single list that needs to be walked for any hook (this is important for keeping the hook priority working and for keeping the code walking the hooks simple). - Only allow registering the per netdevice hooks in the network namespace where the network device lives. - Dynamically allocate the structures in the per network namespace hook list in nf_register_net_hook, and unregister them in nf_unregister_net_hook. Dynamic allocate is required somewhere as the number of network namespaces are not fixed so we might as well allocate them in the registration function. The chain of registered hooks on any list is expected to be small so the cost of walking that list to find the entry we are unregistering should also be small. Performing the management of the dynamically allocated list entries in the registration and unregistration functions keeps the complexity from spreading. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-15netfilter: kill nf_hooks_activeEric W. Biederman
The function obscures what is going on in nf_hook_thresh and it's existence requires computing the hook list twice. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-18netfilter: don't pull include/linux/netfilter.h from netns headersPablo Neira Ayuso
This pulls the full hook netfilter definitions from all those that include net_namespace.h. Instead let's just include the bare minimum required in the new linux/netfilter_defs.h file, and use it from the netfilter netns header files. I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit this compilation error: In file included from include/linux/netfilter_defs.h:4:0, from include/net/netns/netfilter.h:4, from include/net/net_namespace.h:22, from include/linux/netdevice.h:43, from net/netfilter/nfnetlink_queue_core.c:23: include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type struct in_addr in; And also explicit include linux/netfilter.h in several spots. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-05-14netfilter: add netfilter ingress hook after handle_ing() under unique static keyPablo Neira
This patch adds the Netfilter ingress hook just after the existing tc ingress hook, that seems to be the consensus solution for this. Note that the Netfilter hook resides under the global static key that enables ingress filtering. Nonetheless, Netfilter still also has its own static key for minimal impact on the existing handle_ing(). * Without this patch: Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags) 16086246pps 7721Mb/sec (7721398080bps) errors: 100000000 42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch: Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags) 16090536pps 7723Mb/sec (7723457280bps) errors: 100000000 41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb * Without this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags) 10788648pps 5178Mb/sec (5178551040bps) errors: 100000000 40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.77% kpktgend_0 [cls_u32] [k] u32_classify 5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker 3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify 2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags) 10743194pps 5156Mb/sec (5156733120bps) errors: 100000000 42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.70% kpktgend_0 [cls_u32] [k] u32_classify 5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker 2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify 1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk Note that the results are very similar before and after. I can see gcc gets the code under the ingress static key out of the hot path. Then, on that cold branch, it generates the code to accomodate the netfilter ingress static key. My explanation for this is that this reduces the pressure on the instruction cache for non-users as the new code is out of the hot path, and it comes with minimal impact for tc ingress users. Using gcc version 4.8.4 on: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 [...] L1d cache: 16K L1i cache: 64K L2 cache: 2048K L3 cache: 8192K Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add nf_hook_list_active()Pablo Neira
In preparation to have netfilter ingress per-device hook list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add hook list to nf_hook_statePablo Neira
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: cleanup struct nf_hook_ops indentationPablo Neira
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Pass socket pointer down through okfn().David Miller
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Add socket pointer to nf_hook_state.David Miller
It is currently always set to NULL, but nf_queue is adjusted to be prepared for it being set to a real socket by taking and releasing a reference to that socket when necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Add nf_hook_state initializer function.David Miller
This way we can consolidate where we setup new nf_hook_state objects, to make sure the entire thing is initialized. The only other place an nf_hook_object is instantiated is nf_queue, wherein a structure copy is used. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Make nf_hookfn use nf_hook_state.David S. Miller
Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Create and use nf_hook_state.David S. Miller
Instead of passing a large number of arguments down into the nf_hook() entry points, create a structure which carries this state down through the hook processing layers. This makes is so that if we want to change the types or signatures of any of these pieces of state, there are less places that need to be changed. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABELZhouyi Zhou
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: add nftablesPatrick McHardy
This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: pass hook ops to hookfnPatrick McHardy
Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-26netfilter: Remove extern from function prototypesJoe Perches
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13netfilter: nfnetlink_queue: allow to attach expectations to conntracksPablo Neira Ayuso
This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: change sequence number adjustments to 32 bitsPatrick McHardy
Using 16 bits is too small, when many adjustments happen the offsets might overflow and break the connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: don't panic on error while walking through the init pathPablo Neira Ayuso
Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-04-05netfilter: remove unneeded variable proc_net_netfilterPablo Neira Ayuso
Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-13UAPI: (Scripted) Disintegrate include/linuxDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>