Commit message (Author, Age, Files, Lines)
* ptp: ixp46x: remove NO_IRQ handling (Arnd Bergmann, 2016-09-06, 1 file, -7/+8)
    gpio_to_irq does not return NO_IRQ but instead returns a negative error code on failure. Returning NO_IRQ from the function has no negative effects as we only compare the result to the expected interrupt number, but it's better to return a proper failure code for consistency, and we should remove NO_IRQ from the kernel entirely.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Richard Cochran <richardcochran@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
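The convention this commit message describes can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_gpio_to_irq()` and its GPIO-to-IRQ mapping are hypothetical stand-ins, not the kernel's `gpio_to_irq()` and not the ixp46x driver code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for gpio_to_irq(): returns an IRQ number on
 * success or a negative errno value on failure. The mapping is made up. */
int fake_gpio_to_irq(int gpio)
{
	if (gpio < 0 || gpio > 15)
		return -EINVAL;
	return 32 + gpio;
}

/* Look up the IRQ for a GPIO line and propagate the negative error code
 * instead of translating it into a NO_IRQ sentinel. */
int setup_gpio_irq(int gpio)
{
	int irq = fake_gpio_to_irq(gpio);

	if (irq < 0)
		return irq;
	/* A real driver would configure the line here. */
	return irq;
}

/* The caller only compares against the expected interrupt number, so a
 * negative error code can never match a valid (non-negative) IRQ. */
int irq_is_expected(int gpio, int expected_irq)
{
	assert(expected_irq >= 0);
	return setup_gpio_irq(gpio) == expected_irq;
}
```

Because valid interrupt numbers are non-negative in this sketch, the comparison in `irq_is_expected()` behaves the same whether the failure value is a sentinel or an errno, which is the point the message makes about the change having no negative effects.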
* sfc: check MTU against minimum threshold (Bert Kenward, 2016-09-06, 2 files, -1/+14)
    Reported-by: Ma Yuying <yuma@redhat.com>
    Suggested-by: Jarod Wilson <jarod@redhat.com>
    Signed-off-by: Bert Kenward <bkenward@solarflare.com>
    Reviewed-by: Jarod Wilson <jarod@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* perf, bpf: fix conditional call to bpf_overflow_handler (Arnd Bergmann, 2016-09-06, 1 file, -1/+1)
    The newly added bpf_overflow_handler function is only built if both CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller only checks the latter:

    kernel/events/core.c: In function 'perf_event_alloc':
    kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first use in this function)

    This changes the caller so we also skip this call if CONFIG_EVENT_TRACING is disabled entirely.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs")
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: arc_emac: mark arc_mdio_reset() static (Baoyou Xie, 2016-09-06, 1 file, -1/+1)
    We get 1 warning when building kernel with W=1:

    drivers/net/ethernet/arc/emac_mdio.c:107:5: warning: no previous prototype for 'arc_mdio_reset' [-Wmissing-prototypes]

    In fact, this function is only used in the file in which it is declared and doesn't need a declaration, but can be made static. So this patch marks this function with 'static'.

    Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* lan78xx: mark symbols static where possible (Baoyou Xie, 2016-09-06, 1 file, -11/+13)
    We get a few warnings when building kernel with W=1:

    drivers/net/usb/lan78xx.c:1182:6: warning: no previous prototype for 'lan78xx_defer_kevent' [-Wmissing-prototypes]
    drivers/net/usb/lan78xx.c:1409:5: warning: no previous prototype for 'lan78xx_nway_reset' [-Wmissing-prototypes]
    drivers/net/usb/lan78xx.c:2000:5: warning: no previous prototype for 'lan78xx_set_mac_addr' [-Wmissing-prototypes]
    ....

    In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. So this patch marks these functions with 'static'.

    Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'qed-get_regs' (David S. Miller, 2016-09-06, 4 files, -0/+131)
|\
| * qed: Add infrastructure for debug data collection (Tomer Tayar, 2016-09-06, 4 files, -0/+131)
      Adds support for several infrastructure operations that are done as part of debug data collection.

      Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
|/
* bnx2x: Add support for segmentation of tunnels with outer checksums (Alexander Duyck, 2016-09-06, 1 file, -3/+12)
    Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
    Tested-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* qed: Remove OOM messages (Joe Perches, 2016-09-06, 11 files, -141/+47)
    These messages are unnecessary as OOM allocation failures already do a dump_stack() giving more or less the same information.

    $ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
       text    data     bss     dec     hex filename
     127817   27969   32800  188586   2e0aa drivers/net/ethernet/qlogic/qed/built-in.o.new
     132474   27969   32800  193243   2f2db drivers/net/ethernet/qlogic/qed/built-in.o.old

    Miscellanea:

    o Change allocs to the generally preferred forms where possible.

    Signed-off-by: Joe Perches <joe@perches.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'rxrpc-rewrite-20160904-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (David S. Miller, 2016-09-06, 5 files, -640/+660)
|\
    David Howells says:

    ====================
    rxrpc: Split output code from sendmsg code

    Here's a set of small patches that split the packet transmission code from the sendmsg code and simply rearrange the new file to make it more logically laid out ready for being rewritten. An enum is also moved out of the header file to there as it's only used there.

    This needs to be applied on top of the just-posted fixes patch set.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * rxrpc Move enum rxrpc_command to sendmsg.c (David Howells, 2016-09-04, 2 files, -7/+7)
      Move enum rxrpc_command to sendmsg.c as it's now only used in that file.

      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Rearrange net/rxrpc/sendmsg.c (David Howells, 2016-09-04, 1 file, -281/+277)
      Rearrange net/rxrpc/sendmsg.c to be in a more logical order. This makes it easier to follow and eliminates forward declarations.

      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Split sendmsg from packet transmission code (David Howells, 2016-09-04, 5 files, -633/+657)
      Split the sendmsg code from the packet transmission code (mostly to be found in output.c).

      Signed-off-by: David Howells <dhowells@redhat.com>
* | Merge tag 'rxrpc-rewrite-20160904-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (David S. Miller, 2016-09-06, 9 files, -38/+31)
|\|
    David Howells says:

    ====================
    rxrpc: Small fixes

    Here's a set of small fix patches:

    (1) Fix some uninitialised variables.
    (2) Set the client call state before making it live by attaching it to the conn struct.
    (3) Randomise the epoch and starting client conn ID values, and don't change the epoch when the client conn ID rolls round.
    (4) Replace deprecated create_singlethread_workqueue() calls.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * fs/afs/flock: Remove deprecated create_singlethread_workqueue (Bhaktipriya Shridhar, 2016-09-04, 1 file, -2/+2)
      The workqueue "afs_lock_manager" queues work item &vnode->lock_work, per vnode. Since there can be multiple vnodes and since their work items can be executed concurrently, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance.

      The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure because the workqueue is being used on a memory reclaim path.

      Since there is a fixed number of work items, an explicit concurrency limit is unnecessary here.

      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
| * fs/afs/callback: Remove deprecated create_singlethread_workqueue (Bhaktipriya Shridhar, 2016-09-04, 1 file, -2/+2)
      The workqueue "afs_callback_update_worker" queues multiple work items viz &vnode->cb_broken_work, &server->cb_break_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used.

      Since the workqueue is being used on a memory reclaim path, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure.

      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
| * fs/afs/rxrpc: Remove deprecated create_singlethread_workqueue (Bhaktipriya Shridhar, 2016-09-04, 1 file, -1/+1)
      The workqueue "afs_async_calls" queues work item &call->async_work per afs_call. Since there could be multiple calls and since these calls can be run concurrently, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance.

      The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure because the workqueue is being used on a memory reclaim path.

      Since there is a fixed number of work items, an explicit concurrency limit is unnecessary here.

      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
| * fs/afs/vlocation: Remove deprecated create_singlethread_workqueue (Bhaktipriya Shridhar, 2016-09-04, 1 file, -2/+2)
      The workqueue "afs_vlocation_update_worker" queues a single work item &afs_vlocation_update and hence it doesn't require execution ordering. Hence, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance.

      Since the workqueue is being used on a memory reclaim path, the WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure.

      Since there is a fixed number of work items, an explicit concurrency limit is unnecessary here.

      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Don't change the epoch (David Howells, 2016-09-04, 1 file, -24/+8)
      It seems the local epoch should only be changed on boot, so remove the code that changes it for client connections.

      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Randomise epoch and starting client conn ID values (David Howells, 2016-09-04, 2 files, -1/+9)
      Create a random epoch value rather than a time-based one on startup and set the top bit to indicate that this is the case.

      Also create a random starting client connection ID value. This will be incremented from here as new client connections are created.

      Signed-off-by: David Howells <dhowells@redhat.com>
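The epoch marking described above can be sketched in a few lines of userspace C. This is a minimal illustration, not the rxrpc code: the names `SKETCH_EPOCH_RANDOM_BIT`, `sketch_make_epoch()` and `sketch_next_conn_id()` are hypothetical, and the random input is passed in as a parameter so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the top-bit marker that says "this epoch is
 * random, not time-based". */
#define SKETCH_EPOCH_RANDOM_BIT 0x80000000u

/* Returns non-zero if the epoch carries the "random" marker. */
int sketch_epoch_is_random(uint32_t epoch)
{
	return (epoch & SKETCH_EPOCH_RANDOM_BIT) != 0;
}

/* Build an epoch from a caller-supplied random value and set the top
 * bit to record that it was randomly generated. */
uint32_t sketch_make_epoch(uint32_t random_value)
{
	uint32_t epoch = random_value | SKETCH_EPOCH_RANDOM_BIT;

	assert(sketch_epoch_is_random(epoch));
	return epoch;
}

/* Hand out connection IDs from a counter that starts at a random value
 * and is incremented for each new client connection. Unsigned overflow
 * wraps around, which is well defined in C. */
uint32_t sketch_next_conn_id(uint32_t *counter)
{
	return (*counter)++;
}
```

The ID counter simply wraps when it overflows; the epoch is left alone in that case, which matches the sibling commit "rxrpc: Don't change the epoch".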
| * rxrpc: The client call state must be changed before attachment to conn (David Howells, 2016-09-04, 2 files, -2/+4)
      We must set the client call state to RXRPC_CALL_CLIENT_SEND_REQUEST before attaching the call to the connection struct, not after, as it's liable to receive errors and conn aborts as soon as the assignment is made - and these will cause its state to be changed outside of the initiating thread's control.

      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: Fix uninitialised variable warning (David Howells, 2016-09-02, 1 file, -3/+2)
      Fix the following uninitialised variable warning:

      ../net/rxrpc/call_event.c: In function 'rxrpc_process_call':
      ../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized]
        _debug("post net error %d", error);
                                    ^

      Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: fix undefined behavior in rxrpc_mark_call_released (Arnd Bergmann, 2016-09-02, 1 file, -1/+1)
      gcc -Wmaybe-uninitialized correctly points out a newly introduced bug through which we can end up calling rxrpc_queue_call() for a dead connection:

      net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released':
      net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized]

      This sets the 'sched' variable to zero to restore the previous behavior.

      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
      Signed-off-by: David Howells <dhowells@redhat.com>
* | vxlan: Update tx_errors statistics if vxlan_build_skb return err. (Haishuang Yan, 2016-09-06, 1 file, -0/+1)
    If vxlan_build_skb returns err < 0, tx_errors should also be increased.

    Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
    Acked-by: Jiri Benc <jbenc@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: protect ring->xdp_prog with rcu_read_lock (Brenden Blanco, 2016-09-06, 3 files, -11/+19)
    Depending on the preempt mode, the bpf_prog stored in xdp_prog may be freed despite the use of call_rcu inside bpf_prog_put. The situation is possible when running in PREEMPT_RCU=y mode, for instance, since the rcu callback for destroying the bpf prog can run even during the bh handling in the mlx4 rx path.

    Several options were considered before this patch was settled on:

    Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all of the rings are updated with the new program. This approach has the disadvantage that as the number of rings increases, the speed of update will slow down significantly due to napi_synchronize's msleep(1).

    Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh. The action of the bpf_prog_put_bh would be to then call bpf_prog_put later. Those drivers that consume a bpf prog in a bh context (like mlx4) would then use the bpf_prog_put_bh instead when the ring is up. This has the problem of complexity, in maintaining proper refcnts and rcu lists, and would likely be harder to review. In addition, this approach to freeing must be exclusive with other frees of the bpf prog, for instance a _bh prog must not be referenced from a prog array that is consumed by a non-_bh prog.

    The placement of rcu_read_lock in this patch is functionally the same as putting an rcu_read_lock in napi_poll. Actually doing so could be a potentially controversial change, but would bring the implementation in line with sk_busy_loop (though of course the nature of those two paths is substantially different), and would also avoid future copy/paste problems with future supporters of XDP. Still, this patch does not take that opinionated option.

    Testing was done with kernels in either PREEMPT_RCU=y or CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did not show up in the perf report whatsoever, and with PREEMPT_RCU=y the overhead of rcu_read_lock (according to perf) was the same before/after. In the rx path, rcu_read_lock is eventually called for every packet from netif_receive_skb_internal, so the napi poll call's rcu_read_lock is easily amortized.

    v2:
    Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
    Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or rcu_dereference[_protected] as appropriate.
    Add explicit mutex lock around rcu_assign instead of xchg loop.

    Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'mediatek-rx-path-enhancements' (David S. Miller, 2016-09-06, 1 file, -11/+15)
|\ \
    Sean Wang says:

    ====================
    net: ethernet: mediatek: add enhancements to RX path

    Changes since v1:
    - fix message typos and add coverletter

    Changes since v2:
    - split from the previous series for submitting add enhancements as a series targeting 'net-next' and add indents before comments.

    Changes since v3:
    - merge the patch using PDMA RX path
    - fixed the input of mtk_poll_rx is with the remaining budget

    Changes since v4:
    - save one wmb and register update when no packet is being handled inside mtk_poll_rx call
    - fixed incorrect return packet count from mtk_napi_rx
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPI (Sean Wang, 2016-09-06, 1 file, -14/+17)
      The patch adds support for aggregating more SKBs fed into NAPI in order to get more benefit from generic receive offload (GRO). It does this by peeking at the RX ring status and moving more packets right before returning from the NAPI RX polling handler, if NAPI budget is still available and some packets are already present in hardware.

      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ethernet: mediatek: enhance RX path by reducing the frequency of the memory barrier used (Sean Wang, 2016-09-06, 1 file, -5/+6)
      The patch moves wmb() outside the loop, which helps the RX path run faster, although RX descriptors are then not freed for DMA to use as soon as possible. Based on my experiment, it can still reach about 943Mbps without a performance drop. The test setup used one port with a Giga PHY and 256 RX descriptors for DMA to move.

      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
|/ /
* | Merge branch 'hso-neatening' (David S. Miller, 2016-09-06, 1 file, -63/+55)
|\ \
    Joe Perches says:

    ====================
    hso: neatening

    This seems to be the only code in the kernel that uses macro defines with a trailing underscore. Fix that.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | hso: Convert printk to pr_<level> (Joe Perches, 2016-09-06, 1 file, -10/+11)
      Use a more common logging style

      Miscellanea:

      o Add pr_fmt to prefix each output message
      o Realign arguments

      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | hso: Use a more common logging style (Joe Perches, 2016-09-06, 1 file, -53/+44)
      Macros that end in an underscore are just odd. Add hso_dbg(level, fmt, ...) and use it everywhere instead.

      Several uses had additional unnecessary newlines as the macro added a newline. Remove the newline from the macro and add newlines to each use as appropriate.

      Remove now unused D<digit> macros.

      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
|/ /
* | smsc95xx: Add mdix control via ethtool (Woojung Huh, 2016-09-06, 1 file, -3/+106)
    Add mdix control through ethtool.

    Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc95xx: Add register define (Woojung Huh, 2016-09-06, 1 file, -0/+8)
    Add STRAP_STATUS defines.

    Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc95xx: Add maintainer (Woojung Huh, 2016-09-06, 1 file, -0/+1)
    Add Microchip Linux Driver Support as maintainer because this driver is maintained by Microchip.

    Signed-off-by: Woojung Huh <Woojung.huh@gmail.com>
    Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'mv88e6xxx-isolate-Global2' (David S. Miller, 2016-09-06, 6 files, -451/+596)
|\ \
    Vivien Didelot says:

    ====================
    net: dsa: mv88e6xxx: isolate Global2 support

    Registers of Marvell chips are organized in internal SMI devices. One of them at address 0x1C is called Global2. It provides an extended set of registers, used for interrupt control, EEPROM access, indirect PHY access (to bypass the PHY Polling Unit) and cross-chip setup.

    Most chips have it, but some others don't (older ones such as 6060). Now that its related code is isolated in mv88e6xxx_g2_* functions, move it to its own global2.c file, making most of its setup code static.

    Then make its compilation optional, which allows to reduce the size of the mv88e6xxx driver for devices such as home routers embedding Ethernet chips without Global2 support.

    It is present on most recent chips, thus enable its support by default.

    Changes in v2: fail probe if GLOBAL2 is required but not enabled.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: mv88e6xxx: make global2 code optional (Vivien Didelot, 2016-09-06, 4 files, -1/+74)
      Since not every chip has a Global2 set of registers, make its support optional, in which case the related functions will return -EOPNOTSUPP.

      This also allows to reduce the size of the mv88e6xxx driver for devices such as home routers embedding Ethernet chips without Global2 support.

      It is present on most recent chips, thus enable its support by default.

      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
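The "optional feature with stubs" pattern the message describes can be sketched in userspace C as follows. This is an illustration only: the `SKETCH_HAVE_GLOBAL2` build switch and the `sketch_*` functions are hypothetical names, not the driver's Kconfig symbol or its mv88e6xxx_g2_* functions. The probe check reflects the "fail probe if GLOBAL2 is required but not enabled" note from the merge message.

```c
#include <assert.h>
#include <errno.h>

/* Build with -DSKETCH_HAVE_GLOBAL2 to compile in the "real" code;
 * otherwise a stub that reports the feature as unsupported is used. */
#ifdef SKETCH_HAVE_GLOBAL2
#define SKETCH_G2_ENABLED 1
int sketch_g2_setup(void)
{
	/* A real implementation would program the Global2 registers. */
	return 0;
}
#else
#define SKETCH_G2_ENABLED 0
int sketch_g2_setup(void)
{
	return -EOPNOTSUPP;
}
#endif

/* Probe a chip. A chip that needs Global2 cannot be driven by a build
 * without Global2 support, so the probe fails in that case. */
int sketch_chip_probe(int chip_has_global2)
{
	assert(chip_has_global2 == 0 || chip_has_global2 == 1);

	if (!chip_has_global2)
		return 0;
	if (!SKETCH_G2_ENABLED)
		return -EOPNOTSUPP;
	return sketch_g2_setup();
}
```

Callers never need their own `#ifdef`: they call the same function either way and handle the error code, which is what keeps the optional code out of the binary without cluttering the call sites.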
| * | net: dsa: mv88e6xxx: move Global2 code (Vivien Didelot, 2016-09-06, 5 files, -450/+521)
      Marvell chips are composed of multiple SMI devices. One of them at address 0x1C is called Global2. It provides an extended set of registers, used for interrupt control, EEPROM access, indirect PHY access (to bypass the PHY Polling Unit) and cross-chip related setup.

      Most chips have it, but some others don't (older ones such as 6060). Now that its related code is isolated in mv88e6xxx_g2_* functions, move it to its own global2.c file, making most of its setup code static. Document each register in the meantime.

      Its compilation can be later avoided for chips without such registers.

      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: mv88e6xxx: fix module naming (Vivien Didelot, 2016-09-06, 1 file, -1/+2)
      Since the mv88e6xxx.c file has been renamed, the driver compiled as a module is called chip.ko instead of mv88e6xxx.ko. Fix this.

      Fixes: fad09c73c270 ("net: dsa: mv88e6xxx: rename single-chip support")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
|/ /
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next (David S. Miller, 2016-09-06, 46 files, -1613/+1380)
|\ \
    Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next tree. Most relevant updates are the removal of per-conntrack timers to use a workqueue/garbage collection approach instead from Florian Westphal, the hash and numgen expression for nf_tables from Laura Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag, removal of ip_conntrack sysctl and many other incremental updates on our Netfilter codebase.

    More specifically, they are:

    1) Retrieve only 4 bytes to fetch ports in case of non-linear skb transport area in dccp, sctp, tcp, udp and udplite protocol conntrackers, from Gao Feng.
    2) Missing whitespace on error message in physdev match, from Hangbin Liu.
    3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.
    4) Add nf_ct_expires() helper function and use it, from Florian Westphal.
    5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also from Florian.
    6) Rename nf_tables set implementation to nft_set_{name}.c
    7) Introduce the hash expression to allow arbitrary hashing of selector concatenations, from Laura Garcia Liebana.
    8) Remove ip_conntrack sysctl backward compatibility code, this code has been around for long time already, and we have two interfaces to do this already: nf_conntrack sysctl and ctnetlink.
    9) Use nf_conntrack_get_ht() helper function whenever possible, instead of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.
    10) Add quota expression for nf_tables.
    11) Add number generator expression for nf_tables, this supports incremental and random generators that can be combined with maps, very useful for load balancing purpose, again from Laura Garcia Liebana.
    12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.
    13) Introduce a nft_chain_parse_hook() helper function to parse chain hook configuration, this is used by a follow up patch to perform better chain update validation.
    14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the nft_set_hash implementation to honor the NLM_F_EXCL flag.
    15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(), patch from Florian Westphal.
    16) Don't use the DYING bit to know if the conntrack event has been already delivered, instead a state variable to track event re-delivery states, also from Florian.
    17) Remove the per-conntrack timer, use the workqueue approach that was discussed during the NFWS, from Florian Westphal.
    18) Use the netlink conntrack table dump path to kill stale entries, again from Florian.
    19) Add a garbage collector to get rid of stale conntracks, from Florian.
    20) Reschedule garbage collector if eviction rate is high.
    21) Get rid of the __nf_ct_kill_acct() helper.
    22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.
    23) Make nf_log_set() interface assertive on unsupported families.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: log: Check param to avoid overflow in nf_log_set (Gao Feng, 2016-08-30, 6 files, -13/+10)
      The nf_log_set is an interface function, so it should do the strict sanity check of parameters. Convert the return value of nf_log_set as int instead of void. When the pf is invalid, return -EOPNOTSUPP.

      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
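A bounds check of this kind can be sketched in userspace C. This is an assumption-laden illustration, not the netfilter code: `SKETCH_NPROTO`, `struct sketch_logger` and the `sketch_log_*` functions are hypothetical, and the table here is a plain static array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical number of protocol families the table can hold. */
#define SKETCH_NPROTO 13

struct sketch_logger {
	const char *name;
};

/* One logger slot per protocol family. */
static const struct sketch_logger *sketch_loggers[SKETCH_NPROTO];

/* Register a logger for a protocol family. Returning int instead of
 * void lets the function reject an out-of-range family rather than
 * writing past the end of the array. */
int sketch_log_set(unsigned int pf, const struct sketch_logger *logger)
{
	assert(logger != NULL);

	if (pf >= SKETCH_NPROTO)
		return -EOPNOTSUPP;
	sketch_loggers[pf] = logger;
	return 0;
}

/* Fetch the logger for a family, or NULL if none or out of range. */
const struct sketch_logger *sketch_log_get(unsigned int pf)
{
	if (pf >= SKETCH_NPROTO)
		return NULL;
	return sketch_loggers[pf];
}
```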
| * | netfilter: log_arp: Use ARPHRD_ETHER instead of literal '1' (Gao Feng, 2016-08-30, 1 file, -1/+1)
      There is one macro ARPHRD_ETHER which defines the ethernet proto for ARP, so we could use it instead of the literal number '1'.

      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: remove __nf_ct_kill_acct helper (Florian Westphal, 2016-08-30, 2 files, -17/+8)
      After timer removal this just calls nf_ct_delete so remove the __ prefix version and make nf_ct_kill a shorthand for nf_ct_delete.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: resched gc again if eviction rate is high (Florian Westphal, 2016-08-30, 1 file, -0/+6)
      If we evicted a large fraction of the scanned conntrack entries re-schedule the next gc cycle for immediate execution.

      This triggers during tests where load is high, then drops to zero and many connections will be in TW/CLOSE state with < 30 second timeouts.

      Without this change it will take several minutes until conntrack count comes back to normal.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: add gc worker to remove timed-out entries (Florian Westphal, 2016-08-30, 1 file, -0/+76)
      Conntrack gc worker to evict stale entries.

      GC happens once every 5 seconds, but we only scan at most 1/64th of the table (and not more than 8k) buckets to avoid hogging cpu.

      This means that a complete scan of the table will take several minutes of wall-clock time.

      Considering that the gc run will never have to evict any entries during normal operation because those will happen from packet path this should be fine.

      We only need gc to make sure userspace (conntrack event listeners) eventually learn of the timeout, and for resource reclaim in case the system becomes idle.

      We do not disable BH and cond_resched for every bucket so this should not introduce noticeable latencies either.

      A followup patch will add a small change to speed up GC for the extreme case where most entries are timed out on an otherwise idle system.

      v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on nulls value change in gc worker, suggested by Eric Dumazet.
      v3: don't call cancel_delayed_work_sync twice (again, Eric).

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
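The arithmetic behind "several minutes" can be sketched as follows, using only the figures stated in the message (a run every 5 seconds, at most 1/64th of the table, capped at 8k buckets). The constant and function names are hypothetical, and clamping the per-run goal to at least one bucket for tiny tables is an assumption made here to keep the division well defined.

```c
#include <assert.h>

#define SKETCH_GC_INTERVAL_SECS   5u     /* one gc run every 5 seconds */
#define SKETCH_GC_BUCKETS_DIV     64u    /* scan at most 1/64th per run */
#define SKETCH_GC_MAX_BUCKETS     8192u  /* and never more than 8k buckets */

/* Number of hash buckets a single gc run is allowed to scan. */
unsigned int sketch_gc_scan_goal(unsigned int table_size)
{
	unsigned int goal = table_size / SKETCH_GC_BUCKETS_DIV;

	if (goal > SKETCH_GC_MAX_BUCKETS)
		goal = SKETCH_GC_MAX_BUCKETS;
	return goal;
}

/* Wall-clock seconds needed to visit every bucket once, assuming one
 * run per interval and no early rescheduling. */
unsigned int sketch_gc_full_scan_secs(unsigned int table_size)
{
	unsigned int goal = sketch_gc_scan_goal(table_size);
	unsigned int runs;

	assert(table_size > 0);
	if (goal == 0)
		goal = 1;	/* assumption: scan at least one bucket */
	runs = (table_size + goal - 1) / goal;	/* round up */
	return runs * SKETCH_GC_INTERVAL_SECS;
}
```

For a table of 65536 buckets this gives 1024 buckets per run and 64 runs, so 320 seconds for a full pass. For tables large enough to hit the 8k cap, a full pass takes proportionally longer.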
| * | netfilter: evict stale entries on netlink dumps (Florian Westphal, 2016-08-30, 1 file, -1/+24)
      When dumping we already have to look at the entire table, so we might as well toss those entries whose timeout value is in the past.

      We also look at every entry during resize operations. However, eviction there is not as simple because we hold the global resize lock so we can't evict without adding a 'expired' list to drop from later. Considering that resizes are very rare it doesn't seem worth doing it.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: get rid of conntrack timer (Florian Westphal, 2016-08-30, 5 files, -63/+74)
      With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as Eric Dumazet pointed out during netfilter workshop 2016.

      Eric also says: "Another reason was the fact that Thomas was about to change max timer range [..]" (500462a9de657f8, 'timers: Switch to a non-cascading wheel').

      Remove the timer and use a 32bit jiffies value containing timestamp until entry is valid.

      During conntrack lookup, even before doing tuple comparison, check the timeout value and evict the entry in case it is too old.

      The dying bit is used as a synchronization point to avoid races where multiple cpus try to evict the same entry.

      Because lookup is always lockless, we need to bump the refcnt once when we evict, else we could try to evict already-dead entry that is being recycled.

      This is the standard/expected way when conntrack entries are destroyed.

      Followup patches will introduce garbage collection via work queue and further places where we can reap obsoleted entries (e.g. during netlink dumps), this is needed to avoid expired conntracks from hanging around for too long when lookup rate is low after a busy period.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
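The "timestamp instead of timer" idea can be sketched in userspace C. This is a simplified illustration and not the conntrack code: `struct sketch_entry`, the flat array standing in for a hash chain, and the `sketch_*` functions are hypothetical, and the locking, refcounting and dying-bit handshake described in the message are deliberately left out. The signed-difference comparison keeps working when the 32-bit tick counter wraps; the conversion to `int32_t` relies on two's-complement behaviour, which is an assumption that holds on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each entry stores the tick value until which it is valid, instead of
 * owning a timer. */
struct sketch_entry {
	uint32_t key;
	uint32_t timeout;	/* absolute expiry time in ticks */
	int live;		/* 0 once evicted */
};

/* Wraparound-safe expiry test: the entry is expired when the signed
 * distance from "now" to its timeout is zero or negative. */
int sketch_is_expired(const struct sketch_entry *e, uint32_t now)
{
	return (int32_t)(e->timeout - now) <= 0;
}

/* Look up a key. The timeout is checked before the key is compared, and
 * expired entries met along the way are evicted on the spot. */
struct sketch_entry *sketch_lookup(struct sketch_entry *tbl, size_t n,
				   uint32_t key, uint32_t now)
{
	size_t i;

	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct sketch_entry *e = &tbl[i];

		if (!e->live)
			continue;
		if (sketch_is_expired(e, now)) {
			e->live = 0;	/* lazy eviction */
			continue;
		}
		if (e->key == key)
			return e;
	}
	return NULL;
}
```

Because eviction only happens when something walks past an entry, stale entries linger while lookups are rare, which is why the message announces follow-up patches that reap them from a work queue and during netlink dumps.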
| * | netfilter: don't rely on DYING bit to detect when destroy event was sent (Florian Westphal, 2016-08-30, 2 files, -13/+26)
      The reliable event delivery mode currently (ab)uses the DYING bit to detect which entries on the dying list have to be skipped when re-delivering events from the ecache worker in reliable event mode.

      Currently when we delete the conntrack from main table we only set this bit if we could also deliver the netlink destroy event to userspace.

      If we fail we move it to the dying list, the ecache worker will reattempt event delivery for all confirmed conntracks on the dying list that do not have the DYING bit set.

      Once timer is gone, we can no longer use if (del_timer()) to detect when we 'stole' the reference count owned by the timer/hash entry, so we need some other way to avoid racing with other cpu.

      Pablo suggested to add a marker in the ecache extension that skips entries that have been unhashed from main table but are still waiting for the last reference count to be dropped (e.g. because one skb waiting on nfqueue verdict still holds a reference).

      We do this by adding a tristate. If we fail to deliver the destroy event, make a note of this in the ecache extension. The worker can then skip all entries that are in a different state. Either they never delivered a destroy event, e.g. because the netlink backend was not loaded, or redelivery took place already.

      Once the conntrack timer is removed we will now be able to replace del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid racing with other cpu that tries to evict the same conntrack.

      Because DYING will then be set right before we report the destroy event we can no longer skip event reporting when dying bit is set.

      Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: restart search if moved to other chain (Florian Westphal, 2016-08-30, 1 file, -0/+7)
      In case nf_conntrack_tuple_taken did not find a conflicting entry check that all entries in this hash slot were tested and restart in case an entry was moved to another chain.

      Reported-by: Eric Dumazet <edumazet@google.com>
      Fixes: ea781f197d6a ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: Use nla_put_be32() to dump immediate parameters (Pablo Neira Ayuso, 2016-08-26, 2 files, -5/+5)
      nft_dump_register() should only be used with registers, not with immediates.

      Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
      Fixes: 91dbc6be0a62 ("netfilter: nf_tables: add number generator expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion (Pablo Neira Ayuso, 2016-08-26, 4 files, -14/+38)
      If the NLM_F_EXCL flag is set, then new elements that clash with an existing one return EEXIST. In case you try to add an element whose data area differs from what we have, then this returns EBUSY. If no flag is specified at all, then this returns success to userspace.

      This patch also updates the set insert operation so we can fetch the existing element that clashes with the one you want to add, we need this to make sure the element data doesn't differ.

      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
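The three outcomes listed in this message can be written down as a small decision function. This is a sketch of one reading of the message, not the nf_tables code: `SKETCH_F_EXCL` and `sketch_insert_result()` are hypothetical names, and treating a data mismatch as EBUSY regardless of the flag is an assumption, since the message does not spell out that combination.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag playing the role of NLM_F_EXCL in this sketch. */
#define SKETCH_F_EXCL 0x1u

/* Decide the result of inserting an element into a set.
 *   exists       - an element with the same key is already present
 *   data_differs - the existing element's data differs from the new one
 *   flags        - request flags (only SKETCH_F_EXCL is looked at)
 * Returns 0 on success or a negative errno value. */
int sketch_insert_result(int exists, int data_differs, unsigned int flags)
{
	/* Data can only differ from an element that actually exists. */
	assert(!data_differs || exists);

	if (!exists)
		return 0;		/* plain insertion */
	if (data_differs)
		return -EBUSY;		/* same key, conflicting data */
	if (flags & SKETCH_F_EXCL)
		return -EEXIST;		/* exclusive create requested */
	return 0;			/* identical element, no flag */
}
```

Distinguishing the EBUSY case is what requires fetching the clashing element, which is why the patch extends the set insert operation to hand it back.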