| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes following warnings :
net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
(different base types)
net/core/secure_seq.c:125:28: expected unsigned int const [unsigned]
[usertype] a
net/core/secure_seq.c:125:28: got restricted __be32 [usertype] saddr
net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
(different base types)
net/core/secure_seq.c:125:35: expected unsigned int const [unsigned]
[usertype] b
net/core/secure_seq.c:125:35: got restricted __be32 [usertype] daddr
net/core/secure_seq.c:125:43: warning: cast from restricted __be16
net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
integer
Fixes: 7cd23e5300c1 ("secure_seq: use SipHash in place of MD5")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following sparse warning:
net/core/lwt_bpf.c:355:5: warning:
symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When introducing the new socket family AF_SMC in
commit ac7138746e14 ("smc: establish new socket family"),
a typo in af_family_clock_key_strings has slipped in.
This patch repairs it.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: ac7138746e14 ("smc: establish new socket family")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| | |
Two AF_* families adding entries to the lockdep tables
at the same time.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Merge fixes from Andrew Morton:
"27 fixes.
There are three patches that aren't actually fixes. They're simple
function renamings which are nice-to-have in mainline as ongoing net
development depends on them."
* akpm: (27 commits)
timerfd: export defines to userspace
mm/hugetlb.c: fix reservation race when freeing surplus pages
mm/slab.c: fix SLAB freelist randomization duplicate entries
zram: support BDI_CAP_STABLE_WRITES
zram: revalidate disk under init_lock
mm: support anonymous stable page
mm: add documentation for page fragment APIs
mm: rename __page_frag functions to __page_frag_cache, drop order from drain
mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
mm: don't dereference struct page fields of invalid pages
mailmap: add codeaurora.org names for nameless email commits
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: pmd dirty emulation in page fault handler
ipc/sem.c: fix incorrect sem_lock pairing
lib/Kconfig.debug: fix frv build failure
mm: get rid of __GFP_OTHER_NODE
mm: fix remote numa hits statistics
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
ocfs2: fix crash caused by stale lvb with fsdlm plugin
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
page_frag_free
Patch series "Page fragment updates", v4.
This patch series takes care of a few cleanups for the page fragments
API.
First we do some renames so that things are much more consistent. First
we move the page_frag_ portion of the name to the front of the functions
names. Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.
Finally I added a bit of documentation that will hopefully help to
explain some of this. I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.
This patch (of 3):
This patch renames the page frag functions to be more consistent with
other APIs. Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix. This makes it a bit clearer in terms of naming.
In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.
Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.
Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0. However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.
This patch adds the check by capping frag0_len with the skb tailroom.
Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.
Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Removes following sparse complain :
net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?
Fixes: 972d3876faa8 ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.
There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control)
handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches.
For now fallback to TCP socket is always used.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Direct call of tcp_set_keepalive() function from protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. And newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.
Rather than manually filling MD5 buffers, for IPv6, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code. For IPv4, we pass the values directly to the
short input convenience functions.
64-bit x86_64:
[ 1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[ 1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[ 1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[ 1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526
32-bit x86:
[ 1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[ 1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[ 1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[ 1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.
Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.
The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.
Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.
Introduce a helper function to deduplicate the logic in the two
sites that check this bit.
The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.
This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.
skb_header_pointer will use skb->data and thus:
[ 109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[ 109.557263] PGD 0
[ 109.557338]
[ 109.557484] Oops: 0000 [#1] SMP
[ 109.557562] Modules linked in: chaoskey
[ 109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
[ 109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
[ 109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
[ 109.558041] RIP: 0010:[<ffffffff88dc02f8>] [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[ 109.558203] RSP: 0018:ffff94087fc83d40 EFLAGS: 00010206
[ 109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
[ 109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
[ 109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
[ 109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
[ 109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
[ 109.558979] FS: 0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
[ 109.559326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
[ 109.559753] Stack:
[ 109.559957] 000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
[ 109.560578] 0000000000000001 0000000000002000 0000000000000000 0000000000000000
[ 109.561200] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 109.561820] Call Trace:
[ 109.562027] <IRQ>
[ 109.562108] [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0
[ 109.562522] [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80
[ 109.562737] [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350
[ 109.562953] [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280
[ 109.563169] [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0
[ 109.563382] [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0
[ 109.563597] [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f
[ 109.563810] <EOI>
[ 109.563890] [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0
[ 109.564304] [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0
[ 109.564520] [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0
[ 109.564737] [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140
[ 109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
24 84
[ 109.569959] RIP [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[ 109.570245] RSP <ffff94087fc83d40>
[ 109.570453] CR2: 0000000000000080
Fixes: ab10dccb1160 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.
Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.
Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Oftenly, introducing side effects on packet processing on the other half
of the stack by adjusting one of TX/RX via sysctl is not desirable.
There are cases of demand for asymmetric, orthogonal configurability.
This holds true especially for nodes where RPS for RFS usage on top is
configured and therefore use the 'old dev_weight'. This is quite a
common base configuration setup nowadays, even with NICs of superior processing
support (e.g. aRFS).
A good example use case are nodes acting as noSQL data bases with a
large number of tiny requests and rather fewer but large packets as responses.
It's affordable to have large budget and rx dev_weights for the
requests. But as a side effect having this large a number on TX
processed in one run can overwhelm drivers.
This patch therefore introduces an independent configurability via sysctl to
userland.
Signed-off-by: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
| |
Different namespace application might require different maximal
number of remembered connection requests.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull networking fixes from David Miller:
1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
The most important is to not assume the packet is RX just because
the destination address matches that of the device. Such an
assumption causes problems when an interface is put into loopback
mode.
2) If we retry when creating a new tc entry (because we dropped the
RTNL mutex in order to load a module, for example) we end up with
-EAGAIN and then loop trying to replay the request. But we didn't
reset some state when looping back to the top like this, and if
another thread meanwhile inserted the same tc entry we were trying
to, we re-link it creating an enless loop in the tc chain. Fix from
Daniel Borkmann.
3) There are two different WRITE bits in the MDIO address register for
the stmmac chip, depending upon the chip variant. Due to a bug we
could set them both, fix from Hock Leong Kweh.
4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
r8169: add support for RTL8168 series add-on card.
net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
openvswitch: upcall: Fix vlan handling.
ipv4: Namespaceify tcp_tw_reuse knob
net: korina: Fix NAPI versus resources freeing
net, sched: fix soft lockup in tc_classify
net/mlx4_en: Fix user prio field in XDP forward
tipc: don't send FIN message from connectionless socket
ipvlan: fix multicast processing
ipvlan: fix various issues in ipvlan_process_multicast()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.
This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.
Fix that by sending a netevent after marking the neighbour as dead.
Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull networking fixes and cleanups from David Miller:
1) Revert bogus nla_ok() change, from Alexey Dobriyan.
2) Various bpf validator fixes from Daniel Borkmann.
3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04
drivers, from Dongpo Li.
4) Several ethtool ksettings conversions from Philippe Reynes.
5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.
6) XDP support for virtio_net, from John Fastabend.
7) Fix NAT handling within a vrf, from David Ahern.
8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
net: mv643xx_eth: fix build failure
isdn: Constify some function parameters
mlxsw: spectrum: Mark split ports as such
cgroup: Fix CGROUP_BPF config
qed: fix old-style function definition
net: ipv6: check route protocol when deleting routes
r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
irda: w83977af_ir: cleanup an indent issue
net: sfc: use new api ethtool_{get|set}_link_ksettings
net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
bpf: fix mark_reg_unknown_value for spilled regs on map value marking
bpf: fix overflow in prog accounting
bpf: dynamically allocate digest scratch buffer
gtp: Fix initialization of Flags octet in GTPv1 header
gtp: gtp_check_src_ms_ipv4() always return success
net/x25: use designated initializers
isdn: use designated initializers
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen but to catch
this in virtual/qemu setups that I may not have expected from the
emulation layer having a standard warning is useful.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull audit updates from Paul Moore:
"After the small number of patches for v4.9, we've got a much bigger
pile for v4.10.
The bulk of these patches involve a rework of the audit backlog queue
to enable us to move the netlink multicasting out of the task/thread
that generates the audit record and into the kernel thread that emits
the record (just like we do for the audit unicast to auditd).
While we were playing with the backlog queue(s) we fixed a number of
other little problems with the code, and from all the testing so far
things look to be in much better shape now. Doing this also allowed us
to re-enable disabling IRQs for some netns operations ("netns: avoid
disabling irq for netns id").
The remaining patches fix some small problems that are well documented
in the commit descriptions, as well as adding session ID filtering
support"
* 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
audit: use proper refcount locking on audit_sock
netns: avoid disabling irq for netns id
audit: don't ever sleep on a command record/message
audit: handle a clean auditd shutdown with grace
audit: wake up kauditd_thread after auditd registers
audit: rework audit_log_start()
audit: rework the audit queue handling
audit: rename the queues and kauditd related functions
audit: queue netlink multicast sends just like we do for unicast sends
audit: fixup audit_init()
audit: move kaudit thread start from auditd registration to kaudit init (#2)
audit: add support for session ID user filter
audit: fix formatting of AUDIT_CONFIG_CHANGE events
audit: skip sessionid sentinel value when auto-incrementing
audit: tame initialization warning len_abuf in audit_log_execve_info
audit: less stack usage for /proc/*/loginuid
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Bring back commit bc51dddf98c9 ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with original attempt. Additional information, and history,
can be found in the links below:
* https://github.com/linux-audit/audit-kernel/issues/22
* https://github.com/linux-audit/audit-kernel/issues/23
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
"This is the final round of converting the notifier mess to the state
machine. The removal of the notifiers and the related infrastructure
will happen around rc1, as there are conversions outstanding in other
trees.
The whole exercise removed about 2000 lines of code in total and in
course of the conversion several dozen bugs got fixed. The new
mechanism allows to test almost every hotplug step standalone, so
usage sites can exercise all transitions extensively.
There is more room for improvement, like integrating all the
pointlessly different architecture mechanisms of synchronizing,
setting cpus online etc into the core code"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
tracing/rb: Init the CPU mask on allocation
soc/fsl/qbman: Convert to hotplug state machine
soc/fsl/qbman: Convert to hotplug state machine
zram: Convert to hotplug state machine
KVM/PPC/Book3S HV: Convert to hotplug state machine
arm64/cpuinfo: Convert to hotplug state machine
arm64/cpuinfo: Make hotplug notifier symmetric
mm/compaction: Convert to hotplug state machine
iommu/vt-d: Convert to hotplug state machine
mm/zswap: Convert pool to hotplug state machine
mm/zswap: Convert dst-mem to hotplug state machine
mm/zsmalloc: Convert to hotplug state machine
mm/vmstat: Convert to hotplug state machine
mm/vmstat: Avoid on each online CPU loops
mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
tracing/rb: Convert to hotplug state machine
oprofile/nmi timer: Convert to hotplug state machine
net/iucv: Use explicit clean up labels in iucv_init()
x86/pci/amd-bus: Convert to hotplug state machine
x86/oprofile/nmi: Convert to hotplug state machine
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Install the callbacks via the state machine. Use multi state support to avoid
custom list handling for the multiple instances.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20161103145021.28528-10-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Install the callbacks via the state machine.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-9-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \ \ \
| |_|_|/
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The tree got pretty big in this development cycle, but the net effect
is pretty good:
115 files changed, 673 insertions(+), 1522 deletions(-)
The main changes were:
- Rework and generalize the mutex code to remove per arch mutex
primitives. (Peter Zijlstra)
- Add vCPU preemption support: add an interface to query the
preemption status of vCPUs and use it in locking primitives - this
optimizes paravirt performance. (Pan Xinhui, Juergen Gross,
Christian Borntraeger)
- Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to
clean up and improve the s390 lock yielding machinery and its core
kernel impact. (Christian Borntraeger)
- Micro-optimize mutexes some more. (Waiman Long)
- Reluctantly add the to-be-deprecated mutex_trylock_recursive()
interface on a temporary basis, to give the DRM code more time to
get rid of its locking hacks. Any other users will be NAK-ed on
sight. (We turned off the deprecation warning for the time being to
not pollute the build log.) (Peter Zijlstra)
- Improve the rtmutex code a bit, in light of recent long lived
bugs/races. (Thomas Gleixner)
- Misc fixes, cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/paravirt: Fix bool return type for PVOP_CALL()
x86/paravirt: Fix native_patch()
locking/ww_mutex: Use relaxed atomics
locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()
locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted
locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock()
Documentation/virtual/kvm: Support the vCPU preemption check
x86/xen: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
kvm: Introduce kvm_write_guest_offset_cached()
locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests
locking/spinlocks, s390: Implement vcpu_is_preempted(cpu)
locking/core, powerpc: Implement vcpu_is_preempted(cpu)
sched/core: Introduce the vcpu_is_preempted(cpu) interface
sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q
locking/core: Provide common cpu_relax_yield() definition
locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It seems attackers can also send UDP packets with no payload at all.
skb_condense() can still be a win in this case.
It will be possible to replace the custom code in tcp_add_backlog()
to get full benefit from skb_condense()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header). It is
done by adding a new XDP helper bpf_xdp_adjust_head().
It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.
This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support. The driver can then decide
to error out during XDP_SETUP_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Under UDP flood, many softirq producers try to add packets to
UDP receive queue, and one user thread is burning one cpu trying
to dequeue packets as fast as possible.
Two parts of the per packet cost are :
- copying payload from kernel space to user space,
- freeing memory pieces associated with skb.
If socket is under pressure, softirq handler(s) can try to pull in
skb->head the payload of the packet if it fits.
Meaning the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.
Additional gains :
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF
- We avoid cache line misses at copyout() time and consume_skb() time,
and avoid one put_page() with potential alien freeing on NUMA hosts.
This comes at the cost of a copy, bounded to available tail room, which
is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
than necessary)
This patch gave me about 5 % increase in throughput in my tests.
skb_condense() helper could probably used in other contexts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
RFS is not commonly used, so add a jump label to avoid some conditionals
in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Allow dissection of ICMP(V6) type and code. This should only occur
if a packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set.
There are currently no users of FLOW_DISSECTOR_KEY_ICMP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by
the flower classifier.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In UDP recvmsg() path we currently access 3 cache lines from an skb
while holding receive queue lock, plus another one if packet is
dequeued, since we need to change skb->next->prev
1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
skb->peeked is only needed to make sure 0-length packets are properly
handled while MSG_PEEK is operated.
I had first the intent to remove skb->peeked but the "MSG_PEEK at
non-zero offset" support added by Sam Kumar makes this not possible.
This patch avoids one cache line miss during the locked section, when
skb->len and skb->peeked do not have to be read.
It also avoids the skb_set_peeked() cost for non empty UDP datagrams.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit d691f9e8d440 ("bpf: allow programs to write to certain skb
fields") pushed access type check outside of __is_valid_access()
to have different restrictions for socket filters and tc programs.
type is thus not used anymore within __is_valid_access() and should
be removed as a function argument. Same for __is_valid_xdp_access()
introduced by 6a773a15a1e8 ("bpf: add XDP prog type for early driver
filter").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
1) Old code was hard to maintain, due to complex lock chains.
(We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
is not supposed to work well)
In this rewrite :
- I removed the RB tree that had to be scanned in
gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure,
instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper
support for 32bit kernels.
- We reduce memory need when estimators are not used, since
we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock.
(In the future, xt_rateest_tg() could be switched to per cpu counters)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|