summaryrefslogtreecommitdiffstats
path: root/net/netlink
Commit message (Collapse)AuthorAgeFilesLines
* netlink: Annotate RCU locking for seq_file walkerThomas Graf2014-08-141-0/+2
| | | | | | | | | Silences the following sparse warnings: net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: reset network header before passing to tapsDaniel Borkmann2014-08-071-1/+1
| | | | | | | | | | | | | | | | | | | netlink doesn't set any network header offset thus when the skb is being passed to tap devices via dev_queue_xmit_nit(), it emits klog false positives due to it being unset like: ... [ 124.990397] protocol 0000 is buggy, dev nlmon0 [ 124.990411] protocol 0000 is buggy, dev nlmon0 ... So just reset the network header before passing to the device; for packet sockets that just means nothing will change - mac and net offset hold the same value just as before. Reported-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: hold nl_sock_hash_lock during diag dumpThomas Graf2014-08-063-0/+5
| | | | | | | | | | | | | | | | Although RCU protection would be possible during diag dump, doing so allows for concurrent table mutations which can render the in-table offset between individual Netlink messages invalid and thus cause legitimate sockets to be skipped in the dump. Since the diag dump is relatively low volume and consistency is more important than performance, the table mutex is held during dump. Reported-by: Andrey Wagin <avagin@gmail.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Fixes: e341694e3eb57fc ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix lockdep splatsEric Dumazet2014-08-041-2/+2
| | | | | | | | | | With netlink_lookup() conversion to RCU, we need to use appropriate rcu dereference in netlink_seq_socket_idx() & netlink_seq_next() Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: e341694e3eb57fc ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Convert netlink_lookup() to use RCU protected hash tableThomas Graf2014-08-023-195/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heavy Netlink users such as Open vSwitch spend a considerable amount of time in netlink_lookup() due to the read-lock on nl_table_lock. Use of RCU relieves the lock contention. Makes use of the new resizable hash table to avoid locking on the lookup. The hash table will grow if entries exceeds 75% of table size up to a total table size of 64K. It will automatically shrink if usage falls below 30%. Also splits nl_table_lock into a separate mutex to protect hash table mutations and allow synchronize_rcu() to sleep while waiting for readers during expansion and shrinking. Before: 9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup 6.42% kpktgend_0 [pktgen] [k] mod_cur_headers 6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker 6.23% kpktgend_0 [kernel.kallsyms] [k] memset 4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup 4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy 3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract 2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 After: 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup 8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker 7.92% kpktgend_0 [pktgen] [k] mod_cur_headers 5.11% kpktgend_0 [kernel.kallsyms] [k] memset 4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract 4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock 3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 [...] 0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Use PAGE_ALIGNED macroTobias Klauser2014-07-311-1/+1
| | | | | | | Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE). Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: remove bool varibleVarka Bhadram2014-07-161-4/+2
| | | | | | | | This patch removes the bool variable 'pass'. If the swith case exist return true or return false. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-07-161-2/+2
|\ | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: Fix handling of error from netlink_dump().Ben Pfaff2014-07-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netlink_dump() returns a negative errno value on error. Until now, netlink_recvmsg() directly recorded that negative value in sk->sk_err, but that's wrong since sk_err takes positive errno values. (This manifests as userspace receiving a positive return value from the recv() system call, falsely indicating success.) This bug was introduced in the commit that started checking the netlink_dump() return value, commit b44d211 (netlink: handle errors from netlink_dump()). Multithreaded Netlink dumps are one way to trigger this behavior in practice, as described in the commit message for the userspace workaround posted here: http://openvswitch.org/pipermail/dev/2014-June/042339.html This commit also fixes the same bug in netlink_poll(), introduced in commit cd1df525d (netlink: add flow control for memory mapped I/O). Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: Fix do_one_broadcast() prototype.Rami Rosen2014-07-071-9/+6
|/ | | | | | | This patch changes the prototype of the do_one_broadcast() method so that it will return void. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-06-031-1/+6
|\ | | | | | | | | | | | | | | | | | | Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: Only check file credentials for implicit destinationsEric W. Biederman2014-06-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | genetlink: remove superfluous assignmentDenis ChengRq2014-06-021-5/+1
| | | | | | | | | | | | | | | | | | | | | | the local variable ops and n_ops were just read out from family, and not changed, hence no need to assign back. Validation functions should operate on const parameters and not change anything. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-05-122-6/+71
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Use netlink_ns_capable to verify the permisions of netlink messagesEric W. Biederman2014-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Add variants of capable for use on netlink messagesEric W. Biederman2014-04-241-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: Rename netlink_capable netlink_allowedEric W. Biederman2014-04-241-5/+5
| | | | | | | | | | | | | | | | netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: implement unbind to netlink_setsockopt NETLINK_DROP_MEMBERSHIPRichard Guy Briggs2014-04-221-1/+3
| | | | | | | | | | | | | | | | Call the per-protocol unbind function rather than bind function on NETLINK_DROP_MEMBERSHIP in netlink_setsockopt(). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: have netlink per-protocol bind function return an error code.Richard Guy Briggs2014-04-222-22/+52
|/ | | | | | | | | | | | | | | | | | | | | | | Have the netlink per-protocol optional bind function return an int error code rather than void to signal a failure. This will enable netlink protocols to perform extra checks including capabilities and permissions verifications when updating memberships in multicast groups. In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind function was moved above the multicast group update to prevent any access to the multicast socket groups before checking with the per-protocol bind function. This will enable the per-protocol bind function to be used to check permissions which could be denied before making them available, and to avoid the messy job of undoing the addition should the per-protocol bind function fail. The netfilter subsystem seems to be the only one currently using the per-protocol bind function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix use after free by removing length arg from sk_data_ready callbacks.David S. Miller2014-04-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: autosize skb lengthesEric Dumazet2014-03-102-1/+27
| | | | | | | | | | | | | | | | | | | One known problem with netlink is the fact that NLMSG_GOODSIZE is really small on PAGE_SIZE==4096 architectures, and it is difficult to know in advance what buffer size is used by the application. This patch adds an automatic learning of the size. First netlink message will still be limited to ~4K, but if user used bigger buffers, then following messages will be able to use up to 16KB. This speedups dump() operations by a large factor and should be safe for legacy applications. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-03-051-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix permission check in netlink_connect()Mike Pecovnik2014-02-251-2/+2
| | | | | | | | | | | | | | | | | | | | netlink_sendmsg() was changed to prevent non-root processes from sending messages with dst_pid != 0. netlink_connect() however still only checks if nladdr->nl_groups is set. This patch modifies netlink_connect() to check for the same condition. Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: fix checkpatch errors space and "foo *bar"Wang Yufen2014-02-171-2/+2
|/ | | | | | | ERROR: spaces required and "(foo*)" should be "(foo *)" Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add build-time checks for msg->msg_name sizeSteffen Hurrle2014-01-181-2/+2
| | | | | | | | | | | | | | | This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2014-01-062-0/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== [GIT net-next] Open vSwitch Open vSwitch changes for net-next/3.14. Highlights are: * Performance improvements in the mechanism to get packets to userspace using memory mapped netlink and skb zero copy where appropriate. * Per-cpu flow stats in situations where flows are likely to be shared across CPUs. Standard flow stats are used in other situations to save memory and allocation time. * A handful of code cleanups and rationalization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: Avoid netlink mmap alloc if msg size exceeds frame sizeThomas Graf2014-01-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | An insufficent ring frame size configuration can lead to an unnecessary skb allocation for every Netlink message. Check frame size before taking the queue lock and allocating the skb and re-check with lock to be safe. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
| * genl: Add genlmsg_new_unicast() for unicast message allocationThomas Graf2014-01-061-0/+21
| | | | | | | | | | | | | | | | | | | | | | Allocates a new sk_buff large enough to cover the specified payload plus required Netlink headers. Will check receiving socket for memory mapped i/o capability and use it if enabled. Will fall back to non-mapped skb if message size exceeds the frame size of the ring. Signed-of-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
* | netlink: cleanup tap related functionsstephen hemminger2014-01-011-17/+1
| | | | | | | | | | | | | | | | | | | | Cleanups in netlink_tap code * remove unused function netlink_clear_multicast_users * make local function static Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: specify netlink packet direction for nlmonDaniel Borkmann2013-12-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to facilitate development for netlink protocol dissector, fill the unused field skb->pkt_type of the cloned skb with a hint of the address space of the new owner (receiver) socket in the notion of "to kernel" resp. "to user". At the time we invoke __netlink_deliver_tap_skb(), we already have set the new skb owner via netlink_skb_set_owner_r(), so we can use that for netlink_is_kernel() probing. In normal PF_PACKET network traffic, this field denotes if the packet is destined for us (PACKET_HOST), if it's broadcast (PACKET_BROADCAST), etc. As we only have 3 bit reserved, we can use the value (= 6) of PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel and not supported anywhere, and packets of such type were never exposed to user space, so there are no overlapping users of such kind. Thus, as wished, that seems the only way to make both PACKET_* values non-overlapping and therefore device agnostic. By using those two flags for netlink skbs on nlmon devices, they can be made available and picked up via sll_pkttype (previously unused in netlink context) in struct sockaddr_ll. We now have these two directions: - PACKET_USER (= 6) -> to user space - PACKET_KERNEL (= 7) -> to kernel space Partial `ip a` example strace for sa_family=AF_NETLINK with detected nl msg direction: syscall: direction: sendto(3, ...) = 40 /* to kernel */ recvmsg(3, ...) = 3404 /* to user */ recvmsg(3, ...) = 1120 /* to user */ recvmsg(3, ...) = 20 /* to user */ sendto(3, ...) = 40 /* to kernel */ recvmsg(3, ...) = 168 /* to user */ recvmsg(3, ...) = 144 /* to user */ recvmsg(3, ...) = 20 /* to user */ Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: only do not deliver to tap when both sides are kernel sksDaniel Borkmann2013-12-311-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | We should also deliver packets to nlmon devices when we are in netlink_unicast_kernel(), and only one of the {src,dst} sockets is user sk and the other one kernel sk. That's e.g. the case in netlink diag, netlink route, etc. Still, forbid to deliver messages from kernel to kernel sks. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | genetlink/pmcraid: use proper genetlink multicast APIJohannes Berg2013-11-281-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pmcraid driver is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make it use the correct API, but since this may already be used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. My previous patch broke event delivery in the driver as I missed that it wasn't using the right API and forgot to update it later in my series. While changing this, I noticed that the genetlink code could use the static group ID instead of a strcmp(), so also do that for the VFS_DQUOT family. Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | genetlink: Fix uninitialized variable in genl_validate_assign_mc_groups()Geert Uytterhoeven2013-11-281-1/+1
|/ | | | | | | | | | | | | | | net/netlink/genetlink.c: In function ‘genl_validate_assign_mc_groups’: net/netlink/genetlink.c:217: warning: ‘err’ may be used uninitialized in this function Commit 2a94fe48f32ccf7321450a2cc07f2b724a444e5b ("genetlink: make multicast groups const, prevent abuse") split genl_register_mc_group() in multiple functions, but dropped the initialization of err. Initialize err to zero to fix this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix genlmsg_multicast() bugJohannes Berg2013-11-211-2/+2
| | | | | | | | | | | | | | | | | Unfortunately, I introduced a tremendously stupid bug into genlmsg_multicast() when doing all those multicast group changes: it adjusts the group number, but then passes it to genlmsg_multicast_netns() which does that again. Somehow, my tests failed to catch this, so add a warning into genlmsg_multicast_netns() and remove the offending group ID adjustment. Also add a warning to the similar code in other functions so people who misuse them are more loudly warned. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa2013-11-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: make multicast groups const, prevent abuseJohannes Berg2013-11-191-112/+166
| | | | | | | | | | | | | | | | | | Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: pass family to functions using groupsJohannes Berg2013-11-191-5/+7
| | | | | | | | | This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove family pointer from genl_multicast_groupJohannes Berg2013-11-191-20/+18
| | | | | | | | There's no reason to have the family pointer there since it can just be passed internally where needed, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove genl_unregister_mc_group()Johannes Berg2013-11-191-23/+0
| | | | | | | | There are no users of this API remaining, and we'll soon change group registration to be static (like ops are now) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* quota/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-2/+8
| | | | | | | | | | | | | | | The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drop_monitor/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-3/+10
| | | | | | | | | | | | | | | | The drop monitor code is abusing the genetlink API and is statically using the generic netlink multicast group 1, even if that group belongs to somebody else (which it invariably will, since it's not reserved.) Make the drop monitor code use the proper APIs to reserve a group ID, but also reserve the group id 1 in generic netlink code to preserve the userspace API. Since drop monitor can be a module, don't clear the bit for it on unregistration. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: only pass array to genl_register_family_with_ops()Johannes Berg2013-11-191-6/+8
| | | | | | | | | | | | | | As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix documentation typo in netlink_set_err()Johannes Berg2013-11-191-1/+1
| | | | | | | The parameter is just 'group', not 'groups', fix the documentation typo. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: rename shadowed variableJohannes Berg2013-11-181-5/+5
| | | | | | | | | Sparse pointed out that the new flags variable I had added shadowed an existing one, rename the new one to avoid that, making the code clearer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: unify registration functionsJohannes Berg2013-11-151-37/+16
| | | | | | | | | | | | Now that the ops assignment is just two variables rather than a long list iteration etc., there's no reason to separately export __genl_register_family() and __genl_register_family_with_ops(). Unify the two functions into __genl_register_family() and make genl_register_family_with_ops() call it after assigning the ops. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: allow making ops constJohannes Berg2013-11-141-15/+21
| | | | | | | | | | | | Allow making the ops array const by not modifying the ops flags on registration but rather only when ops are sent out in the family information. No users are updated yet except for the pre_doit/post_doit calls in wireless (the only ones that exist now.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: register family ops as arrayJohannes Berg2013-11-141-45/+33
| | | | | | | | | | | | | | Instead of using a linked list, use an array. This reduces the data size needed by the users of genetlink, for example in wireless (net/wireless/nl80211.c) on 64-bit it frees up over 1K of data space. Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns -EINVAL anyway, therefore no such event could ever be sent. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove genl_register_ops/genl_unregister_opsJohannes Berg2013-11-141-56/+1
| | | | | | | | genl_register_ops() is still needed for internal registration, but is no longer available to users of the API. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: netlink: filter particular protocols from analyzersDaniel Borkmann2013-09-061-0/+30
| | | | | | | | | | | | Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-09-051-12/+55
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>