summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* samples: bpf: trivial eBPF program in CAlexei Starovoitov2014-12-054-1/+89
| | | | | | | | | | | | | | | | | | | | this example does the same task as previous socket example in assembler, but this one does it in C. eBPF program in kernel does: /* assume that packet is IPv4, load one byte of IP->proto */ int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); long *value; value = bpf_map_lookup_elem(&my_map, &index); if (value) __sync_fetch_and_add(value, 1); Corresponding user space reads map[tcp], map[udp], map[icmp] and prints protocol stats every second Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* samples: bpf: elf_bpf file loaderAlexei Starovoitov2014-12-053-0/+267
| | | | | | | | | | | | | | | | | | | | | | simple .o parser and loader using BPF syscall. .o is a standard ELF generated by LLVM backend It parses elf file compiled by llvm .c->.o - parses 'maps' section and creates maps via BPF syscall - parses 'license' section and passes it to syscall - parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD - loads eBPF programs via BPF syscall One ELF file can contain multiple BPF programs. int load_bpf_file(char *path); populates prog_fd[] and map_fd[] with FDs received from bpf syscall bpf_helpers.h - helper functions available to eBPF programs written in C Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* samples: bpf: example of stateful socket filteringAlexei Starovoitov2014-12-054-0/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this socket filter example does: - creates arraymap in kernel with key 4 bytes and value 8 bytes - loads eBPF program which assumes that packet is IPv4 and loads one byte of IP->proto from the packet and uses it as a key in a map r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]; *(u32*)(fp - 4) = r0; value = bpf_map_lookup_elem(map_fd, fp - 4); if (value) (*(u64*)value) += 1; - attaches this program to raw socket - every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP] to see how many packets of given protocol were seen on loopback interface Usage: $sudo samples/bpf/sock_example TCP 0 UDP 0 ICMP 0 packets TCP 187600 UDP 0 ICMP 4 packets TCP 376504 UDP 0 ICMP 8 packets TCP 563116 UDP 0 ICMP 12 packets TCP 753144 UDP 0 ICMP 16 packets Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sock: allow eBPF programs to be attached to socketsAlexei Starovoitov2014-12-0518-2/+155
| | | | | | | | | | | | | | | | | | | | introduce new setsockopt() command: setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)) where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...) and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER setsockopt() calls bpf_prog_get() which increments refcnt of the program, so it doesn't get unloaded while socket is using the program. The same eBPF program can be attached to multiple sockets. User task exit automatically closes socket which calls sk_filter_uncharge() which decrements refcnt of eBPF program Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: verifier: add checks for BPF_ABS | BPF_IND instructionsAlexei Starovoitov2014-12-052-2/+69
| | | | | | | | | | | | | introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used for attaching programs to sockets where ctx == skb. add verifier checks for ABS/IND instructions which can only be seen in socket filters, therefore the check: if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER) verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n"); Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tun/macvtap: use consume_skb() instead of kfree_skb() when neededJason Wang2014-12-052-2/+8
| | | | | | | | | | To be more friendly with drop monitor, we should only call kfree_skb() when the packets were dropped and use consume_skb() in other cases. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-PA Semi: Deletion of unnecessary checks before the function call ↵Markus Elfring2014-12-051-4/+2
| | | | | | | | | | | | | | | "pci_dev_put" The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-ipvlan: Deletion of an unnecessary check before the function call ↵Markus Elfring2014-12-051-2/+1
| | | | | | | | | | | | | "free_percpu" The free_percpu() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cassini: Deletion of an unnecessary check before the function call "vfree"Markus Elfring2014-12-051-2/+1
| | | | | | | | | | The vfree() function performs also input parameter validation. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* stmmac: pci: allocate memory resources dynamicallyAndy Shevchenko2014-12-051-24/+31
| | | | | | | | | Instead of using global variables we are going to use dynamically allocated memory. It allows to append a support of more than one ethernet adapter which might have different settings simultaniously. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2014-12-0528-254/+372
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains netfilter updates for net-next. Basically, enhancements for xt_recent, skip zeroing of timer in conntrack, fix linking problem with recent redirect support for nf_tables, ipset updates and a couple of cleanups. More specifically, they are: 1) Rise maximum number per IP address to be remembered in xt_recent while retaining backward compatibility, from Florian Westphal. 2) Skip zeroing timer area in nf_conn objects, also from Florian. 3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using using meta l4proto and transport layer header, from Alvaro Neira. 4) Fix linking problems in the new redirect support when CONFIG_IPV6=n and IP6_NF_IPTABLES=n. And ipset updates from Jozsef Kadlecsik: 5) Support updating element extensions when the set is full (fixes netfilter bugzilla id 880). 6) Fix set match with 32-bits userspace / 64-bits kernel. 7) Indicate explicitly when /0 networks are supported in ipset. 8) Simplify cidr handling for hash:*net* types. 9) Allocate the proper size of memory when /0 networks are supported. 10) Explicitly add padding elements to hash:net,net and hash:net,port, because the elements must be u32 sized for the used hash function. Jozsef is also cooking ipset RCU conversion which should land soon if they reach the merge window in time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ipset: Explicitly add padding elements to hash:net, net and ↵Jozsef Kadlecsik2014-12-032-0/+4
| | | | | | | | | | | | | | | | | | hash:net, port, net The elements must be u32 sized for the used hash function. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Allocate the proper size of memory when /0 networks are ↵Jozsef Kadlecsik2014-12-031-2/+1
| | | | | | | | | | | | | | supported Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Simplify cidr handling for hash:*net* typesJozsef Kadlecsik2014-12-031-28/+28
| | | | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Indicate when /0 networks are supportedJozsef Kadlecsik2014-12-032-1/+2
| | | | | | | | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Alignment problem between 64bit kernel 32bit userspaceJozsef Kadlecsik2014-12-033-6/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sven-Haegar Koch reported the issue: sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. In syslog: x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32 which was introduced by the counter extension in ipset. The patch fixes the alignment issue with introducing a new set match revision with the fixed underlying 'struct ip_set_counter_match' structure. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: ipset: Support updating extensions when the set is fullJozsef Kadlecsik2014-12-031-23/+17
| | | | | | | | | | | | | | | | | | | | When the set was full (hash type and maxelem reached), it was not possible to update the extension part of already existing elements. The patch removes this limitation. Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880 Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_log_ipv6: correct typo in module descriptionSteven Noonan2014-11-281-1/+1
| | | | | | | | | | | | | | It incorrectly identifies itself as "IPv4" packet logging. Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one modulePablo Neira Ayuso2014-11-2714-115/+72
| | | | | | | | | | | | | | | | | | | | | | This resolves linking problems with CONFIG_IPV6=n: net/built-in.o: In function `redirect_tg6': xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6' Reported-by: Andreas Ruprecht <rupran@einserver.de> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_tables_bridge: set the pktinfo for IPv4/IPv6 trafficAlvaro Neira2014-11-271-1/+39
| | | | | | | | | | | | | | | | | | | | This patch adds the missing bits to allow to match per meta l4proto from the bridge. Example: nft add rule bridge filter input ether type {ip, ip6} meta l4proto udp counter Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functionsAlvaro Neira2014-11-273-47/+60
| | | | | | | | | | | | | | | | | | This patch exports the functions nft_reject_iphdr_validate and nft_reject_ip6hdr_validate to use it in follow up patches. These functions check if the IPv4/IPv6 header is correct. Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: conntrack: avoid zeroing timerFlorian Westphal2014-11-272-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | add a __nfct_init_offset annotation member to struct nf_conn to make it clear which members are covered by the memset when the conntrack is allocated. This avoids zeroing timer_list and ct_net; both are already inited explicitly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: xt_recent: relax ip_pkt_list_tot restrictionsFlorian Westphal2014-11-271-17/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The maximum value for the hitcount parameter is given by "ip_pkt_list_tot" parameter (default: 20). Exceeding this value on the command line will cause the rule to be rejected. The parameter is also readonly, i.e. it cannot be changed without module unload or reboot. Store size per table, then base nstamps[] size on the hitcount instead. The module parameter is retained for backwards compatibility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge branch 'master' of ↵David S. Miller2014-12-0518-323/+2404
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-12-05 This series contains updates to ixgbe and ixgbevf. Alex provides a couple of patches to cleanup ixgbe. First cleans up the page reuse code getting it into a state where all the workarounds needed are in place as well as cleaning up a few minor oversights such as using __free_pages instead of put_page to drop a locally allocated page. Then cleans up the tail writes for the ixgbe descriptor queues. Mark Peterson adds support to lookup MAC addresses in Open Firmware or IDPROM. Emil provides patches for ixgbe and ixgbevf to fix an issue on rmmod and to add support for X550 in the VF driver. First removes the read/write operations to the CIAA/D registers since it can block access to the PCI config space and make use of standard kernel functions for accessing the PCI config space. Then fixes an issue where the driver has logic to free up used data in case any of the checks in ixgbe_probe() fail, however there is a similar set of cleanups that can occur on driver unload in ixgbe_remove() which can cause the rmmod command to crash. Don provides the remaining patches in the series to complete the addition of X550 support into the ixgbe driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ixgbevf: fix possible crashes in probe and removeEmil Tantilov2014-12-051-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves couple of issues in ixgbevf_probe/remove(): 1. Fix a case where adapter->state is tested after free_netdev() this is same as the patch for ixgbe from Daniel Borkmann <dborkman@redhat.com>: commit b5b2ffc0574e1f27 ("ixgbe: fix use after free adapter->state test in ixgbe_remove/ixgbe_probe") 2. Move pci_set_drvdata() after all the error checks in ixgbevf_probe() and then add a check in ixgbevf_probe() to avoid running the cleanup functions twice in cases where probe failed. CC: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbevf: add support for X550 VFsEmil Tantilov2014-12-055-5/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds initial support for VFs on a new mac - X550. The patch adds the basic structures and device IDs for the X550 VFs that would allow the driver to load and pass traffic. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: fix crash on rmmod after probe failEmil Tantilov2014-12-051-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver has logic to free up used data in case any of the checks in ixgbe_probe() fail, however there is a similar set of cleanups that can occur on driver unload in ixgbe_remove() which can cause the rmmod command to crash. This patch aims to fix the logic by moving pci_set_drvdata() after all error checks and then adds a check in ixgbe_remove() to skip it altogether if adapter comes up empty. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: bump version numberDon Skidmore2014-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | Since we now support X550 mac's bump the version number to reflect this. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Add X550 support function pointersDon Skidmore2014-12-0510-59/+1732
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends the function pointer structure to include the new X550 class MAC types. This creates a new file ixgbe_x550.c that contains all of the new methods. Because of similarities to the X540 part in some cases we just use it's methods where they can be used without any modification. These exported functions are now defined in the new ixgbe_x540.h file. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: cleanup checksum to allow error resultsDon Skidmore2014-12-054-53/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the shared code checksum calculation function only returns a u16 and cannot return an error code. Unfortunately a variety of errors can happen that completely prevent the calculation of a checksum. So, change the function return value from a u16 to an s32 and return a negative value on error, or the positive checksum value when there is no error. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: add methods for combined read and write operationsDon Skidmore2014-12-053-0/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | Some X550 procedures will be using CS4227 PHY and need to perform combined read and write operations. This patch adds those methods. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Add x550 SW/FW semaphore supportDon Skidmore2014-12-055-29/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The X550 hardware will use more bits in the mask, so change the prototypes to match. This larger mask will require changes in callers which use the higher bits. Likewise since X550 will use different semaphore mask values and will use the lan_id value. So save these values in the ixgbe_phy_info struct. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Add timeout parameter to ixgbe_host_interface_commandDon Skidmore2014-12-052-13/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since on X550 we use host interface commands to read,write and erase some commands require more time to complete. So this adds a timeout parameter to ixgbe_host_interface_command as wells as a return_data parameter allowing us to return with any data. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: add support for X550 extended RSS supportDon Skidmore2014-12-055-25/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new X550 family of MAC's will have a larger RSS hash (16 -> 64). It will also support individual VF to have their own independent RSS hash key. This patch will enable this functionality Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: remove CIAA/D register reads from bad VF checkEmil Tantilov2014-12-051-71/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accessing the CIAA/D register can block access to the PCI config space. This patch removes the read/write operations to the CIAA/D registers and makes use of standard kernel functions for accessing the PCI config space. In addition it moves ixgbevf_check_for_bad_vf() into the watchdog subtask which reduces the frequency of the checks. CC: Alex Williamson <alex.williamson@redhat.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Look up MAC address in Open Firmware or IDPROMMartin K Petersen2014-12-051-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt to look up the MAC address in Open Firmware on systems that support it. On SPARC resort to using the IDPROM if no OF address is found. Signed-off-by: Martin K Petersen <martin.petersen@oracle.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Remove tail write abstraction and add missing barrierAlexander Duyck2014-12-052-25/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change cleans up the tail writes for the ixgbe descriptor queues. The current implementation had me confused as I wasn't sure if it was still making use of the surprise remove logic or not. It also adds the mmiowb which is needed on ia64, mips, and a couple other architectures in order to synchronize the MMIO writes with the Tx queue _xmit_lock spinlock. Cc: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ixgbe: Clean-up page reuse codeAlexander Duyck2014-12-051-42/+36
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch cleans up the page reuse code getting it into a state where all the workarounds needed are in place as well as cleaning up a few minor oversights such as using __free_pages instead of put_page to drop a locally allocated page. It also cleans up how we clear the descriptor status bits. Previously they were zeroed as a part of clearing the hdr_addr. However the hdr_addr is a 64 bit field and 64 bit writes can be a bit more expensive on on 32 bit systems. Since we are no longer using the header split feature the upper 32 bits of the address no longer need to be cleared. As a result we can just clear the status bits and leave the length and VLAN fields as-is which should provide more information in debugging. Cc: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | tun: Fix GSO meta-data handling in tun_get_userHerbert Xu2014-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When we write the GSO meta-data in tun_get_user we end up advancing the IO vector twice, thus exhausting the user buffer before we can finish writing the packet. Fixes: f5ff53b4d97c ("{macvtap,tun}_get_user(): switch to iov_iter") Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'rocker-next'David S. Miller2014-12-0235-105/+5366
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Pirko says: ==================== introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload This patchset is just the first phase of switch and switch-ish device support api in kernel. Note that the api will extend. So what this patchset includes: - introduce switchdev api skeleton for implementing switch drivers - introduce rocker switch driver which implements switchdev api fdb and bridge set/get link ndos As to the discussion if there is need to have specific class of device representing the switch itself, so far we found no need to introduce that. But we are generally ok with the idea and when the time comes and it will be needed, it can be easily introduced without any disturbance. This patchset introduces switch id export through rtnetlink and sysfs, which is similar to what we have for port id in SR-IOV. I will send iproute2 patchset for showing the switch id for port netdevs once this is applied. This applies also for the PF_BRIDGE and fdb iproute2 patches. iproute2 patches are now available here: https://github.com/jpirko/iproute2-rocker For detailed description and version history, please see individual patches. In v4 I reordered the patches leaving rocker patches on the end of the patchset. In v5 I only fixed whitespace issues of patch #13 We have a TODO for related items we want to work on in near future: https://etherpad.wikimedia.org/p/netdev-swdev-todo ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: Use logical operators on booleansThomas Graf2014-12-021-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Silences various sparse warnings Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: Add proper validation of Netlink attributesThomas Graf2014-12-021-0/+9
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: add ndo_bridge_setlink/getlink support for learning policyScott Feldman2014-12-022-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rocker ports will use new "swdev" hwmode for bridge port offload policy. Current supported policy settings are BR_LEARNING and BR_LEARNING_SYNC. User can turn on/off device port FDB learning and syncing to bridge. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: implement ndo_fdb_dumpJiri Pirko2014-12-021-0/+73
| | | | | | | | | | | | | | | Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: implement L2 bridge offloadingScott Feldman2014-12-021-1/+669
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add L2 bridge offloading support to rocker driver. Here, the Linux bridge driver is used to collect swdev ports into a tagged (or untagged) VLAN bridge. The switchdev will offload from the bridge driver the following L2 bridging functions: - Learning of neighbor MAC addresses on VLAN X Learned mac/vlan is installed in bridge FDB. (And removed when device unlearns mac/vlan). Learning must be turned off on each bridge port to disable the feature in the bridge driver. - Flooding of multicast/broadcast and unknown unicast pkts to (STP) active ports in bridge. The bridge driver is unaware of the flooding happening at the device level. Flooding must be turned off on each bridge port to disable the feature on the bridge driver. - STP port state is pushed down to driver/device. The bridge still processes STP BDPUs and maintains port STP state (for all VLANs in bridge), but the driver/device must be notified of port STP state change to program the device. Multiple (VLAN) bridges are supported. The device (implemented per the OF-DPA spec) must use a portion of the VLAN namespace for internal VLANs. Right now, the upper 255 VLANs (0xf00 to 0xffe) are used as internal VLAN IDs for untagged traffic and are not available as port VLANs. The driver uses the following interfaces: 1. To track VLAN add/del on ports in bridge: .ndo_vlan_rx_add_vid .ndo_vlan_rx_kill_vid 2. To track port add/del membership in bridge: NETDEV_CHANGEUPPER netdevice notifier 3. To catch static FDB entries installed on bridge/vlan by user using netlink: .ndo_fdb_add .ndo_fdb_del 4. To be notified on port STP state change: .ndo_switch_port_stp_update 5. To notify bridge driver on learned/forgotten mac/vlans on bridge port: br_fdb_external_learn_add br_fdb_external_learn_del Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: implement rocker ofdpa flow table manipulationScott Feldman2014-12-021-2/+1467
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rocker driver maintains 4 hash tables: flows, groups, FDB, and VLANs. Flow and group tables track the entries installed to OF-DPA tables, per the OF-DPA spec. See OF-DPA spec for full description of fields in each flow and group table. New table entries are pushed to the device with ADD cmd. Updated entries are pushed to the device with MOD cmd. For flow table entries, a crc32 key is made from fields of the particular field. For group table entries, the group_id is used as the key. The FDB table tracks fdb entries learned by the device or manually pushed to the bridge by the user. A crc32 key is made from the port/mac/vlan tuple for the fdb entry. The VLAN table tracks the ifindex-to-internal-vlan mapping for untagged pkts. On ingress, an untagged pkt is inserted with an internal VLAN ID based on the input port's current internal VLAN ID. The input port's internal VLAN will either be referenced by the port's ifindex, if not bridged, or the containing bridge's ifindex, if bridged. Since the ifindex space isn't within a fixed range, uses a hash table (with ifindex as key) to track internal VLAN ID for a given ifindex. The internal VLAN ID range is fixed and currently uses the upper 255 VLAN IDs, starting at 0xf00. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | rocker: introduce rocker switch driverJiri Pirko2014-12-027-0/+2528
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the first driver to benefit from the switchdev infrastructure and to implement newly introduced switch ndos. This is a driver for emulated switch chip implemented in qemu: https://github.com/sfeldma/qemu-rocker/ This patch is a result of joint work with Scott Feldman. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: add brport flags to dflt bridge_getlinkScott Feldman2014-12-024-4/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow brport device to return current brport flags set on port. Add returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink. With this change, netlink msg returned for bridge_getlink contains the port's offloaded flag settings (the port's SELF settings). Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: add new hwmode swdevScott Feldman2014-12-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current hwmode settings are "vepa" or "veb". These are for NIC interfaces with basic bridging function offloaded to HW. Add new "swdev" for full switch device offloads. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bridge: add new brport flag LEARNING_SYNCScott Feldman2014-12-022-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This policy flag controls syncing of learned FDB entries to bridge's FDB. If on, FDB entries learned on bridge port device will be synced. If off, device may still learn new FDB entries but they will not be synced with bridge's FDB. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>