author     Linus Torvalds <torvalds@linux-foundation.org>  2018-10-24 06:47:44 +0100
committer  Linus Torvalds <torvalds@linux-foundation.org>  2018-10-24 06:47:44 +0100
commit     50b825d7e87f4cff7070df6eb26390152bb29537 (patch)
tree       ec82aba49ab0c4743266ff37e18c8304a0367d06 /kernel/bpf/local_storage.c
parent     a97a2d4d56ea596871b739d63d41b084733bd9fb (diff)
parent     3f80e08f40cdb308589a49077c87632fa4508b21 (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Add VF IPSEC offload support in ixgbe, from Shannon Nelson.
2) Add zero-copy AF_XDP support to i40e, from Björn Töpel.
3) All in-tree drivers are converted to {g,s}et_link_ksettings() so we
can get rid of the {g,s}et_settings ethtool callbacks, from Michal
Kubecek.
4) Add software timestamping to veth driver, from Michael Walle.
5) More work to make packet classifiers and actions lockless, from Vlad
Buslov.
6) Support sticky FDB entries in bridge, from Nikolay Aleksandrov.
7) Add ipv6 version of IP_MULTICAST_ALL sockopt, from Andre Naujoks.
8) Support batching of XDP buffers in vhost_net, from Jason Wang.
9) Add flow dissector BPF hook, from Petar Penkov.
10) i40e vf --> generic iavf conversion, from Jesse Brandeburg.
11) Add NLA_REJECT netlink attribute policy type, to signal when users
provide attributes in situations which don't make sense. From
Johannes Berg.
12) Switch TCP and fair-queue scheduler over to earliest departure time
model. From Eric Dumazet.
13) Improve guest receive performance by doing rx busy polling in tx
path of vhost networking driver, from Tonghao Zhang.
14) Add per-cgroup local storage to bpf (a brief usage sketch follows this
list).
15) Add reference tracking to BPF, from Joe Stringer. The verifier can
now make sure that references taken to objects are properly released
by the program.
16) Support in-place encryption in TLS, from Vakul Garg.
17) Add new taprio packet scheduler, from Vinicius Costa Gomes.
18) Lots of selftests additions, too numerous to mention one by one here
but all of which are very much appreciated.
19) Support offloading of eBPF programs containing BPF to BPF calls in
nfp driver, from Quentin Monnet.
20) Move dpaa2_ptp driver out of staging, from Yangbo Lu.
21) Lots of u32 classifier cleanups and simplifications, from Al Viro.
22) Add new strict versions of netlink message parsers, and enable them
for some situations. From David Ahern.
23) Evict neighbour entries on carrier down, also from David Ahern.
24) Support BPF sk_msg verdict programs with kTLS, from Daniel Borkmann
and John Fastabend.
25) Add support for filtering route dumps, from David Ahern.
26) New igc Intel driver for 2.5G parts, from Sasha Neftin et al.
27) Allow vxlan enslavement to bridges in mlxsw driver, from Ido
Schimmel.
28) Add queue and stack map types to eBPF, from Mauricio Vasquez B.
29) Add back byte-queue-limit support to r8169; with all the bug fixes in
other areas of the driver, it works now! From Florian Westphal and
Heiner Kallweit.
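To make item 14 a bit more concrete, here is a minimal, hedged sketch of declaring a per-cgroup (and, as the diff below shows, per-CPU) local storage map and using it from a cgroup BPF program. The map name, program name, section strings and header paths are illustrative assumptions about the toolchain; the map type, struct bpf_cgroup_storage_key and the bpf_get_local_storage() helper are the actual interfaces involved.

/* Hedged usage sketch for item 14 (per-cgroup local storage).
 * Names and header conventions are assumptions, not taken from the series.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bpf_map_def SEC("maps") pkt_counter = {
	.type		= BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
	.key_size	= sizeof(struct bpf_cgroup_storage_key),
	.value_size	= sizeof(__u64),
	/* max_entries stays 0: one entry exists per attached cgroup */
};

SEC("cgroup/skb")
int count_egress(struct __sk_buff *skb)
{
	/* Returns this program's storage for the current cgroup; with the
	 * per-CPU map type the slot is private to this CPU, so a plain
	 * increment needs no atomics. */
	__u64 *cnt = bpf_get_local_storage(&pkt_counter, 0);

	(*cnt)++;
	return 1;	/* allow the packet */
}

char _license[] SEC("license") = "GPL";

Loading and attaching (for example with bpftool) work the same as for any other cgroup program; only the storage lookup differs from an ordinary hash or array map.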
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2147 commits)
tcp: add tcp_reset_xmit_timer() helper
qed: Fix static checker warning
Revert "be2net: remove desc field from be_eq_obj"
Revert "net: simplify sock_poll_wait"
net: socionext: Reset tx queue in ndo_stop
net: socionext: Add dummy PHY register read in phy_write()
net: socionext: Stop PHY before resetting netsec
net: stmmac: Set OWN bit for jumbo frames
arm64: dts: stratix10: Support Ethernet Jumbo frame
tls: Add maintainers
net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
octeontx2-af: Support for NIXLF's UCAST/PROMISC/ALLMULTI modes
octeontx2-af: Support for setting MAC address
octeontx2-af: Support for changing RSS algorithm
octeontx2-af: NIX Rx flowkey configuration for RSS
octeontx2-af: Install ucast and bcast pkt forwarding rules
octeontx2-af: Add LMAC channel info to NIXLF_ALLOC response
octeontx2-af: NPC MCAM and LDATA extract minimal configuration
octeontx2-af: Enable packet length and csum validation
octeontx2-af: Support for VTAG strip and capture
...
Diffstat (limited to 'kernel/bpf/local_storage.c')
-rw-r--r--    kernel/bpf/local_storage.c    169
1 file changed, 142 insertions, 27 deletions
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 830d7f095748..c97a8f968638 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -7,7 +7,8 @@
 #include <linux/rbtree.h>
 #include <linux/slab.h>
 
-DEFINE_PER_CPU(void*, bpf_cgroup_storage);
+DEFINE_PER_CPU(struct bpf_cgroup_storage*,
+	       bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
 
 #ifdef CONFIG_CGROUP_BPF
 
@@ -151,6 +152,71 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
 	return 0;
 }
 
+int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *_key,
+				   void *value)
+{
+	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
+	struct bpf_cgroup_storage_key *key = _key;
+	struct bpf_cgroup_storage *storage;
+	int cpu, off = 0;
+	u32 size;
+
+	rcu_read_lock();
+	storage = cgroup_storage_lookup(map, key, false);
+	if (!storage) {
+		rcu_read_unlock();
+		return -ENOENT;
+	}
+
+	/* per_cpu areas are zero-filled and bpf programs can only
+	 * access 'value_size' of them, so copying rounded areas
+	 * will not leak any kernel data
+	 */
+	size = round_up(_map->value_size, 8);
+	for_each_possible_cpu(cpu) {
+		bpf_long_memcpy(value + off,
+				per_cpu_ptr(storage->percpu_buf, cpu), size);
+		off += size;
+	}
+	rcu_read_unlock();
+	return 0;
+}
+
+int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key,
+				     void *value, u64 map_flags)
+{
+	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
+	struct bpf_cgroup_storage_key *key = _key;
+	struct bpf_cgroup_storage *storage;
+	int cpu, off = 0;
+	u32 size;
+
+	if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
+		return -EINVAL;
+
+	rcu_read_lock();
+	storage = cgroup_storage_lookup(map, key, false);
+	if (!storage) {
+		rcu_read_unlock();
+		return -ENOENT;
+	}
+
+	/* the user space will provide round_up(value_size, 8) bytes that
+	 * will be copied into per-cpu area. bpf programs can only access
+	 * value_size of it. During lookup the same extra bytes will be
+	 * returned or zeros which were zero-filled by percpu_alloc,
+	 * so no kernel data leaks possible
+	 */
+	size = round_up(_map->value_size, 8);
+	for_each_possible_cpu(cpu) {
+		bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
+				value + off, size);
+		off += size;
+	}
+	rcu_read_unlock();
+	return 0;
+}
+
 static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key,
 				       void *_next_key)
 {
@@ -254,6 +320,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
 
 int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
 {
+	enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
 	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
 	int ret = -EBUSY;
 
@@ -261,11 +328,12 @@ int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
 	if (map->prog && map->prog != prog)
 		goto unlock;
 
-	if (prog->aux->cgroup_storage && prog->aux->cgroup_storage != _map)
+	if (prog->aux->cgroup_storage[stype] &&
+	    prog->aux->cgroup_storage[stype] != _map)
 		goto unlock;
 
 	map->prog = prog;
-	prog->aux->cgroup_storage = _map;
+	prog->aux->cgroup_storage[stype] = _map;
 	ret = 0;
 unlock:
 	spin_unlock_bh(&map->lock);
@@ -275,70 +343,117 @@ unlock:
 
 void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *_map)
 {
+	enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
 	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
 
 	spin_lock_bh(&map->lock);
 	if (map->prog == prog) {
-		WARN_ON(prog->aux->cgroup_storage != _map);
+		WARN_ON(prog->aux->cgroup_storage[stype] != _map);
 		map->prog = NULL;
-		prog->aux->cgroup_storage = NULL;
+		prog->aux->cgroup_storage[stype] = NULL;
 	}
 	spin_unlock_bh(&map->lock);
 }
 
-struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog)
+static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages)
+{
+	size_t size;
+
+	if (cgroup_storage_type(map) == BPF_CGROUP_STORAGE_SHARED) {
+		size = sizeof(struct bpf_storage_buffer) + map->value_size;
+		*pages = round_up(sizeof(struct bpf_cgroup_storage) + size,
+				  PAGE_SIZE) >> PAGE_SHIFT;
+	} else {
+		size = map->value_size;
+		*pages = round_up(round_up(size, 8) * num_possible_cpus(),
+				  PAGE_SIZE) >> PAGE_SHIFT;
+	}
+
+	return size;
+}
+
+struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
+					enum bpf_cgroup_storage_type stype)
 {
 	struct bpf_cgroup_storage *storage;
 	struct bpf_map *map;
+	gfp_t flags;
+	size_t size;
 	u32 pages;
 
-	map = prog->aux->cgroup_storage;
+	map = prog->aux->cgroup_storage[stype];
 	if (!map)
 		return NULL;
 
-	pages = round_up(sizeof(struct bpf_cgroup_storage) +
-			 sizeof(struct bpf_storage_buffer) +
-			 map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
+	size = bpf_cgroup_storage_calculate_size(map, &pages);
+
 	if (bpf_map_charge_memlock(map, pages))
 		return ERR_PTR(-EPERM);
 
 	storage = kmalloc_node(sizeof(struct bpf_cgroup_storage),
 			       __GFP_ZERO | GFP_USER, map->numa_node);
-	if (!storage) {
-		bpf_map_uncharge_memlock(map, pages);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!storage)
+		goto enomem;
 
-	storage->buf = kmalloc_node(sizeof(struct bpf_storage_buffer) +
-				    map->value_size, __GFP_ZERO | GFP_USER,
-				    map->numa_node);
-	if (!storage->buf) {
-		bpf_map_uncharge_memlock(map, pages);
-		kfree(storage);
-		return ERR_PTR(-ENOMEM);
+	flags = __GFP_ZERO | GFP_USER;
+
+	if (stype == BPF_CGROUP_STORAGE_SHARED) {
+		storage->buf = kmalloc_node(size, flags, map->numa_node);
+		if (!storage->buf)
+			goto enomem;
+	} else {
+		storage->percpu_buf = __alloc_percpu_gfp(size, 8, flags);
+		if (!storage->percpu_buf)
+			goto enomem;
 	}
 
 	storage->map = (struct bpf_cgroup_storage_map *)map;
 
 	return storage;
+
+enomem:
+	bpf_map_uncharge_memlock(map, pages);
+	kfree(storage);
+	return ERR_PTR(-ENOMEM);
+}
+
+static void free_shared_cgroup_storage_rcu(struct rcu_head *rcu)
+{
+	struct bpf_cgroup_storage *storage =
+		container_of(rcu, struct bpf_cgroup_storage, rcu);
+
+	kfree(storage->buf);
+	kfree(storage);
+}
+
+static void free_percpu_cgroup_storage_rcu(struct rcu_head *rcu)
+{
+	struct bpf_cgroup_storage *storage =
+		container_of(rcu, struct bpf_cgroup_storage, rcu);
+
+	free_percpu(storage->percpu_buf);
+	kfree(storage);
 }
 
 void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage)
 {
-	u32 pages;
+	enum bpf_cgroup_storage_type stype;
 	struct bpf_map *map;
+	u32 pages;
 
 	if (!storage)
 		return;
 
 	map = &storage->map->map;
-	pages = round_up(sizeof(struct bpf_cgroup_storage) +
-			 sizeof(struct bpf_storage_buffer) +
-			 map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
+
+	bpf_cgroup_storage_calculate_size(map, &pages);
 	bpf_map_uncharge_memlock(map, pages);
 
-	kfree_rcu(storage->buf, rcu);
-	kfree_rcu(storage, rcu);
+	stype = cgroup_storage_type(map);
+	if (stype == BPF_CGROUP_STORAGE_SHARED)
+		call_rcu(&storage->rcu, free_shared_cgroup_storage_rcu);
+	else
+		call_rcu(&storage->rcu, free_percpu_cgroup_storage_rcu);
 }
 
 void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
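One practical consequence of the bpf_percpu_cgroup_storage_copy()/_update() pair above is the user-space layout: a map lookup returns round_up(value_size, 8) bytes for every possible CPU, concatenated in CPU order. The sketch below is an illustration rather than code from this series; it assumes a __u64 per-CPU counter and leans on libbpf's bpf_map_lookup_elem() wrapper plus the newer libbpf_num_possible_cpus() convenience.

/* Hedged sketch: reading one per-CPU cgroup storage value from user space.
 * Assumes map_fd refers to a BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map whose
 * value_size is sizeof(__u64); the kernel returns round_up(value_size, 8)
 * bytes per possible CPU, matching bpf_percpu_cgroup_storage_copy() above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <linux/bpf.h>		/* struct bpf_cgroup_storage_key */
#include <bpf/bpf.h>		/* bpf_map_lookup_elem() wrapper */
#include <bpf/libbpf.h>		/* libbpf_num_possible_cpus() */

static int dump_percpu_cgroup_counter(int map_fd,
				      const struct bpf_cgroup_storage_key *key)
{
	int ncpus = libbpf_num_possible_cpus();
	__u64 *vals;
	int cpu, err;

	if (ncpus < 0)
		return ncpus;

	/* one 8-byte slot per possible CPU */
	vals = calloc(ncpus, sizeof(*vals));
	if (!vals)
		return -1;

	err = bpf_map_lookup_elem(map_fd, key, vals);
	if (!err) {
		for (cpu = 0; cpu < ncpus; cpu++)
			printf("cpu%d: %llu\n", cpu,
			       (unsigned long long)vals[cpu]);
	}

	free(vals);
	return err;
}

The key identifies a (cgroup inode, attach type) pair; for the shared, non-per-CPU storage type a single value_size buffer would be sufficient.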