diff options
author | David S. Miller <davem@davemloft.net> | 2018-10-08 23:42:44 -0700 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2018-10-08 23:42:44 -0700 |
commit | 071a234ad744ab9a1e9c948874d5f646a2964734 (patch) | |
tree | 85f9d2f5a69e31749e01460e49c859ef1f56b616 | |
parent | 9000a457a0c84883874a844ef94adf26f633f3b4 (diff) | |
parent | df3f94a0bbeb6cb6a02eb16b8e76f16b33cb2f8f (diff) | |
download | linux-071a234ad744ab9a1e9c948874d5f646a2964734.tar.gz linux-071a234ad744ab9a1e9c948874d5f646a2964734.tar.bz2 linux-071a234ad744ab9a1e9c948874d5f646a2964734.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-10-08
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.
2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c
3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.
6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf
7) various AF_XDP fixes from Björn and Magnus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
66 files changed, 4140 insertions, 663 deletions
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index ff929cfab4f4..4ae4f9d8f8fe 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050 and 3000 refers to the same chunk. -UMEM Completetion Ring -~~~~~~~~~~~~~~~~~~~~~~ +UMEM Completion Ring +~~~~~~~~~~~~~~~~~~~~ The Completion Ring is used transfer ownership of UMEM frames from kernel-space to user-space. Just like the Fill ring, UMEM indicies are diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index e6b4ebb2b243..2196b824e96c 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for: Instruction Addressing mode Description - ld 1, 2, 3, 4, 10 Load word into A + ld 1, 2, 3, 4, 12 Load word into A ldi 4 Load word into A ldh 1, 2 Load half-word into A ldb 1, 2 Load byte into A - ldx 3, 4, 5, 10 Load word into X + ldx 3, 4, 5, 12 Load word into X ldxi 4 Load word into X ldxb 5 Load byte into X @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for: jmp 6 Jump to label ja 6 Jump to label - jeq 7, 8 Jump on A == k - jneq 8 Jump on A != k - jne 8 Jump on A != k - jlt 8 Jump on A < k - jle 8 Jump on A <= k - jgt 7, 8 Jump on A > k - jge 7, 8 Jump on A >= k - jset 7, 8 Jump on A & k + jeq 7, 8, 9, 10 Jump on A == <x> + jneq 9, 10 Jump on A != <x> + jne 9, 10 Jump on A != <x> + jlt 9, 10 Jump on A < <x> + jle 9, 10 Jump on A <= <x> + jgt 7, 8, 9, 10 Jump on A > <x> + jge 7, 8, 9, 10 Jump on A >= <x> + jset 7, 8, 9, 10 Jump on A & <x> add 0, 4 A + <x> sub 0, 4 A - <x> @@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for: tax Copy A into X txa Copy X into A - ret 4, 9 Return + ret 4, 11 Return The next table shows addressing formats from the 2nd column: @@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column: 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet 6 L Jump label L 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf - 8 #k,Lt Jump to Lt if predicate is true - 9 a/%a Accumulator A - 10 extension BPF extension + 8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf + 9 #k,Lt Jump to Lt if predicate is true + 10 x/%x,Lt Jump to Lt if predicate is true + 11 a/%a Accumulator A + 12 extension BPF extension The Linux kernel also has a couple of BPF extensions that are used along with the class of load instructions by "overloading" the k argument with @@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows: PTR_TO_STACK Frame pointer. PTR_TO_PACKET skb->data. PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden. + PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted. + PTR_TO_SOCKET_OR_NULL + Either a pointer to a socket, or NULL; socket lookup + returns this type, which becomes a PTR_TO_SOCKET when + checked != NULL. PTR_TO_SOCKET is reference-counted, + so programs must release the reference through the + socket release function before the end of the program. + Arithmetic on these pointers is forbidden. However, a pointer may be offset from this base (as a result of pointer arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable offset'. The former is used when an exactly-known value (e.g. an immediate @@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting pointer will have a variable offset known to be 4n+2 for some n, so adding the 2 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through that pointer are safe. +The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common +to all copies of the pointer returned from a socket lookup. This has similar +behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but +it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly +represents a reference to the corresponding 'struct sock'. To ensure that the +reference is not leaked, it is imperative to NULL-check the reference and in +the non-NULL case, and pass the valid reference to the socket release function. Direct packet access -------------------- @@ -1444,6 +1461,55 @@ Error: 8: (7a) *(u64 *)(r0 +0) = 1 R0 invalid mem access 'imm' +Program that performs a socket lookup then sets the pointer to NULL without +checking it: +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (b7) r0 = 0 + 9: (95) exit + Unreleased reference id=1, alloc_insn=7 + +Program that performs a socket lookup but does not NULL-check the returned +value: + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_EXIT_INSN(), +Error: + 0: (b7) r2 = 0 + 1: (63) *(u32 *)(r10 -8) = r2 + 2: (bf) r2 = r10 + 3: (07) r2 += -8 + 4: (b7) r3 = 4 + 5: (b7) r4 = 0 + 6: (b7) r5 = 0 + 7: (85) call bpf_sk_lookup_tcp#65 + 8: (95) exit + Unreleased reference id=1, alloc_insn=7 + Testing ------- diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c index 2572a4b91c7c..fdcd2bc98916 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c @@ -89,15 +89,32 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size) return skb; } +static unsigned int +nfp_bpf_cmsg_map_req_size(struct nfp_app_bpf *bpf, unsigned int n) +{ + unsigned int size; + + size = sizeof(struct cmsg_req_map_op); + size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n; + + return size; +} + static struct sk_buff * nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n) { + return nfp_bpf_cmsg_alloc(bpf, nfp_bpf_cmsg_map_req_size(bpf, n)); +} + +static unsigned int +nfp_bpf_cmsg_map_reply_size(struct nfp_app_bpf *bpf, unsigned int n) +{ unsigned int size; - size = sizeof(struct cmsg_req_map_op); - size += sizeof(struct cmsg_key_value_pair) * n; + size = sizeof(struct cmsg_reply_map_op); + size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n; - return nfp_bpf_cmsg_alloc(bpf, size); + return size; } static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb) @@ -338,6 +355,34 @@ void nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map) dev_consume_skb_any(skb); } +static void * +nfp_bpf_ctrl_req_key(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req, + unsigned int n) +{ + return &req->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n]; +} + +static void * +nfp_bpf_ctrl_req_val(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req, + unsigned int n) +{ + return &req->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n]; +} + +static void * +nfp_bpf_ctrl_reply_key(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply, + unsigned int n) +{ + return &reply->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n]; +} + +static void * +nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply, + unsigned int n) +{ + return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n]; +} + static int nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, enum nfp_bpf_cmsg_type op, @@ -366,12 +411,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, /* Copy inputs */ if (key) - memcpy(&req->elem[0].key, key, map->key_size); + memcpy(nfp_bpf_ctrl_req_key(bpf, req, 0), key, map->key_size); if (value) - memcpy(&req->elem[0].value, value, map->value_size); + memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value, + map->value_size); skb = nfp_bpf_cmsg_communicate(bpf, skb, op, - sizeof(*reply) + sizeof(*reply->elem)); + nfp_bpf_cmsg_map_reply_size(bpf, 1)); if (IS_ERR(skb)) return PTR_ERR(skb); @@ -382,9 +428,11 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap, /* Copy outputs */ if (out_key) - memcpy(out_key, &reply->elem[0].key, map->key_size); + memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0), + map->key_size); if (out_value) - memcpy(out_value, &reply->elem[0].value, map->value_size); + memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0), + map->value_size); dev_consume_skb_any(skb); @@ -428,6 +476,13 @@ int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap, key, NULL, 0, next_key, NULL); } +unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf) +{ + return max3((unsigned int)NFP_NET_DEFAULT_MTU, + nfp_bpf_cmsg_map_req_size(bpf, 1), + nfp_bpf_cmsg_map_reply_size(bpf, 1)); +} + void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb) { struct nfp_app_bpf *bpf = app->priv; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h index e4f9b7ec8528..813644e90b27 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h @@ -52,6 +52,7 @@ enum bpf_cap_tlv_type { NFP_BPF_CAP_TYPE_RANDOM = 4, NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5, NFP_BPF_CAP_TYPE_ADJUST_TAIL = 6, + NFP_BPF_CAP_TYPE_ABI_VERSION = 7, }; struct nfp_bpf_cap_tlv_func { @@ -98,6 +99,7 @@ enum nfp_bpf_cmsg_type { #define CMSG_TYPE_MAP_REPLY_BIT 7 #define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req)) +/* BPF ABIv2 fixed-length control message fields */ #define CMSG_MAP_KEY_LW 16 #define CMSG_MAP_VALUE_LW 16 @@ -147,24 +149,19 @@ struct cmsg_reply_map_free_tbl { __be32 count; }; -struct cmsg_key_value_pair { - __be32 key[CMSG_MAP_KEY_LW]; - __be32 value[CMSG_MAP_VALUE_LW]; -}; - struct cmsg_req_map_op { struct cmsg_hdr hdr; __be32 tid; __be32 count; __be32 flags; - struct cmsg_key_value_pair elem[0]; + u8 data[0]; }; struct cmsg_reply_map_op { struct cmsg_reply_map_simple reply_hdr; __be32 count; __be32 resv; - struct cmsg_key_value_pair elem[0]; + u8 data[0]; }; struct cmsg_bpf_event { diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c index eff57f7d056a..6ed1b5207ecd 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c @@ -267,6 +267,38 @@ emit_br_bset(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr, u8 defer) } static void +__emit_br_alu(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi, + u8 defer, bool dst_lmextn, bool src_lmextn) +{ + u64 insn; + + insn = OP_BR_ALU_BASE | + FIELD_PREP(OP_BR_ALU_A_SRC, areg) | + FIELD_PREP(OP_BR_ALU_B_SRC, breg) | + FIELD_PREP(OP_BR_ALU_DEFBR, defer) | + FIELD_PREP(OP_BR_ALU_IMM_HI, imm_hi) | + FIELD_PREP(OP_BR_ALU_SRC_LMEXTN, src_lmextn) | + FIELD_PREP(OP_BR_ALU_DST_LMEXTN, dst_lmextn); + + nfp_prog_push(nfp_prog, insn); +} + +static void emit_rtn(struct nfp_prog *nfp_prog, swreg base, u8 defer) +{ + struct nfp_insn_ur_regs reg; + int err; + + err = swreg_to_unrestricted(reg_none(), base, reg_imm(0), ®); + if (err) { + nfp_prog->error = err; + return; + } + + __emit_br_alu(nfp_prog, reg.areg, reg.breg, 0, defer, reg.dst_lmextn, + reg.src_lmextn); +} + +static void __emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi, enum immed_width width, bool invert, enum immed_shift shift, bool wr_both, @@ -1137,7 +1169,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, unsigned int size, unsigned int ptr_off, u8 gpr, u8 ptr_gpr, bool clr_gpr, lmem_step step) { - s32 off = nfp_prog->stack_depth + meta->insn.off + ptr_off; + s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off; bool first = true, last; bool needs_inc = false; swreg stack_off_reg; @@ -1146,7 +1178,8 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, bool lm3 = true; int ret; - if (meta->ptr_not_const) { + if (meta->ptr_not_const || + meta->flags & FLAG_INSN_PTR_CALLER_STACK_FRAME) { /* Use of the last encountered ptr_off is OK, they all have * the same alignment. Depend on low bits of value being * discarded when written to LMaddr register. @@ -1695,7 +1728,7 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) s64 lm_off; /* We only have to reload LM0 if the key is not at start of stack */ - lm_off = nfp_prog->stack_depth; + lm_off = nfp_prog->stack_frame_depth; lm_off += meta->arg2.reg.var_off.value + meta->arg2.reg.off; load_lm_ptr = meta->arg2.var_off || lm_off; @@ -1808,10 +1841,10 @@ static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) swreg stack_depth_reg; stack_depth_reg = ur_load_imm_any(nfp_prog, - nfp_prog->stack_depth, + nfp_prog->stack_frame_depth, stack_imm(nfp_prog)); - emit_alu(nfp_prog, reg_both(dst), - stack_reg(nfp_prog), ALU_OP_ADD, stack_depth_reg); + emit_alu(nfp_prog, reg_both(dst), stack_reg(nfp_prog), + ALU_OP_ADD, stack_depth_reg); wrp_immed(nfp_prog, reg_both(dst + 1), 0); } else { wrp_reg_mov(nfp_prog, dst, src); @@ -3081,7 +3114,93 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) return wrp_test_reg(nfp_prog, meta, ALU_OP_XOR, BR_BNE); } -static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +static int +bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + u32 ret_tgt, stack_depth, offset_br; + swreg tmp_reg; + + stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN); + /* Space for saving the return address is accounted for by the callee, + * so stack_depth can be zero for the main function. + */ + if (stack_depth) { + tmp_reg = ur_load_imm_any(nfp_prog, stack_depth, + stack_imm(nfp_prog)); + emit_alu(nfp_prog, stack_reg(nfp_prog), + stack_reg(nfp_prog), ALU_OP_ADD, tmp_reg); + emit_csr_wr(nfp_prog, stack_reg(nfp_prog), + NFP_CSR_ACT_LM_ADDR0); + } + + /* Two cases for jumping to the callee: + * + * - If callee uses and needs to save R6~R9 then: + * 1. Put the start offset of the callee into imm_b(). This will + * require a fixup step, as we do not necessarily know this + * address yet. + * 2. Put the return address from the callee to the caller into + * register ret_reg(). + * 3. (After defer slots are consumed) Jump to the subroutine that + * pushes the registers to the stack. + * The subroutine acts as a trampoline, and returns to the address in + * imm_b(), i.e. jumps to the callee. + * + * - If callee does not need to save R6~R9 then just load return + * address to the caller in ret_reg(), and jump to the callee + * directly. + * + * Using ret_reg() to pass the return address to the callee is set here + * as a convention. The callee can then push this address onto its + * stack frame in its prologue. The advantages of passing the return + * address through ret_reg(), instead of pushing it to the stack right + * here, are the following: + * - It looks cleaner. + * - If the called function is called multiple time, we get a lower + * program size. + * - We save two no-op instructions that should be added just before + * the emit_br() when stack depth is not null otherwise. + * - If we ever find a register to hold the return address during whole + * execution of the callee, we will not have to push the return + * address to the stack for leaf functions. + */ + if (!meta->jmp_dst) { + pr_err("BUG: BPF-to-BPF call has no destination recorded\n"); + return -ELOOP; + } + if (nfp_prog->subprog[meta->jmp_dst->subprog_idx].needs_reg_push) { + ret_tgt = nfp_prog_current_offset(nfp_prog) + 3; + emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2, + RELO_BR_GO_CALL_PUSH_REGS); + offset_br = nfp_prog_current_offset(nfp_prog); + wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL); + } else { + ret_tgt = nfp_prog_current_offset(nfp_prog) + 2; + emit_br(nfp_prog, BR_UNC, meta->n + 1 + meta->insn.imm, 1); + offset_br = nfp_prog_current_offset(nfp_prog); + } + wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL); + + if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt)) + return -EINVAL; + + if (stack_depth) { + tmp_reg = ur_load_imm_any(nfp_prog, stack_depth, + stack_imm(nfp_prog)); + emit_alu(nfp_prog, stack_reg(nfp_prog), + stack_reg(nfp_prog), ALU_OP_SUB, tmp_reg); + emit_csr_wr(nfp_prog, stack_reg(nfp_prog), + NFP_CSR_ACT_LM_ADDR0); + wrp_nops(nfp_prog, 3); + } + + meta->num_insns_after_br = nfp_prog_current_offset(nfp_prog); + meta->num_insns_after_br -= offset_br; + + return 0; +} + +static int helper_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) { switch (meta->insn.imm) { case BPF_FUNC_xdp_adjust_head: @@ -3102,6 +3221,19 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) } } +static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + if (is_mbpf_pseudo_call(meta)) + return bpf_to_bpf_call(nfp_prog, meta); + else + return helper_call(nfp_prog, meta); +} + +static bool nfp_is_main_function(struct nfp_insn_meta *meta) +{ + return meta->subprog_idx == 0; +} + static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) { emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 0, RELO_BR_GO_OUT); @@ -3109,6 +3241,39 @@ static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) return 0; } +static int +nfp_subprog_epilogue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + if (nfp_prog->subprog[meta->subprog_idx].needs_reg_push) { + /* Pop R6~R9 to the stack via related subroutine. + * We loaded the return address to the caller into ret_reg(). + * This means that the subroutine does not come back here, we + * make it jump back to the subprogram caller directly! + */ + emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 1, + RELO_BR_GO_CALL_POP_REGS); + /* Pop return address from the stack. */ + wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0)); + } else { + /* Pop return address from the stack. */ + wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0)); + /* Jump back to caller if no callee-saved registers were used + * by the subprogram. + */ + emit_rtn(nfp_prog, ret_reg(nfp_prog), 0); + } + + return 0; +} + +static int jmp_exit(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + if (nfp_is_main_function(meta)) + return goto_out(nfp_prog, meta); + else + return nfp_subprog_epilogue(nfp_prog, meta); +} + static const instr_cb_t instr_cb[256] = { [BPF_ALU64 | BPF_MOV | BPF_X] = mov_reg64, [BPF_ALU64 | BPF_MOV | BPF_K] = mov_imm64, @@ -3197,36 +3362,66 @@ static const instr_cb_t instr_cb[256] = { [BPF_JMP | BPF_JSET | BPF_X] = jset_reg, [BPF_JMP | BPF_JNE | BPF_X] = jne_reg, [BPF_JMP | BPF_CALL] = call, - [BPF_JMP | BPF_EXIT] = goto_out, + [BPF_JMP | BPF_EXIT] = jmp_exit, }; /* --- Assembler logic --- */ +static int +nfp_fixup_immed_relo(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, + struct nfp_insn_meta *jmp_dst, u32 br_idx) +{ + if (immed_get_value(nfp_prog->prog[br_idx + 1])) { + pr_err("BUG: failed to fix up callee register saving\n"); + return -EINVAL; + } + + immed_set_value(&nfp_prog->prog[br_idx + 1], jmp_dst->off); + + return 0; +} + static int nfp_fixup_branches(struct nfp_prog *nfp_prog) { struct nfp_insn_meta *meta, *jmp_dst; u32 idx, br_idx; + int err; list_for_each_entry(meta, &nfp_prog->insns, l) { if (meta->skip) continue; - if (meta->insn.code == (BPF_JMP | BPF_CALL)) - continue; if (BPF_CLASS(meta->insn.code) != BPF_JMP) continue; + if (meta->insn.code == (BPF_JMP | BPF_EXIT) && + !nfp_is_main_function(meta)) + continue; + if (is_mbpf_helper_call(meta)) + continue; if (list_is_last(&meta->l, &nfp_prog->insns)) br_idx = nfp_prog->last_bpf_off; else br_idx = list_next_entry(meta, l)->off - 1; + /* For BPF-to-BPF function call, a stack adjustment sequence is + * generated after the return instruction. Therefore, we must + * withdraw the length of this sequence to have br_idx pointing + * to where the "branch" NFP instruction is expected to be. + */ + if (is_mbpf_pseudo_call(meta)) + br_idx -= meta->num_insns_after_br; + if (!nfp_is_br(nfp_prog->prog[br_idx])) { pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n", br_idx, meta->insn.code, nfp_prog->prog[br_idx]); return -ELOOP; } + + if (meta->insn.code == (BPF_JMP | BPF_EXIT)) + continue; + /* Leave special branches for later */ if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) != - RELO_BR_REL) + RELO_BR_REL && !is_mbpf_pseudo_call(meta)) continue; if (!meta->jmp_dst) { @@ -3241,6 +3436,18 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog) return -ELOOP; } + if (is_mbpf_pseudo_call(meta) && + nfp_prog->subprog[jmp_dst->subprog_idx].needs_reg_push) { + err = nfp_fixup_immed_relo(nfp_prog, meta, + jmp_dst, br_idx); + if (err) + return err; + } + + if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) != + RELO_BR_REL) + continue; + for (idx = meta->off; idx <= br_idx; idx++) { if (!nfp_is_br(nfp_prog->prog[idx])) continue; @@ -3258,6 +3465,27 @@ static void nfp_intro(struct nfp_prog *nfp_prog) plen_reg(nfp_prog), ALU_OP_AND, pv_len(nfp_prog)); } +static void +nfp_subprog_prologue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + /* Save return address into the stack. */ + wrp_mov(nfp_prog, reg_lm(0, 0), ret_reg(nfp_prog)); +} + +static void +nfp_start_subprog(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) +{ + unsigned int depth = nfp_prog->subprog[meta->subprog_idx].stack_depth; + + nfp_prog->stack_frame_depth = round_up(depth, 4); + nfp_subprog_prologue(nfp_prog, meta); +} + +bool nfp_is_subprog_start(struct nfp_insn_meta *meta) +{ + return meta->flags & FLAG_INSN_IS_SUBPROG_START; +} + static void nfp_outro_tc_da(struct nfp_prog *nfp_prog) { /* TC direct-action mode: @@ -3348,6 +3576,67 @@ static void nfp_outro_xdp(struct nfp_prog *nfp_prog) emit_ld_field(nfp_prog, reg_a(0), 0xc, reg_b(2), SHF_SC_L_SHF, 16); } +static bool nfp_prog_needs_callee_reg_save(struct nfp_prog *nfp_prog) +{ + unsigned int idx; + + for (idx = 1; idx < nfp_prog->subprog_cnt; idx++) + if (nfp_prog->subprog[idx].needs_reg_push) + return true; + + return false; +} + +static void nfp_push_callee_registers(struct nfp_prog *nfp_prog) +{ + u8 reg; + + /* Subroutine: Save all callee saved registers (R6 ~ R9). + * imm_b() holds the return address. + */ + nfp_prog->tgt_call_push_regs = nfp_prog_current_offset(nfp_prog); + for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) { + u8 adj = (reg - BPF_REG_0) * 2; + u8 idx = (reg - BPF_REG_6) * 2; + + /* The first slot in the stack frame is used to push the return + * address in bpf_to_bpf_call(), start just after. + */ + wrp_mov(nfp_prog, reg_lm(0, 1 + idx), reg_b(adj)); + + if (reg == BPF_REG_8) + /* Prepare to jump back, last 3 insns use defer slots */ + emit_rtn(nfp_prog, imm_b(nfp_prog), 3); + + wrp_mov(nfp_prog, reg_lm(0, 1 + idx + 1), reg_b(adj + 1)); + } +} + +static void nfp_pop_callee_registers(struct nfp_prog *nfp_prog) +{ + u8 reg; + + /* Subroutine: Restore all callee saved registers (R6 ~ R9). + * ret_reg() holds the return address. + */ + nfp_prog->tgt_call_pop_regs = nfp_prog_current_offset(nfp_prog); + for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) { + u8 adj = (reg - BPF_REG_0) * 2; + u8 idx = (reg - BPF_REG_6) * 2; + + /* The first slot in the stack frame holds the return address, + * start popping just after that. + */ + wrp_mov(nfp_prog, reg_both(adj), reg_lm(0, 1 + idx)); + + if (reg == BPF_REG_8) + /* Prepare to jump back, last 3 insns use defer slots */ + emit_rtn(nfp_prog, ret_reg(nfp_prog), 3); + + wrp_mov(nfp_prog, reg_both(adj + 1), reg_lm(0, 1 + idx + 1)); + } +} + static void nfp_outro(struct nfp_prog *nfp_prog) { switch (nfp_prog->type) { @@ -3360,13 +3649,23 @@ static void nfp_outro(struct nfp_prog *nfp_prog) default: WARN_ON(1); } + + if (!nfp_prog_needs_callee_reg_save(nfp_prog)) + return; + + nfp_push_callee_registers(nfp_prog); + nfp_pop_callee_registers(nfp_prog); } static int nfp_translate(struct nfp_prog *nfp_prog) { struct nfp_insn_meta *meta; + unsigned int depth; int err; + depth = nfp_prog->subprog[0].stack_depth; + nfp_prog->stack_frame_depth = round_up(depth, 4); + nfp_intro(nfp_prog); if (nfp_prog->error) return nfp_prog->error; @@ -3376,6 +3675,12 @@ static int nfp_translate(struct nfp_prog *nfp_prog) meta->off = nfp_prog_current_offset(nfp_prog); + if (nfp_is_subprog_start(meta)) { + nfp_start_subprog(nfp_prog, meta); + if (nfp_prog->error) + return nfp_prog->error; + } + if (meta->skip) { nfp_prog->n_translated++; continue; @@ -4018,20 +4323,35 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt) /* Another pass to record jump information. */ list_for_each_entry(meta, &nfp_prog->insns, l) { + struct nfp_insn_meta *dst_meta; u64 code = meta->insn.code; + unsigned int dst_idx; + bool pseudo_call; + + if (BPF_CLASS(code) != BPF_JMP) + continue; + if (BPF_OP(code) == BPF_EXIT) + continue; + if (is_mbpf_helper_call(meta)) + continue; - if (BPF_CLASS(code) == BPF_JMP && BPF_OP(code) != BPF_EXIT && - BPF_OP(code) != BPF_CALL) { - struct nfp_insn_meta *dst_meta; - unsigned short dst_indx; + /* If opcode is BPF_CALL at this point, this can only be a + * BPF-to-BPF call (a.k.a pseudo call). + */ + pseudo_call = BPF_OP(code) == BPF_CALL; - dst_indx = meta->n + 1 + meta->insn.off; - dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_indx, - cnt); + if (pseudo_call) + dst_idx = meta->n + 1 + meta->insn.imm; + else + dst_idx = meta->n + 1 + meta->insn.off; - meta->jmp_dst = dst_meta; - dst_meta->flags |= FLAG_INSN_IS_JUMP_DST; - } + dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_idx, cnt); + + if (pseudo_call) + dst_meta->flags |= FLAG_INSN_IS_SUBPROG_START; + + dst_meta->flags |= FLAG_INSN_IS_JUMP_DST; + meta->jmp_dst = dst_meta; } } @@ -4054,6 +4374,7 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv) for (i = 0; i < nfp_prog->prog_len; i++) { enum nfp_relo_type special; u32 val; + u16 off; special = FIELD_GET(OP_RELO_TYPE, prog[i]); switch (special) { @@ -4070,6 +4391,24 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv) br_set_offset(&prog[i], nfp_prog->tgt_abort + bv->start_off); break; + case RELO_BR_GO_CALL_PUSH_REGS: + if (!nfp_prog->tgt_call_push_regs) { + pr_err("BUG: failed to detect subprogram registers needs\n"); + err = -EINVAL; + goto err_free_prog; + } + off = nfp_prog->tgt_call_push_regs + bv->start_off; + br_set_offset(&prog[i], off); + break; + case RELO_BR_GO_CALL_POP_REGS: + if (!nfp_prog->tgt_call_pop_regs) { + pr_err("BUG: failed to detect subprogram registers needs\n"); + err = -EINVAL; + goto err_free_prog; + } + off = nfp_prog->tgt_call_pop_regs + bv->start_off; + br_set_offset(&prog[i], off); + break; case RELO_BR_NEXT_PKT: br_set_offset(&prog[i], bv->tgt_done); break; diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c index 970af07f4656..d9d37aa860e0 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c @@ -54,11 +54,14 @@ const struct rhashtable_params nfp_bpf_maps_neutral_params = { static bool nfp_net_ebpf_capable(struct nfp_net *nn) { #ifdef __LITTLE_ENDIAN - if (nn->cap & NFP_NET_CFG_CTRL_BPF && - nn_readb(nn, NFP_NET_CFG_BPF_ABI) == NFP_NET_BPF_ABI) - return true; -#endif + struct nfp_app_bpf *bpf = nn->app->priv; + + return nn->cap & NFP_NET_CFG_CTRL_BPF && + bpf->abi_version && + nn_readb(nn, NFP_NET_CFG_BPF_ABI) == bpf->abi_version; +#else return false; +#endif } static int @@ -342,6 +345,26 @@ nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value, return 0; } +static int +nfp_bpf_parse_cap_abi_version(struct nfp_app_bpf *bpf, void __iomem *value, + u32 length) +{ + if (length < 4) { + nfp_err(bpf->app->cpp, "truncated ABI version TLV: %d\n", + length); + return -EINVAL; + } + + bpf->abi_version = readl(value); + if (bpf->abi_version < 2 || bpf->abi_version > 3) { + nfp_warn(bpf->app->cpp, "unsupported BPF ABI version: %d\n", + bpf->abi_version); + bpf->abi_version = 0; + } + + return 0; +} + static int nfp_bpf_parse_capabilities(struct nfp_app *app) { struct nfp_cpp *cpp = app->pf->cpp; @@ -393,6 +416,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app) length)) goto err_release_free; break; + case NFP_BPF_CAP_TYPE_ABI_VERSION: + if (nfp_bpf_parse_cap_abi_version(app->priv, value, + length)) + goto err_release_free; + break; default: nfp_dbg(cpp, "unknown BPF capability: %d\n", type); break; @@ -414,6 +442,11 @@ err_release_free: return -EINVAL; } +static void nfp_bpf_init_capabilities(struct nfp_app_bpf *bpf) +{ + bpf->abi_version = 2; /* Original BPF ABI version */ +} + static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev) { struct nfp_app_bpf *bpf = app->priv; @@ -447,10 +480,21 @@ static int nfp_bpf_init(struct nfp_app *app) if (err) goto err_free_bpf; + nfp_bpf_init_capabilities(bpf); + err = nfp_bpf_parse_capabilities(app); if (err) goto err_free_neutral_maps; + if (bpf->abi_version < 3) { + bpf->cmsg_key_sz = CMSG_MAP_KEY_LW * 4; + bpf->cmsg_val_sz = CMSG_MAP_VALUE_LW * 4; + } else { + bpf->cmsg_key_sz = bpf->maps.max_key_sz; + bpf->cmsg_val_sz = bpf->maps.max_val_sz; + app->ctrl_mtu = nfp_bpf_ctrl_cmsg_mtu(bpf); + } + bpf->bpf_dev = bpf_offload_dev_create(); err = PTR_ERR_OR_ZERO(bpf->bpf_dev); if (err) diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h index dbd00982fd2b..25e10cfa2678 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/main.h +++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h @@ -61,6 +61,8 @@ enum nfp_relo_type { /* internal jumps to parts of the outro */ RELO_BR_GO_OUT, RELO_BR_GO_ABORT, + RELO_BR_GO_CALL_PUSH_REGS, + RELO_BR_GO_CALL_POP_REGS, /* external jumps to fixed addresses */ RELO_BR_NEXT_PKT, RELO_BR_HELPER, @@ -104,6 +106,7 @@ enum pkt_vec { #define imma_a(np) reg_a(STATIC_REG_IMMA) #define imma_b(np) reg_b(STATIC_REG_IMMA) #define imm_both(np) reg_both(STATIC_REG_IMM) +#define ret_reg(np) imm_a(np) #define NFP_BPF_ABI_FLAGS reg_imm(0) #define NFP_BPF_ABI_FLAG_MARK 1 @@ -121,12 +124,17 @@ enum pkt_vec { * @cmsg_replies: received cmsg replies waiting to be consumed * @cmsg_wq: work queue for waiting for cmsg replies * + * @cmsg_key_sz: size of key in cmsg element array + * @cmsg_val_sz: size of value in cmsg element array + * * @map_list: list of offloaded maps * @maps_in_use: number of currently offloaded maps * @map_elems_in_use: number of elements allocated to offloaded maps * * @maps_neutral: hash table of offload-neutral maps (on pointer) * + * @abi_version: global BPF ABI version + * * @adjust_head: adjust head capability * @adjust_head.flags: extra flags for adjust head * @adjust_head.off_min: minimal packet offset within buffer required @@ -164,12 +172,17 @@ struct nfp_app_bpf { struct sk_buff_head cmsg_replies; struct wait_queue_head cmsg_wq; + unsigned int cmsg_key_sz; + unsigned int cmsg_val_sz; + struct list_head map_list; unsigned int maps_in_use; unsigned int map_elems_in_use; struct rhashtable maps_neutral; + u32 abi_version; + struct nfp_bpf_cap_adjust_head { u32 flags; int off_min; @@ -252,7 +265,9 @@ struct nfp_bpf_reg_state { bool var_off; }; -#define FLAG_INSN_IS_JUMP_DST BIT(0) +#define FLAG_INSN_IS_JUMP_DST BIT(0) +#define FLAG_INSN_IS_SUBPROG_START BIT(1) +#define FLAG_INSN_PTR_CALLER_STACK_FRAME BIT(2) /** * struct nfp_insn_meta - BPF instruction wrapper @@ -269,6 +284,7 @@ struct nfp_bpf_reg_state { * @xadd_maybe_16bit: 16bit immediate is possible * @jmp_dst: destination info for jump instructions * @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB + * @num_insns_after_br: number of insns following a branch jump, used for fixup * @func_id: function id for call instructions * @arg1: arg1 for call instructions * @arg2: arg2 for call instructions @@ -279,6 +295,7 @@ struct nfp_bpf_reg_state { * @off: index of first generated machine instruction (in nfp_prog.prog) * @n: eBPF instruction number * @flags: eBPF instruction extra optimization flags + * @subprog_idx: index of subprogram to which the instruction belongs * @skip: skip this instruction (optimized out) * @double_cb: callback for second part of the instruction * @l: link on nfp_prog->insns list @@ -304,6 +321,7 @@ struct nfp_insn_meta { struct { struct nfp_insn_meta *jmp_dst; bool jump_neg_op; + u32 num_insns_after_br; /* only for BPF-to-BPF calls */ }; /* function calls */ struct { @@ -325,6 +343,7 @@ struct nfp_insn_meta { unsigned int off; unsigned short n; unsigned short flags; + unsigned short subprog_idx; bool skip; instr_cb_t double_cb; @@ -413,6 +432,34 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta) return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV; } +static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta) +{ + struct bpf_insn insn = meta->insn; + + return insn.code == (BPF_JMP | BPF_CALL) && + insn.src_reg != BPF_PSEUDO_CALL; +} + +static inline bool is_mbpf_pseudo_call(const struct nfp_insn_meta *meta) +{ + struct bpf_insn insn = meta->insn; + + return insn.code == (BPF_JMP | BPF_CALL) && + insn.src_reg == BPF_PSEUDO_CALL; +} + +#define STACK_FRAME_ALIGN 64 + +/** + * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info + * @stack_depth: maximum stack depth used by this sub-program + * @needs_reg_push: whether sub-program uses callee-saved registers + */ +struct nfp_bpf_subprog_info { + u16 stack_depth; + u8 needs_reg_push : 1; +}; + /** * struct nfp_prog - nfp BPF program * @bpf: backpointer to the bpf app priv structure @@ -424,12 +471,16 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta) * @last_bpf_off: address of the last instruction translated from BPF * @tgt_out: jump target for normal exit * @tgt_abort: jump target for abort (e.g. access outside of packet buffer) + * @tgt_call_push_regs: jump target for subroutine for saving R6~R9 to stack + * @tgt_call_pop_regs: jump target for subroutine used for restoring R6~R9 * @n_translated: number of successfully translated instructions (for errors) * @error: error code if something went wrong - * @stack_depth: max stack depth from the verifier + * @stack_frame_depth: max stack depth for current frame * @adjust_head_location: if program has single adjust head call - the insn no. * @map_records_cnt: the number of map pointers recorded for this prog + * @subprog_cnt: number of sub-programs, including main function * @map_records: the map record pointers from bpf->maps_neutral + * @subprog: pointer to an array of objects holding info about sub-programs * @insns: list of BPF instruction wrappers (struct nfp_insn_meta) */ struct nfp_prog { @@ -446,15 +497,19 @@ struct nfp_prog { unsigned int last_bpf_off; unsigned int tgt_out; unsigned int tgt_abort; + unsigned int tgt_call_push_regs; + unsigned int tgt_call_pop_regs; unsigned int n_translated; int error; - unsigned int stack_depth; + unsigned int stack_frame_depth; unsigned int adjust_head_location; unsigned int map_records_cnt; + unsigned int subprog_cnt; struct nfp_bpf_neutral_map **map_records; + struct nfp_bpf_subprog_info *subprog; struct list_head insns; }; @@ -471,6 +526,7 @@ struct nfp_bpf_vnic { unsigned int tgt_done; }; +bool nfp_is_subprog_start(struct nfp_insn_meta *meta); void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt); int nfp_bpf_jit(struct nfp_prog *prog); bool nfp_bpf_supported_opcode(u8 code); @@ -492,6 +548,7 @@ nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv); +unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf); long long int nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map); void diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index 1ccd6371a15b..49c7bead8113 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -208,6 +208,8 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog) { struct nfp_insn_meta *meta, *tmp; + kfree(nfp_prog->subprog); + list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) { list_del(&meta->l); kfree(meta); @@ -250,18 +252,9 @@ err_free: static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog) { struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv; - unsigned int stack_size; unsigned int max_instr; int err; - stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64; - if (prog->aux->stack_depth > stack_size) { - nn_info(nn, "stack too large: program %dB > FW stack %dB\n", - prog->aux->stack_depth, stack_size); - return -EOPNOTSUPP; - } - nfp_prog->stack_depth = round_up(prog->aux->stack_depth, 4); - max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN); nfp_prog->__prog_alloc_len = max_instr * sizeof(u64); diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c index a6e9248669e1..cddb70786a58 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c @@ -34,10 +34,12 @@ #include <linux/bpf.h> #include <linux/bpf_verifier.h> #include <linux/kernel.h> +#include <linux/netdevice.h> #include <linux/pkt_cls.h> #include "../nfp_app.h" #include "../nfp_main.h" +#include "../nfp_net.h" #include "fw.h" #include "main.h" @@ -155,8 +157,9 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env, } static int -nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env, - struct nfp_insn_meta *meta) +nfp_bpf_check_helper_call(struct nfp_prog *nfp_prog, + struct bpf_verifier_env *env, + struct nfp_insn_meta *meta) { const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1; const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2; @@ -333,6 +336,9 @@ nfp_bpf_check_stack_access(struct nfp_prog *nfp_prog, { s32 old_off, new_off; + if (reg->frameno != env->cur_state->curframe) + meta->flags |= FLAG_INSN_PTR_CALLER_STACK_FRAME; + if (!tnum_is_const(reg->var_off)) { pr_vlog(env, "variable ptr stack access\n"); return -EINVAL; @@ -620,8 +626,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx) return -EINVAL; } - if (meta->insn.code == (BPF_JMP | BPF_CALL)) - return nfp_bpf_check_call(nfp_prog, env, meta); + if (is_mbpf_helper_call(meta)) + return nfp_bpf_check_helper_call(nfp_prog, env, meta); if (meta->insn.code == (BPF_JMP | BPF_EXIT)) return nfp_bpf_check_exit(nfp_prog, env); @@ -640,6 +646,131 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx) return 0; } +static int +nfp_assign_subprog_idx_and_regs(struct bpf_verifier_env *env, + struct nfp_prog *nfp_prog) +{ + struct nfp_insn_meta *meta; + int index = 0; + + list_for_each_entry(meta, &nfp_prog->insns, l) { + if (nfp_is_subprog_start(meta)) + index++; + meta->subprog_idx = index; + + if (meta->insn.dst_reg >= BPF_REG_6 && + meta->insn.dst_reg <= BPF_REG_9) + nfp_prog->subprog[index].needs_reg_push = 1; + } + + if (index + 1 != nfp_prog->subprog_cnt) { + pr_vlog(env, "BUG: number of processed BPF functions is not consistent (processed %d, expected %d)\n", + index + 1, nfp_prog->subprog_cnt); + return -EFAULT; + } + + return 0; +} + +static unsigned int +nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned int cnt) +{ + struct nfp_insn_meta *meta = nfp_prog_first_meta(nfp_prog); + unsigned int max_depth = 0, depth = 0, frame = 0; + struct nfp_insn_meta *ret_insn[MAX_CALL_FRAMES]; + unsigned short frame_depths[MAX_CALL_FRAMES]; + unsigned short ret_prog[MAX_CALL_FRAMES]; + unsigned short idx = meta->subprog_idx; + + /* Inspired from check_max_stack_depth() from kernel verifier. + * Starting from main subprogram, walk all instructions and recursively + * walk all callees that given subprogram can call. Since recursion is + * prevented by the kernel verifier, this algorithm only needs a local + * stack of MAX_CALL_FRAMES to remember callsites. + */ +process_subprog: + frame_depths[frame] = nfp_prog->subprog[idx].stack_depth; + frame_depths[frame] = round_up(frame_depths[frame], STACK_FRAME_ALIGN); + depth += frame_depths[frame]; + max_depth = max(max_depth, depth); + +continue_subprog: + for (; meta != nfp_prog_last_meta(nfp_prog) && meta->subprog_idx == idx; + meta = nfp_meta_next(meta)) { + if (!is_mbpf_pseudo_call(meta)) + continue; + + /* We found a call to a subprogram. Remember instruction to + * return to and subprog id. + */ + ret_insn[frame] = nfp_meta_next(meta); + ret_prog[frame] = idx; + + /* Find the callee and start processing it. */ + meta = nfp_bpf_goto_meta(nfp_prog, meta, + meta->n + 1 + meta->insn.imm, cnt); + idx = meta->subprog_idx; + frame++; + goto process_subprog; + } + /* End of for() loop means the last instruction of the subprog was + * reached. If we popped all stack frames, return; otherwise, go on + * processing remaining instructions from the caller. + */ + if (frame == 0) + return max_depth; + + depth -= frame_depths[frame]; + frame--; + meta = ret_insn[frame]; + idx = ret_prog[frame]; + goto continue_subprog; +} + +static int nfp_bpf_finalize(struct bpf_verifier_env *env) +{ + unsigned int stack_size, stack_needed; + struct bpf_subprog_info *info; + struct nfp_prog *nfp_prog; + struct nfp_net *nn; + int i; + + nfp_prog = env->prog->aux->offload->dev_priv; + nfp_prog->subprog_cnt = env->subprog_cnt; + nfp_prog->subprog = kcalloc(nfp_prog->subprog_cnt, + sizeof(nfp_prog->subprog[0]), GFP_KERNEL); + if (!nfp_prog->subprog) + return -ENOMEM; + + nfp_assign_subprog_idx_and_regs(env, nfp_prog); + + info = env->subprog_info; + for (i = 0; i < nfp_prog->subprog_cnt; i++) { + nfp_prog->subprog[i].stack_depth = info[i].stack_depth; + + if (i == 0) + continue; + + /* Account for size of return address. */ + nfp_prog->subprog[i].stack_depth += REG_WIDTH; + /* Account for size of saved registers, if necessary. */ + if (nfp_prog->subprog[i].needs_reg_push) + nfp_prog->subprog[i].stack_depth += BPF_REG_SIZE * 4; + } + + nn = netdev_priv(env->prog->aux->offload->netdev); + stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64; + stack_needed = nfp_bpf_get_stack_usage(nfp_prog, env->prog->len); + if (stack_needed > stack_size) { + pr_vlog(env, "stack too large: program %dB > FW stack %dB\n", + stack_needed, stack_size); + return -EOPNOTSUPP; + } + + return 0; +} + const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = { - .insn_hook = nfp_verify_insn, + .insn_hook = nfp_verify_insn, + .finalize = nfp_bpf_finalize, }; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h index 4e1eb3395648..c896eb8f87a1 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h @@ -40,6 +40,8 @@ #include "nfp_net_repr.h" +#define NFP_APP_CTRL_MTU_MAX U32_MAX + struct bpf_prog; struct net_device; struct netdev_bpf; @@ -178,6 +180,7 @@ struct nfp_app_type { * @ctrl: pointer to ctrl vNIC struct * @reprs: array of pointers to representors * @type: pointer to const application ops and info + * @ctrl_mtu: MTU to set on the control vNIC (set in .init()) * @priv: app-specific priv data */ struct nfp_app { @@ -189,6 +192,7 @@ struct nfp_app { struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1]; const struct nfp_app_type *type; + unsigned int ctrl_mtu; void *priv; }; diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h index fad0e62a910c..5b257c603e91 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h @@ -82,6 +82,15 @@ #define OP_BR_BIT_ADDR_LO OP_BR_ADDR_LO #define OP_BR_BIT_ADDR_HI OP_BR_ADDR_HI +#define OP_BR_ALU_BASE 0x0e800000000ULL +#define OP_BR_ALU_BASE_MASK 0x0ff80000000ULL +#define OP_BR_ALU_A_SRC 0x000000003ffULL +#define OP_BR_ALU_B_SRC 0x000000ffc00ULL +#define OP_BR_ALU_DEFBR 0x00000300000ULL +#define OP_BR_ALU_IMM_HI 0x0007fc00000ULL +#define OP_BR_ALU_SRC_LMEXTN 0x40000000000ULL +#define OP_BR_ALU_DST_LMEXTN 0x80000000000ULL + static inline bool nfp_is_br(u64 insn) { return (insn & OP_BR_BASE_MASK) == OP_BR_BASE || diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 24c8f5bb1eb4..7b91e77b2016 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -3884,10 +3884,20 @@ int nfp_net_init(struct nfp_net *nn) return err; /* Set default MTU and Freelist buffer size */ - if (nn->max_mtu < NFP_NET_DEFAULT_MTU) + if (!nfp_net_is_data_vnic(nn) && nn->app->ctrl_mtu) { + if (nn->app->ctrl_mtu <= nn->max_mtu) { + nn->dp.mtu = nn->app->ctrl_mtu; + } else { + if (nn->app->ctrl_mtu != NFP_APP_CTRL_MTU_MAX) + nn_warn(nn, "app requested MTU above max supported %u > %u\n", + nn->app->ctrl_mtu, nn->max_mtu); + nn->dp.mtu = nn->max_mtu; + } + } else if (nn->max_mtu < NFP_NET_DEFAULT_MTU) { nn->dp.mtu = nn->max_mtu; - else + } else { nn->dp.mtu = NFP_NET_DEFAULT_MTU; + } nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp); if (nfp_app_ctrl_uses_data_vnics(nn->app)) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h index a51490747689..863ca04fffbf 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h @@ -264,7 +264,6 @@ * %NFP_NET_CFG_BPF_ADDR: DMA address of the buffer with JITed BPF code */ #define NFP_NET_CFG_BPF_ABI 0x0080 -#define NFP_NET_BPF_ABI 2 #define NFP_NET_CFG_BPF_CAP 0x0081 #define NFP_NET_BPF_CAP_RELO (1 << 0) /* seamless reload */ #define NFP_NET_CFG_BPF_MAX_LEN 0x0082 diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c index 81444208b216..cb3518474f0e 100644 --- a/drivers/net/netdevsim/bpf.c +++ b/drivers/net/netdevsim/bpf.c @@ -86,8 +86,14 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn) return 0; } +static int nsim_bpf_finalize(struct bpf_verifier_env *env) +{ + return 0; +} + static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = { - .insn_hook = nsim_bpf_verify_insn, + .insn_hook = nsim_bpf_verify_insn, + .finalize = nsim_bpf_finalize, }; static bool nsim_xdp_offload_active(struct netdevsim *ns) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index f91b0f8ff3a9..588dd5f0bd85 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -2,6 +2,7 @@ #ifndef _BPF_CGROUP_H #define _BPF_CGROUP_H +#include <linux/bpf.h> #include <linux/errno.h> #include <linux/jump_label.h> #include <linux/percpu.h> @@ -22,7 +23,11 @@ struct bpf_cgroup_storage; extern struct static_key_false cgroup_bpf_enabled_key; #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key) -DECLARE_PER_CPU(void*, bpf_cgroup_storage); +DECLARE_PER_CPU(struct bpf_cgroup_storage*, + bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); + +#define for_each_cgroup_storage_type(stype) \ + for (stype = 0; stype < MAX_BPF_CGROUP_STORAGE_TYPE; stype++) struct bpf_cgroup_storage_map; @@ -32,7 +37,10 @@ struct bpf_storage_buffer { }; struct bpf_cgroup_storage { - struct bpf_storage_buffer *buf; + union { + struct bpf_storage_buffer *buf; + void __percpu *percpu_buf; + }; struct bpf_cgroup_storage_map *map; struct bpf_cgroup_storage_key key; struct list_head list; @@ -43,7 +51,7 @@ struct bpf_cgroup_storage { struct bpf_prog_list { struct list_head node; struct bpf_prog *prog; - struct bpf_cgroup_storage *storage; + struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]; }; struct bpf_prog_array; @@ -101,18 +109,26 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk, int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor, short access, enum bpf_attach_type type); -static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) +static inline enum bpf_cgroup_storage_type cgroup_storage_type( + struct bpf_map *map) { - struct bpf_storage_buffer *buf; + if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) + return BPF_CGROUP_STORAGE_PERCPU; + + return BPF_CGROUP_STORAGE_SHARED; +} - if (!storage) - return; +static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage + *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) +{ + enum bpf_cgroup_storage_type stype; - buf = READ_ONCE(storage->buf); - this_cpu_write(bpf_cgroup_storage, &buf->data[0]); + for_each_cgroup_storage_type(stype) + this_cpu_write(bpf_cgroup_storage[stype], storage[stype]); } -struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog); +struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog, + enum bpf_cgroup_storage_type stype); void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage); void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, struct cgroup *cgroup, @@ -121,6 +137,10 @@ void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage); int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map); void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map); +int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value); +int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, + void *value, u64 flags); + /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */ #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \ ({ \ @@ -265,15 +285,24 @@ static inline int cgroup_bpf_prog_query(const union bpf_attr *attr, return -EINVAL; } -static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) {} +static inline void bpf_cgroup_storage_set( + struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {} static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map) { return 0; } static inline void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map) {} static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( - struct bpf_prog *prog) { return 0; } + struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return 0; } static inline void bpf_cgroup_storage_free( struct bpf_cgroup_storage *storage) {} +static inline int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, + void *value) { + return 0; +} +static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, + void *key, void *value, u64 flags) { + return 0; +} #define cgroup_bpf_enabled (0) #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0) @@ -293,6 +322,8 @@ static inline void bpf_cgroup_storage_free( #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; }) +#define for_each_cgroup_storage_type(stype) for (; false; ) + #endif /* CONFIG_CGROUP_BPF */ #endif /* _BPF_CGROUP_H */ diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 988a00797bcd..9b558713447f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -154,6 +154,7 @@ enum bpf_arg_type { ARG_PTR_TO_CTX, /* pointer to context */ ARG_ANYTHING, /* any (initialized) argument is ok */ + ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */ }; /* type of values returned from helper functions */ @@ -162,6 +163,7 @@ enum bpf_return_type { RET_VOID, /* function doesn't return anything */ RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */ RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */ + RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */ }; /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs @@ -213,6 +215,8 @@ enum bpf_reg_type { PTR_TO_PACKET, /* reg points to skb->data */ PTR_TO_PACKET_END, /* skb->data + headlen */ PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */ + PTR_TO_SOCKET, /* reg points to struct bpf_sock */ + PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */ }; /* The information passed from prog-specific *_is_valid_access @@ -259,6 +263,7 @@ struct bpf_verifier_ops { struct bpf_prog_offload_ops { int (*insn_hook)(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx); + int (*finalize)(struct bpf_verifier_env *env); }; struct bpf_prog_offload { @@ -272,6 +277,14 @@ struct bpf_prog_offload { u32 jited_len; }; +enum bpf_cgroup_storage_type { + BPF_CGROUP_STORAGE_SHARED, + BPF_CGROUP_STORAGE_PERCPU, + __BPF_CGROUP_STORAGE_MAX +}; + +#define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX + struct bpf_prog_aux { atomic_t refcnt; u32 used_map_cnt; @@ -289,7 +302,7 @@ struct bpf_prog_aux { struct bpf_prog *prog; struct user_struct *user; u64 load_time; /* ns since boottime */ - struct bpf_map *cgroup_storage; + struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]; char name[BPF_OBJ_NAME_LEN]; #ifdef CONFIG_SECURITY void *security; @@ -335,6 +348,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void); typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src, unsigned long off, unsigned long len); +typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type, + const struct bpf_insn *src, + struct bpf_insn *dst, + struct bpf_prog *prog, + u32 *target_size); u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy); @@ -358,7 +376,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, */ struct bpf_prog_array_item { struct bpf_prog *prog; - struct bpf_cgroup_storage *cgroup_storage; + struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]; }; struct bpf_prog_array { @@ -828,4 +846,29 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; void bpf_user_rnd_init_once(void); u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); +#if defined(CONFIG_NET) +bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, + struct bpf_insn_access_aux *info); +u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, + u32 *target_size); +#else +static inline bool bpf_sock_is_valid_access(int off, int size, + enum bpf_access_type type, + struct bpf_insn_access_aux *info) +{ + return false; +} +static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, + u32 *target_size) +{ + return 0; +} +#endif + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index c9bd6fb765b0..5432f4c9f50e 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -43,6 +43,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops) #endif #ifdef CONFIG_CGROUP_BPF BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index b42b60a83e19..9e8056ec20fa 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -58,6 +58,8 @@ struct bpf_reg_state { * offset, so they can share range knowledge. * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we * came from, when one is tested for != NULL. + * For PTR_TO_SOCKET this is used to share which pointers retain the + * same reference to the socket, to determine proper reference freeing. */ u32 id; /* For scalar types (SCALAR_VALUE), this represents our knowledge of @@ -102,6 +104,17 @@ struct bpf_stack_state { u8 slot_type[BPF_REG_SIZE]; }; +struct bpf_reference_state { + /* Track each reference created with a unique id, even if the same + * instruction creates the reference multiple times (eg, via CALL). + */ + int id; + /* Instruction where the allocation of this reference occurred. This + * is used purely to inform the user of a reference leak. + */ + int insn_idx; +}; + /* state of the program: * type of all registers and stack info */ @@ -119,7 +132,9 @@ struct bpf_func_state { */ u32 subprogno; - /* should be second to last. See copy_func_state() */ + /* The following fields should be last. See copy_func_state() */ + int acquired_refs; + struct bpf_reference_state *refs; int allocated_stack; struct bpf_stack_state *stack; }; @@ -131,6 +146,17 @@ struct bpf_verifier_state { u32 curframe; }; +#define bpf_get_spilled_reg(slot, frame) \ + (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ + (frame->stack[slot].slot_type[0] == STACK_SPILL)) \ + ? &frame->stack[slot].spilled_ptr : NULL) + +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */ +#define bpf_for_each_spilled_reg(iter, frame, reg) \ + for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \ + iter < frame->allocated_stack / BPF_REG_SIZE; \ + iter++, reg = bpf_get_spilled_reg(iter, frame)) + /* linked list of verifier states used to prune search */ struct bpf_verifier_state_list { struct bpf_verifier_state state; @@ -204,15 +230,21 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log, __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env, const char *fmt, ...); -static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env) +static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env) { struct bpf_verifier_state *cur = env->cur_state; - return cur->frame[cur->curframe]->regs; + return cur->frame[cur->curframe]; +} + +static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env) +{ + return cur_func(env)->regs; } int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env); int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx); +int bpf_prog_offload_finalize(struct bpf_verifier_env *env); #endif /* _LINUX_BPF_VERIFIER_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f5f1f145018d..76603ee136a8 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -609,6 +609,9 @@ struct netdev_queue { /* Subordinate device that the queue has been assigned to */ struct net_device *sb_dev; +#ifdef CONFIG_XDP_SOCKETS + struct xdp_umem *umem; +#endif /* * write-mostly part */ @@ -738,6 +741,9 @@ struct netdev_rx_queue { struct kobject kobj; struct net_device *dev; struct xdp_rxq_info xdp_rxq; +#ifdef CONFIG_XDP_SOCKETS + struct xdp_umem *umem; +#endif } ____cacheline_aligned_in_smp; /* diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 70a115bea4f4..13acb9803a6d 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -86,6 +86,7 @@ struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries); struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem, struct xdp_umem_fq_reuse *newq); void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq); +struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id); static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr) { @@ -183,6 +184,12 @@ static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq) { } +static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, + u16 queue_id) +{ + return NULL; +} + static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr) { return NULL; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index aa5ccd2385ed..f9187b41dff6 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -127,6 +127,7 @@ enum bpf_map_type { BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, }; enum bpf_prog_type { @@ -2143,6 +2144,77 @@ union bpf_attr { * request in the skb. * Return * 0 on success, or a negative error in case of failure. + * + * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) + * Description + * Look for TCP socket matching *tuple*, optionally in a child + * network namespace *netns*. The return value must be checked, + * and if non-NULL, released via **bpf_sk_release**\ (). + * + * The *ctx* should point to the context of the program, such as + * the skb or socket (depending on the hook in use). This is used + * to determine the base network namespace for the lookup. + * + * *tuple_size* must be one of: + * + * **sizeof**\ (*tuple*\ **->ipv4**) + * Look for an IPv4 socket. + * **sizeof**\ (*tuple*\ **->ipv6**) + * Look for an IPv6 socket. + * + * If the *netns* is zero, then the socket lookup table in the + * netns associated with the *ctx* will be used. For the TC hooks, + * this in the netns of the device in the skb. For socket hooks, + * this in the netns of the socket. If *netns* is non-zero, then + * it specifies the ID of the netns relative to the netns + * associated with the *ctx*. + * + * All values for *flags* are reserved for future usage, and must + * be left at zero. + * + * This helper is available only if the kernel was compiled with + * **CONFIG_NET** configuration option. + * Return + * Pointer to *struct bpf_sock*, or NULL in case of failure. + * + * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) + * Description + * Look for UDP socket matching *tuple*, optionally in a child + * network namespace *netns*. The return value must be checked, + * and if non-NULL, released via **bpf_sk_release**\ (). + * + * The *ctx* should point to the context of the program, such as + * the skb or socket (depending on the hook in use). This is used + * to determine the base network namespace for the lookup. + * + * *tuple_size* must be one of: + * + * **sizeof**\ (*tuple*\ **->ipv4**) + * Look for an IPv4 socket. + * **sizeof**\ (*tuple*\ **->ipv6**) + * Look for an IPv6 socket. + * + * If the *netns* is zero, then the socket lookup table in the + * netns associated with the *ctx* will be used. For the TC hooks, + * this in the netns of the device in the skb. For socket hooks, + * this in the netns of the socket. If *netns* is non-zero, then + * it specifies the ID of the netns relative to the netns + * associated with the *ctx*. + * + * All values for *flags* are reserved for future usage, and must + * be left at zero. + * + * This helper is available only if the kernel was compiled with + * **CONFIG_NET** configuration option. + * Return + * Pointer to *struct bpf_sock*, or NULL in case of failure. + * + * int bpf_sk_release(struct bpf_sock *sk) + * Description + * Release the reference held by *sock*. *sock* must be a non-NULL + * pointer that was returned from bpf_sk_lookup_xxx\ (). + * Return + * 0 on success, or a negative error in case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2228,7 +2300,10 @@ union bpf_attr { FN(get_current_cgroup_id), \ FN(get_local_storage), \ FN(sk_select_reuseport), \ - FN(skb_ancestor_cgroup_id), + FN(skb_ancestor_cgroup_id), \ + FN(sk_lookup_tcp), \ + FN(sk_lookup_udp), \ + FN(sk_release), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call @@ -2398,6 +2473,23 @@ struct bpf_sock { */ }; +struct bpf_sock_tuple { + union { + struct { + __be32 saddr; + __be32 daddr; + __be16 sport; + __be16 dport; + } ipv4; + struct { + __be32 saddr[4]; + __be32 daddr[4]; + __be16 sport; + __be16 dport; + } ipv6; + }; +}; + #define XDP_PACKET_HEADROOM 256 /* User return codes for XDP prog type. diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 6a7d931bbc55..00f6ed2e4f9a 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -25,6 +25,7 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key); */ void cgroup_bpf_put(struct cgroup *cgrp) { + enum bpf_cgroup_storage_type stype; unsigned int type; for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) { @@ -34,8 +35,10 @@ void cgroup_bpf_put(struct cgroup *cgrp) list_for_each_entry_safe(pl, tmp, progs, node) { list_del(&pl->node); bpf_prog_put(pl->prog); - bpf_cgroup_storage_unlink(pl->storage); - bpf_cgroup_storage_free(pl->storage); + for_each_cgroup_storage_type(stype) { + bpf_cgroup_storage_unlink(pl->storage[stype]); + bpf_cgroup_storage_free(pl->storage[stype]); + } kfree(pl); static_branch_dec(&cgroup_bpf_enabled_key); } @@ -97,6 +100,7 @@ static int compute_effective_progs(struct cgroup *cgrp, enum bpf_attach_type type, struct bpf_prog_array __rcu **array) { + enum bpf_cgroup_storage_type stype; struct bpf_prog_array *progs; struct bpf_prog_list *pl; struct cgroup *p = cgrp; @@ -125,7 +129,9 @@ static int compute_effective_progs(struct cgroup *cgrp, continue; progs->items[cnt].prog = pl->prog; - progs->items[cnt].cgroup_storage = pl->storage; + for_each_cgroup_storage_type(stype) + progs->items[cnt].cgroup_storage[stype] = + pl->storage[stype]; cnt++; } } while ((p = cgroup_parent(p))); @@ -232,7 +238,9 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, { struct list_head *progs = &cgrp->bpf.progs[type]; struct bpf_prog *old_prog = NULL; - struct bpf_cgroup_storage *storage, *old_storage = NULL; + struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE], + *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL}; + enum bpf_cgroup_storage_type stype; struct bpf_prog_list *pl; bool pl_was_allocated; int err; @@ -254,34 +262,44 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS) return -E2BIG; - storage = bpf_cgroup_storage_alloc(prog); - if (IS_ERR(storage)) - return -ENOMEM; + for_each_cgroup_storage_type(stype) { + storage[stype] = bpf_cgroup_storage_alloc(prog, stype); + if (IS_ERR(storage[stype])) { + storage[stype] = NULL; + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); + return -ENOMEM; + } + } if (flags & BPF_F_ALLOW_MULTI) { list_for_each_entry(pl, progs, node) { if (pl->prog == prog) { /* disallow attaching the same prog twice */ - bpf_cgroup_storage_free(storage); + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); return -EINVAL; } } pl = kmalloc(sizeof(*pl), GFP_KERNEL); if (!pl) { - bpf_cgroup_storage_free(storage); + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); return -ENOMEM; } pl_was_allocated = true; pl->prog = prog; - pl->storage = storage; + for_each_cgroup_storage_type(stype) + pl->storage[stype] = storage[stype]; list_add_tail(&pl->node, progs); } else { if (list_empty(progs)) { pl = kmalloc(sizeof(*pl), GFP_KERNEL); if (!pl) { - bpf_cgroup_storage_free(storage); + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); return -ENOMEM; } pl_was_allocated = true; @@ -289,12 +307,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, } else { pl = list_first_entry(progs, typeof(*pl), node); old_prog = pl->prog; - old_storage = pl->storage; - bpf_cgroup_storage_unlink(old_storage); + for_each_cgroup_storage_type(stype) { + old_storage[stype] = pl->storage[stype]; + bpf_cgroup_storage_unlink(old_storage[stype]); + } pl_was_allocated = false; } pl->prog = prog; - pl->storage = storage; + for_each_cgroup_storage_type(stype) + pl->storage[stype] = storage[stype]; } cgrp->bpf.flags[type] = flags; @@ -304,21 +325,27 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, goto cleanup; static_branch_inc(&cgroup_bpf_enabled_key); - if (old_storage) - bpf_cgroup_storage_free(old_storage); + for_each_cgroup_storage_type(stype) { + if (!old_storage[stype]) + continue; + bpf_cgroup_storage_free(old_storage[stype]); + } if (old_prog) { bpf_prog_put(old_prog); static_branch_dec(&cgroup_bpf_enabled_key); } - bpf_cgroup_storage_link(storage, cgrp, type); + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_link(storage[stype], cgrp, type); return 0; cleanup: /* and cleanup the prog list */ pl->prog = old_prog; - bpf_cgroup_storage_free(pl->storage); - pl->storage = old_storage; - bpf_cgroup_storage_link(old_storage, cgrp, type); + for_each_cgroup_storage_type(stype) { + bpf_cgroup_storage_free(pl->storage[stype]); + pl->storage[stype] = old_storage[stype]; + bpf_cgroup_storage_link(old_storage[stype], cgrp, type); + } if (pl_was_allocated) { list_del(&pl->node); kfree(pl); @@ -339,6 +366,7 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type, u32 unused_flags) { struct list_head *progs = &cgrp->bpf.progs[type]; + enum bpf_cgroup_storage_type stype; u32 flags = cgrp->bpf.flags[type]; struct bpf_prog *old_prog = NULL; struct bpf_prog_list *pl; @@ -385,8 +413,10 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, /* now can actually delete it from this cgroup list */ list_del(&pl->node); - bpf_cgroup_storage_unlink(pl->storage); - bpf_cgroup_storage_free(pl->storage); + for_each_cgroup_storage_type(stype) { + bpf_cgroup_storage_unlink(pl->storage[stype]); + bpf_cgroup_storage_free(pl->storage[stype]); + } kfree(pl); if (list_empty(progs)) /* last program was detached, reset flags to zero */ @@ -677,6 +707,8 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_current_uid_gid_proto; case BPF_FUNC_get_local_storage: return &bpf_get_local_storage_proto; + case BPF_FUNC_get_current_cgroup_id: + return &bpf_get_current_cgroup_id_proto; case BPF_FUNC_trace_printk: if (capable(CAP_SYS_ADMIN)) return bpf_get_trace_printk_proto(); diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1991466b8327..6502115e8f55 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -194,16 +194,28 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = { .ret_type = RET_INTEGER, }; -DECLARE_PER_CPU(void*, bpf_cgroup_storage); +#ifdef CONFIG_CGROUP_BPF +DECLARE_PER_CPU(struct bpf_cgroup_storage*, + bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags) { - /* map and flags arguments are not used now, - * but provide an ability to extend the API - * for other types of local storages. - * verifier checks that their values are correct. + /* flags argument is not used now, + * but provides an ability to extend the API. + * verifier checks that its value is correct. */ - return (unsigned long) this_cpu_read(bpf_cgroup_storage); + enum bpf_cgroup_storage_type stype = cgroup_storage_type(map); + struct bpf_cgroup_storage *storage; + void *ptr; + + storage = this_cpu_read(bpf_cgroup_storage[stype]); + + if (stype == BPF_CGROUP_STORAGE_SHARED) + ptr = &READ_ONCE(storage->buf)->data[0]; + else + ptr = this_cpu_ptr(storage->percpu_buf); + + return (unsigned long)ptr; } const struct bpf_func_proto bpf_get_local_storage_proto = { @@ -214,3 +226,4 @@ const struct bpf_func_proto bpf_get_local_storage_proto = { .arg2_type = ARG_ANYTHING, }; #endif +#endif diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c index 830d7f095748..c97a8f968638 100644 --- a/kernel/bpf/local_storage.c +++ b/kernel/bpf/local_storage.c @@ -7,7 +7,8 @@ #include <linux/rbtree.h> #include <linux/slab.h> -DEFINE_PER_CPU(void*, bpf_cgroup_storage); +DEFINE_PER_CPU(struct bpf_cgroup_storage*, + bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); #ifdef CONFIG_CGROUP_BPF @@ -151,6 +152,71 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key, return 0; } +int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *_key, + void *value) +{ + struct bpf_cgroup_storage_map *map = map_to_storage(_map); + struct bpf_cgroup_storage_key *key = _key; + struct bpf_cgroup_storage *storage; + int cpu, off = 0; + u32 size; + + rcu_read_lock(); + storage = cgroup_storage_lookup(map, key, false); + if (!storage) { + rcu_read_unlock(); + return -ENOENT; + } + + /* per_cpu areas are zero-filled and bpf programs can only + * access 'value_size' of them, so copying rounded areas + * will not leak any kernel data + */ + size = round_up(_map->value_size, 8); + for_each_possible_cpu(cpu) { + bpf_long_memcpy(value + off, + per_cpu_ptr(storage->percpu_buf, cpu), size); + off += size; + } + rcu_read_unlock(); + return 0; +} + +int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key, + void *value, u64 map_flags) +{ + struct bpf_cgroup_storage_map *map = map_to_storage(_map); + struct bpf_cgroup_storage_key *key = _key; + struct bpf_cgroup_storage *storage; + int cpu, off = 0; + u32 size; + + if (map_flags != BPF_ANY && map_flags != BPF_EXIST) + return -EINVAL; + + rcu_read_lock(); + storage = cgroup_storage_lookup(map, key, false); + if (!storage) { + rcu_read_unlock(); + return -ENOENT; + } + + /* the user space will provide round_up(value_size, 8) bytes that + * will be copied into per-cpu area. bpf programs can only access + * value_size of it. During lookup the same extra bytes will be + * returned or zeros which were zero-filled by percpu_alloc, + * so no kernel data leaks possible + */ + size = round_up(_map->value_size, 8); + for_each_possible_cpu(cpu) { + bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu), + value + off, size); + off += size; + } + rcu_read_unlock(); + return 0; +} + static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key, void *_next_key) { @@ -254,6 +320,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = { int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map) { + enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map); struct bpf_cgroup_storage_map *map = map_to_storage(_map); int ret = -EBUSY; @@ -261,11 +328,12 @@ int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map) if (map->prog && map->prog != prog) goto unlock; - if (prog->aux->cgroup_storage && prog->aux->cgroup_storage != _map) + if (prog->aux->cgroup_storage[stype] && + prog->aux->cgroup_storage[stype] != _map) goto unlock; map->prog = prog; - prog->aux->cgroup_storage = _map; + prog->aux->cgroup_storage[stype] = _map; ret = 0; unlock: spin_unlock_bh(&map->lock); @@ -275,70 +343,117 @@ unlock: void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *_map) { + enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map); struct bpf_cgroup_storage_map *map = map_to_storage(_map); spin_lock_bh(&map->lock); if (map->prog == prog) { - WARN_ON(prog->aux->cgroup_storage != _map); + WARN_ON(prog->aux->cgroup_storage[stype] != _map); map->prog = NULL; - prog->aux->cgroup_storage = NULL; + prog->aux->cgroup_storage[stype] = NULL; } spin_unlock_bh(&map->lock); } -struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog) +static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages) +{ + size_t size; + + if (cgroup_storage_type(map) == BPF_CGROUP_STORAGE_SHARED) { + size = sizeof(struct bpf_storage_buffer) + map->value_size; + *pages = round_up(sizeof(struct bpf_cgroup_storage) + size, + PAGE_SIZE) >> PAGE_SHIFT; + } else { + size = map->value_size; + *pages = round_up(round_up(size, 8) * num_possible_cpus(), + PAGE_SIZE) >> PAGE_SHIFT; + } + + return size; +} + +struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog, + enum bpf_cgroup_storage_type stype) { struct bpf_cgroup_storage *storage; struct bpf_map *map; + gfp_t flags; + size_t size; u32 pages; - map = prog->aux->cgroup_storage; + map = prog->aux->cgroup_storage[stype]; if (!map) return NULL; - pages = round_up(sizeof(struct bpf_cgroup_storage) + - sizeof(struct bpf_storage_buffer) + - map->value_size, PAGE_SIZE) >> PAGE_SHIFT; + size = bpf_cgroup_storage_calculate_size(map, &pages); + if (bpf_map_charge_memlock(map, pages)) return ERR_PTR(-EPERM); storage = kmalloc_node(sizeof(struct bpf_cgroup_storage), __GFP_ZERO | GFP_USER, map->numa_node); - if (!storage) { - bpf_map_uncharge_memlock(map, pages); - return ERR_PTR(-ENOMEM); - } + if (!storage) + goto enomem; - storage->buf = kmalloc_node(sizeof(struct bpf_storage_buffer) + - map->value_size, __GFP_ZERO | GFP_USER, - map->numa_node); - if (!storage->buf) { - bpf_map_uncharge_memlock(map, pages); - kfree(storage); - return ERR_PTR(-ENOMEM); + flags = __GFP_ZERO | GFP_USER; + + if (stype == BPF_CGROUP_STORAGE_SHARED) { + storage->buf = kmalloc_node(size, flags, map->numa_node); + if (!storage->buf) + goto enomem; + } else { + storage->percpu_buf = __alloc_percpu_gfp(size, 8, flags); + if (!storage->percpu_buf) + goto enomem; } storage->map = (struct bpf_cgroup_storage_map *)map; return storage; + +enomem: + bpf_map_uncharge_memlock(map, pages); + kfree(storage); + return ERR_PTR(-ENOMEM); +} + +static void free_shared_cgroup_storage_rcu(struct rcu_head *rcu) +{ + struct bpf_cgroup_storage *storage = + container_of(rcu, struct bpf_cgroup_storage, rcu); + + kfree(storage->buf); + kfree(storage); +} + +static void free_percpu_cgroup_storage_rcu(struct rcu_head *rcu) +{ + struct bpf_cgroup_storage *storage = + container_of(rcu, struct bpf_cgroup_storage, rcu); + + free_percpu(storage->percpu_buf); + kfree(storage); } void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage) { - u32 pages; + enum bpf_cgroup_storage_type stype; struct bpf_map *map; + u32 pages; if (!storage) return; map = &storage->map->map; - pages = round_up(sizeof(struct bpf_cgroup_storage) + - sizeof(struct bpf_storage_buffer) + - map->value_size, PAGE_SIZE) >> PAGE_SHIFT; + + bpf_cgroup_storage_calculate_size(map, &pages); bpf_map_uncharge_memlock(map, pages); - kfree_rcu(storage->buf, rcu); - kfree_rcu(storage, rcu); + stype = cgroup_storage_type(map); + if (stype == BPF_CGROUP_STORAGE_SHARED) + call_rcu(&storage->rcu, free_shared_cgroup_storage_rcu); + else + call_rcu(&storage->rcu, free_percpu_cgroup_storage_rcu); } void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c index 3bfbf4464416..99d243e1ad6e 100644 --- a/kernel/bpf/map_in_map.c +++ b/kernel/bpf/map_in_map.c @@ -24,7 +24,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd) * in the verifier is not enough. */ if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY || - inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE) { + inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE || + inner_map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) { fdput(f); return ERR_PTR(-ENOTSUPP); } diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 177a52436394..8e93c47f0779 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -172,6 +172,24 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env, return ret; } +int bpf_prog_offload_finalize(struct bpf_verifier_env *env) +{ + struct bpf_prog_offload *offload; + int ret = -ENODEV; + + down_read(&bpf_devs_lock); + offload = env->prog->aux->offload; + if (offload) { + if (offload->dev_ops->finalize) + ret = offload->dev_ops->finalize(env); + else + ret = 0; + } + up_read(&bpf_devs_lock); + + return ret; +} + static void __bpf_prog_offload_destroy(struct bpf_prog *prog) { struct bpf_prog_offload *offload = prog->aux->offload; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index b3c2d09bcf7a..5742df21598c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -686,7 +686,8 @@ static int map_lookup_elem(union bpf_attr *attr) if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH || - map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) + map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY || + map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) value_size = round_up(map->value_size, 8) * num_possible_cpus(); else if (IS_FD_MAP(map)) value_size = sizeof(u32); @@ -705,6 +706,8 @@ static int map_lookup_elem(union bpf_attr *attr) err = bpf_percpu_hash_copy(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_copy(map, key, value); + } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) { + err = bpf_percpu_cgroup_storage_copy(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) { err = bpf_stackmap_copy(map, key, value); } else if (IS_FD_ARRAY(map)) { @@ -774,7 +777,8 @@ static int map_update_elem(union bpf_attr *attr) if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH || - map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) + map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY || + map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) value_size = round_up(map->value_size, 8) * num_possible_cpus(); else value_size = map->value_size; @@ -809,6 +813,9 @@ static int map_update_elem(union bpf_attr *attr) err = bpf_percpu_hash_update(map, key, value, attr->flags); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, attr->flags); + } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) { + err = bpf_percpu_cgroup_storage_update(map, key, value, + attr->flags); } else if (IS_FD_ARRAY(map)) { rcu_read_lock(); err = bpf_fd_array_map_update_elem(map, f.file, key, value, @@ -988,10 +995,15 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog) /* drop refcnt on maps used by eBPF program and free auxilary data */ static void free_used_maps(struct bpf_prog_aux *aux) { + enum bpf_cgroup_storage_type stype; int i; - if (aux->cgroup_storage) - bpf_cgroup_storage_release(aux->prog, aux->cgroup_storage); + for_each_cgroup_storage_type(stype) { + if (!aux->cgroup_storage[stype]) + continue; + bpf_cgroup_storage_release(aux->prog, + aux->cgroup_storage[stype]); + } for (i = 0; i < aux->used_map_cnt; i++) bpf_map_put(aux->used_maps[i]); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3584ab27d25c..3f93a548a642 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1,5 +1,6 @@ /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com * Copyright (c) 2016 Facebook + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -80,8 +81,8 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = { * (like pointer plus pointer becomes SCALAR_VALUE type) * * When verifier sees load or store instructions the type of base register - * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer - * types recognized by check_mem_access() function. + * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK, PTR_TO_SOCKET. These are + * four pointer types recognized by check_mem_access() function. * * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value' * and the range of [ptr, ptr + map's value_size) is accessible. @@ -140,6 +141,24 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = { * * After the call R0 is set to return type of the function and registers R1-R5 * are set to NOT_INIT to indicate that they are no longer readable. + * + * The following reference types represent a potential reference to a kernel + * resource which, after first being allocated, must be checked and freed by + * the BPF program: + * - PTR_TO_SOCKET_OR_NULL, PTR_TO_SOCKET + * + * When the verifier sees a helper call return a reference type, it allocates a + * pointer id for the reference and stores it in the current function state. + * Similar to the way that PTR_TO_MAP_VALUE_OR_NULL is converted into + * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type + * passes through a NULL-check conditional. For the branch wherein the state is + * changed to CONST_IMM, the verifier releases the reference. + * + * For each helper function that allocates a reference, such as + * bpf_sk_lookup_tcp(), there is a corresponding release function, such as + * bpf_sk_release(). When a reference type passes into the release function, + * the verifier also releases the reference. If any unchecked or unreleased + * reference remains at the end of the program, the verifier rejects it. */ /* verifier_state + insn_idx are pushed to stack when branch is encountered */ @@ -189,6 +208,7 @@ struct bpf_call_arg_meta { int access_size; s64 msize_smax_value; u64 msize_umax_value; + int ptr_id; }; static DEFINE_MUTEX(bpf_verifier_lock); @@ -249,6 +269,46 @@ static bool type_is_pkt_pointer(enum bpf_reg_type type) type == PTR_TO_PACKET_META; } +static bool reg_type_may_be_null(enum bpf_reg_type type) +{ + return type == PTR_TO_MAP_VALUE_OR_NULL || + type == PTR_TO_SOCKET_OR_NULL; +} + +static bool type_is_refcounted(enum bpf_reg_type type) +{ + return type == PTR_TO_SOCKET; +} + +static bool type_is_refcounted_or_null(enum bpf_reg_type type) +{ + return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL; +} + +static bool reg_is_refcounted(const struct bpf_reg_state *reg) +{ + return type_is_refcounted(reg->type); +} + +static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg) +{ + return type_is_refcounted_or_null(reg->type); +} + +static bool arg_type_is_refcounted(enum bpf_arg_type type) +{ + return type == ARG_PTR_TO_SOCKET; +} + +/* Determine whether the function releases some resources allocated by another + * function call. The first reference type argument will be assumed to be + * released by release_reference(). + */ +static bool is_release_function(enum bpf_func_id func_id) +{ + return func_id == BPF_FUNC_sk_release; +} + /* string representation of 'enum bpf_reg_type' */ static const char * const reg_type_str[] = { [NOT_INIT] = "?", @@ -262,6 +322,8 @@ static const char * const reg_type_str[] = { [PTR_TO_PACKET_META] = "pkt_meta", [PTR_TO_PACKET_END] = "pkt_end", [PTR_TO_FLOW_KEYS] = "flow_keys", + [PTR_TO_SOCKET] = "sock", + [PTR_TO_SOCKET_OR_NULL] = "sock_or_null", }; static char slot_type_char[] = { @@ -378,62 +440,158 @@ static void print_verifier_state(struct bpf_verifier_env *env, else verbose(env, "=%s", types_buf); } + if (state->acquired_refs && state->refs[0].id) { + verbose(env, " refs=%d", state->refs[0].id); + for (i = 1; i < state->acquired_refs; i++) + if (state->refs[i].id) + verbose(env, ",%d", state->refs[i].id); + } verbose(env, "\n"); } -static int copy_stack_state(struct bpf_func_state *dst, - const struct bpf_func_state *src) -{ - if (!src->stack) - return 0; - if (WARN_ON_ONCE(dst->allocated_stack < src->allocated_stack)) { - /* internal bug, make state invalid to reject the program */ - memset(dst, 0, sizeof(*dst)); - return -EFAULT; - } - memcpy(dst->stack, src->stack, - sizeof(*src->stack) * (src->allocated_stack / BPF_REG_SIZE)); - return 0; -} +#define COPY_STATE_FN(NAME, COUNT, FIELD, SIZE) \ +static int copy_##NAME##_state(struct bpf_func_state *dst, \ + const struct bpf_func_state *src) \ +{ \ + if (!src->FIELD) \ + return 0; \ + if (WARN_ON_ONCE(dst->COUNT < src->COUNT)) { \ + /* internal bug, make state invalid to reject the program */ \ + memset(dst, 0, sizeof(*dst)); \ + return -EFAULT; \ + } \ + memcpy(dst->FIELD, src->FIELD, \ + sizeof(*src->FIELD) * (src->COUNT / SIZE)); \ + return 0; \ +} +/* copy_reference_state() */ +COPY_STATE_FN(reference, acquired_refs, refs, 1) +/* copy_stack_state() */ +COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE) +#undef COPY_STATE_FN + +#define REALLOC_STATE_FN(NAME, COUNT, FIELD, SIZE) \ +static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \ + bool copy_old) \ +{ \ + u32 old_size = state->COUNT; \ + struct bpf_##NAME##_state *new_##FIELD; \ + int slot = size / SIZE; \ + \ + if (size <= old_size || !size) { \ + if (copy_old) \ + return 0; \ + state->COUNT = slot * SIZE; \ + if (!size && old_size) { \ + kfree(state->FIELD); \ + state->FIELD = NULL; \ + } \ + return 0; \ + } \ + new_##FIELD = kmalloc_array(slot, sizeof(struct bpf_##NAME##_state), \ + GFP_KERNEL); \ + if (!new_##FIELD) \ + return -ENOMEM; \ + if (copy_old) { \ + if (state->FIELD) \ + memcpy(new_##FIELD, state->FIELD, \ + sizeof(*new_##FIELD) * (old_size / SIZE)); \ + memset(new_##FIELD + old_size / SIZE, 0, \ + sizeof(*new_##FIELD) * (size - old_size) / SIZE); \ + } \ + state->COUNT = slot * SIZE; \ + kfree(state->FIELD); \ + state->FIELD = new_##FIELD; \ + return 0; \ +} +/* realloc_reference_state() */ +REALLOC_STATE_FN(reference, acquired_refs, refs, 1) +/* realloc_stack_state() */ +REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE) +#undef REALLOC_STATE_FN /* do_check() starts with zero-sized stack in struct bpf_verifier_state to * make it consume minimal amount of memory. check_stack_write() access from * the program calls into realloc_func_state() to grow the stack size. - * Note there is a non-zero parent pointer inside each reg of bpf_verifier_state - * which this function copies over. It points to corresponding reg in previous - * bpf_verifier_state which is never reallocated + * Note there is a non-zero 'parent' pointer inside bpf_verifier_state + * which realloc_stack_state() copies over. It points to previous + * bpf_verifier_state which is never reallocated. + */ +static int realloc_func_state(struct bpf_func_state *state, int stack_size, + int refs_size, bool copy_old) +{ + int err = realloc_reference_state(state, refs_size, copy_old); + if (err) + return err; + return realloc_stack_state(state, stack_size, copy_old); +} + +/* Acquire a pointer id from the env and update the state->refs to include + * this new pointer reference. + * On success, returns a valid pointer id to associate with the register + * On failure, returns a negative errno. */ -static int realloc_func_state(struct bpf_func_state *state, int size, - bool copy_old) +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx) { - u32 old_size = state->allocated_stack; - struct bpf_stack_state *new_stack; - int slot = size / BPF_REG_SIZE; + struct bpf_func_state *state = cur_func(env); + int new_ofs = state->acquired_refs; + int id, err; + + err = realloc_reference_state(state, state->acquired_refs + 1, true); + if (err) + return err; + id = ++env->id_gen; + state->refs[new_ofs].id = id; + state->refs[new_ofs].insn_idx = insn_idx; - if (size <= old_size || !size) { - if (copy_old) + return id; +} + +/* release function corresponding to acquire_reference_state(). Idempotent. */ +static int __release_reference_state(struct bpf_func_state *state, int ptr_id) +{ + int i, last_idx; + + if (!ptr_id) + return -EFAULT; + + last_idx = state->acquired_refs - 1; + for (i = 0; i < state->acquired_refs; i++) { + if (state->refs[i].id == ptr_id) { + if (last_idx && i != last_idx) + memcpy(&state->refs[i], &state->refs[last_idx], + sizeof(*state->refs)); + memset(&state->refs[last_idx], 0, sizeof(*state->refs)); + state->acquired_refs--; return 0; - state->allocated_stack = slot * BPF_REG_SIZE; - if (!size && old_size) { - kfree(state->stack); - state->stack = NULL; } - return 0; } - new_stack = kmalloc_array(slot, sizeof(struct bpf_stack_state), - GFP_KERNEL); - if (!new_stack) - return -ENOMEM; - if (copy_old) { - if (state->stack) - memcpy(new_stack, state->stack, - sizeof(*new_stack) * (old_size / BPF_REG_SIZE)); - memset(new_stack + old_size / BPF_REG_SIZE, 0, - sizeof(*new_stack) * (size - old_size) / BPF_REG_SIZE); - } - state->allocated_stack = slot * BPF_REG_SIZE; - kfree(state->stack); - state->stack = new_stack; + return -EFAULT; +} + +/* variation on the above for cases where we expect that there must be an + * outstanding reference for the specified ptr_id. + */ +static int release_reference_state(struct bpf_verifier_env *env, int ptr_id) +{ + struct bpf_func_state *state = cur_func(env); + int err; + + err = __release_reference_state(state, ptr_id); + if (WARN_ON_ONCE(err != 0)) + verbose(env, "verifier internal error: can't release reference\n"); + return err; +} + +static int transfer_reference_state(struct bpf_func_state *dst, + struct bpf_func_state *src) +{ + int err = realloc_reference_state(dst, src->acquired_refs, false); + if (err) + return err; + err = copy_reference_state(dst, src); + if (err) + return err; return 0; } @@ -441,6 +599,7 @@ static void free_func_state(struct bpf_func_state *state) { if (!state) return; + kfree(state->refs); kfree(state->stack); kfree(state); } @@ -466,10 +625,14 @@ static int copy_func_state(struct bpf_func_state *dst, { int err; - err = realloc_func_state(dst, src->allocated_stack, false); + err = realloc_func_state(dst, src->allocated_stack, src->acquired_refs, + false); + if (err) + return err; + memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs)); + err = copy_reference_state(dst, src); if (err) return err; - memcpy(dst, src, offsetof(struct bpf_func_state, allocated_stack)); return copy_stack_state(dst, src); } @@ -846,10 +1009,6 @@ static int check_subprogs(struct bpf_verifier_env *env) verbose(env, "function calls to other bpf functions are allowed for root only\n"); return -EPERM; } - if (bpf_prog_is_dev_bound(env->prog->aux)) { - verbose(env, "function calls in offloaded programs are not supported yet\n"); - return -EINVAL; - } ret = add_subprog(env, i + insn[i].imm + 1); if (ret < 0) return ret; @@ -968,6 +1127,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type) case PTR_TO_PACKET_END: case PTR_TO_FLOW_KEYS: case CONST_PTR_TO_MAP: + case PTR_TO_SOCKET: + case PTR_TO_SOCKET_OR_NULL: return true; default: return false; @@ -992,7 +1153,7 @@ static int check_stack_write(struct bpf_verifier_env *env, enum bpf_reg_type type; err = realloc_func_state(state, round_up(slot + 1, BPF_REG_SIZE), - true); + state->acquired_refs, true); if (err) return err; /* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0, @@ -1336,6 +1497,28 @@ static int check_flow_keys_access(struct bpf_verifier_env *env, int off, return 0; } +static int check_sock_access(struct bpf_verifier_env *env, u32 regno, int off, + int size, enum bpf_access_type t) +{ + struct bpf_reg_state *regs = cur_regs(env); + struct bpf_reg_state *reg = ®s[regno]; + struct bpf_insn_access_aux info; + + if (reg->smin_value < 0) { + verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n", + regno); + return -EACCES; + } + + if (!bpf_sock_is_valid_access(off, size, t, &info)) { + verbose(env, "invalid bpf_sock access off=%d size=%d\n", + off, size); + return -EACCES; + } + + return 0; +} + static bool __is_pointer_value(bool allow_ptr_leaks, const struct bpf_reg_state *reg) { @@ -1354,7 +1537,8 @@ static bool is_ctx_reg(struct bpf_verifier_env *env, int regno) { const struct bpf_reg_state *reg = cur_regs(env) + regno; - return reg->type == PTR_TO_CTX; + return reg->type == PTR_TO_CTX || + reg->type == PTR_TO_SOCKET; } static bool is_pkt_reg(struct bpf_verifier_env *env, int regno) @@ -1454,6 +1638,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, */ strict = true; break; + case PTR_TO_SOCKET: + pointer_desc = "sock "; + break; default: break; } @@ -1721,6 +1908,14 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn err = check_flow_keys_access(env, off, size); if (!err && t == BPF_READ && value_regno >= 0) mark_reg_unknown(env, regs, value_regno); + } else if (reg->type == PTR_TO_SOCKET) { + if (t == BPF_WRITE) { + verbose(env, "cannot write into socket\n"); + return -EACCES; + } + err = check_sock_access(env, regno, off, size, t); + if (!err && value_regno >= 0) + mark_reg_unknown(env, regs, value_regno); } else { verbose(env, "R%d invalid mem access '%s'\n", regno, reg_type_str[reg->type]); @@ -1763,8 +1958,7 @@ static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_ins if (is_ctx_reg(env, insn->dst_reg) || is_pkt_reg(env, insn->dst_reg)) { verbose(env, "BPF_XADD stores into R%d %s is not allowed\n", - insn->dst_reg, is_ctx_reg(env, insn->dst_reg) ? - "context" : "packet"); + insn->dst_reg, reg_type_str[insn->dst_reg]); return -EACCES; } @@ -1944,6 +2138,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno, err = check_ctx_reg(env, reg, regno); if (err < 0) return err; + } else if (arg_type == ARG_PTR_TO_SOCKET) { + expected_type = PTR_TO_SOCKET; + if (type != expected_type) + goto err_type; + if (meta->ptr_id || !reg->id) { + verbose(env, "verifier internal error: mismatched references meta=%d, reg=%d\n", + meta->ptr_id, reg->id); + return -EFAULT; + } + meta->ptr_id = reg->id; } else if (arg_type_is_mem_ptr(arg_type)) { expected_type = PTR_TO_STACK; /* One exception here. In case function allows for NULL to be @@ -2074,6 +2278,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, goto error; break; case BPF_MAP_TYPE_CGROUP_STORAGE: + case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE: if (func_id != BPF_FUNC_get_local_storage) goto error; break; @@ -2164,7 +2369,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, goto error; break; case BPF_FUNC_get_local_storage: - if (map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE) + if (map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && + map->map_type != BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) goto error; break; case BPF_FUNC_sk_select_reuseport: @@ -2231,10 +2437,32 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn) return true; } +static bool check_refcount_ok(const struct bpf_func_proto *fn) +{ + int count = 0; + + if (arg_type_is_refcounted(fn->arg1_type)) + count++; + if (arg_type_is_refcounted(fn->arg2_type)) + count++; + if (arg_type_is_refcounted(fn->arg3_type)) + count++; + if (arg_type_is_refcounted(fn->arg4_type)) + count++; + if (arg_type_is_refcounted(fn->arg5_type)) + count++; + + /* We only support one arg being unreferenced at the moment, + * which is sufficient for the helper functions we have right now. + */ + return count <= 1; +} + static int check_func_proto(const struct bpf_func_proto *fn) { return check_raw_mode_ok(fn) && - check_arg_pair_ok(fn) ? 0 : -EINVAL; + check_arg_pair_ok(fn) && + check_refcount_ok(fn) ? 0 : -EINVAL; } /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END] @@ -2250,10 +2478,9 @@ static void __clear_all_pkt_pointers(struct bpf_verifier_env *env, if (reg_is_pkt_pointer_any(®s[i])) mark_reg_unknown(env, regs, i); - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { - if (state->stack[i].slot_type[0] != STACK_SPILL) + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) continue; - reg = &state->stack[i].spilled_ptr; if (reg_is_pkt_pointer_any(reg)) __mark_reg_unknown(reg); } @@ -2268,12 +2495,45 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env) __clear_all_pkt_pointers(env, vstate->frame[i]); } +static void release_reg_references(struct bpf_verifier_env *env, + struct bpf_func_state *state, int id) +{ + struct bpf_reg_state *regs = state->regs, *reg; + int i; + + for (i = 0; i < MAX_BPF_REG; i++) + if (regs[i].id == id) + mark_reg_unknown(env, regs, i); + + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) + continue; + if (reg_is_refcounted(reg) && reg->id == id) + __mark_reg_unknown(reg); + } +} + +/* The pointer with the specified id has released its reference to kernel + * resources. Identify all copies of the same pointer and clear the reference. + */ +static int release_reference(struct bpf_verifier_env *env, + struct bpf_call_arg_meta *meta) +{ + struct bpf_verifier_state *vstate = env->cur_state; + int i; + + for (i = 0; i <= vstate->curframe; i++) + release_reg_references(env, vstate->frame[i], meta->ptr_id); + + return release_reference_state(env, meta->ptr_id); +} + static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int *insn_idx) { struct bpf_verifier_state *state = env->cur_state; struct bpf_func_state *caller, *callee; - int i, subprog, target_insn; + int i, err, subprog, target_insn; if (state->curframe + 1 >= MAX_CALL_FRAMES) { verbose(env, "the call stack of %d frames is too deep\n", @@ -2311,6 +2571,11 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, state->curframe + 1 /* frameno within this callchain */, subprog /* subprog number within this prog */); + /* Transfer references to the callee */ + err = transfer_reference_state(callee, caller); + if (err) + return err; + /* copy r1 - r5 args that callee can access. The copy includes parent * pointers, which connects us up to the liveness chain */ @@ -2343,6 +2608,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) struct bpf_verifier_state *state = env->cur_state; struct bpf_func_state *caller, *callee; struct bpf_reg_state *r0; + int err; callee = state->frame[state->curframe]; r0 = &callee->regs[BPF_REG_0]; @@ -2362,6 +2628,11 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) /* return to the caller whatever r0 had in the callee */ caller->regs[BPF_REG_0] = *r0; + /* Transfer references to the caller */ + err = transfer_reference_state(caller, callee); + if (err) + return err; + *insn_idx = callee->callsite + 1; if (env->log.level) { verbose(env, "returning from callee:\n"); @@ -2418,6 +2689,18 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, return 0; } +static int check_reference_leak(struct bpf_verifier_env *env) +{ + struct bpf_func_state *state = cur_func(env); + int i; + + for (i = 0; i < state->acquired_refs; i++) { + verbose(env, "Unreleased reference id=%d alloc_insn=%d\n", + state->refs[i].id, state->refs[i].insn_idx); + } + return state->acquired_refs ? -EINVAL : 0; +} + static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx) { const struct bpf_func_proto *fn = NULL; @@ -2496,6 +2779,18 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn return err; } + if (func_id == BPF_FUNC_tail_call) { + err = check_reference_leak(env); + if (err) { + verbose(env, "tail_call would lead to reference leak\n"); + return err; + } + } else if (is_release_function(func_id)) { + err = release_reference(env, &meta); + if (err) + return err; + } + regs = cur_regs(env); /* check that flags argument in get_local_storage(map, flags) is 0, @@ -2538,6 +2833,13 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn } regs[BPF_REG_0].map_ptr = meta.map_ptr; regs[BPF_REG_0].id = ++env->id_gen; + } else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) { + int id = acquire_reference_state(env, insn_idx); + if (id < 0) + return id; + mark_reg_known_zero(env, regs, BPF_REG_0); + regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL; + regs[BPF_REG_0].id = id; } else { verbose(env, "unknown return type %d of func %s#%d\n", fn->ret_type, func_id_name(func_id), func_id); @@ -2668,20 +2970,20 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, return -EACCES; } - if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) { - verbose(env, "R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n", - dst); - return -EACCES; - } - if (ptr_reg->type == CONST_PTR_TO_MAP) { - verbose(env, "R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n", - dst); + switch (ptr_reg->type) { + case PTR_TO_MAP_VALUE_OR_NULL: + verbose(env, "R%d pointer arithmetic on %s prohibited, null-check it first\n", + dst, reg_type_str[ptr_reg->type]); return -EACCES; - } - if (ptr_reg->type == PTR_TO_PACKET_END) { - verbose(env, "R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n", - dst); + case CONST_PTR_TO_MAP: + case PTR_TO_PACKET_END: + case PTR_TO_SOCKET: + case PTR_TO_SOCKET_OR_NULL: + verbose(env, "R%d pointer arithmetic on %s prohibited\n", + dst, reg_type_str[ptr_reg->type]); return -EACCES; + default: + break; } /* In case of 'scalar += pointer', dst_reg inherits pointer type and id. @@ -3401,10 +3703,9 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *vstate, for (j = 0; j <= vstate->curframe; j++) { state = vstate->frame[j]; - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { - if (state->stack[i].slot_type[0] != STACK_SPILL) + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) continue; - reg = &state->stack[i].spilled_ptr; if (reg->type == type && reg->id == dst_reg->id) reg->range = max(reg->range, new_range); } @@ -3610,12 +3911,11 @@ static void reg_combine_min_max(struct bpf_reg_state *true_src, } } -static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id, - bool is_null) +static void mark_ptr_or_null_reg(struct bpf_func_state *state, + struct bpf_reg_state *reg, u32 id, + bool is_null) { - struct bpf_reg_state *reg = ®s[regno]; - - if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) { + if (reg_type_may_be_null(reg->type) && reg->id == id) { /* Old offset (both fixed and variable parts) should * have been known-zero, because we don't allow pointer * arithmetic on pointers that might be NULL. @@ -3628,40 +3928,49 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id, } if (is_null) { reg->type = SCALAR_VALUE; - } else if (reg->map_ptr->inner_map_meta) { - reg->type = CONST_PTR_TO_MAP; - reg->map_ptr = reg->map_ptr->inner_map_meta; - } else { - reg->type = PTR_TO_MAP_VALUE; + } else if (reg->type == PTR_TO_MAP_VALUE_OR_NULL) { + if (reg->map_ptr->inner_map_meta) { + reg->type = CONST_PTR_TO_MAP; + reg->map_ptr = reg->map_ptr->inner_map_meta; + } else { + reg->type = PTR_TO_MAP_VALUE; + } + } else if (reg->type == PTR_TO_SOCKET_OR_NULL) { + reg->type = PTR_TO_SOCKET; + } + if (is_null || !reg_is_refcounted(reg)) { + /* We don't need id from this point onwards anymore, + * thus we should better reset it, so that state + * pruning has chances to take effect. + */ + reg->id = 0; } - /* We don't need id from this point onwards anymore, thus we - * should better reset it, so that state pruning has chances - * to take effect. - */ - reg->id = 0; } } /* The logic is similar to find_good_pkt_pointers(), both could eventually * be folded together at some point. */ -static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno, - bool is_null) +static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, + bool is_null) { struct bpf_func_state *state = vstate->frame[vstate->curframe]; - struct bpf_reg_state *regs = state->regs; + struct bpf_reg_state *reg, *regs = state->regs; u32 id = regs[regno].id; int i, j; + if (reg_is_refcounted_or_null(®s[regno]) && is_null) + __release_reference_state(state, id); + for (i = 0; i < MAX_BPF_REG; i++) - mark_map_reg(regs, i, id, is_null); + mark_ptr_or_null_reg(state, ®s[i], id, is_null); for (j = 0; j <= vstate->curframe; j++) { state = vstate->frame[j]; - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { - if (state->stack[i].slot_type[0] != STACK_SPILL) + bpf_for_each_spilled_reg(i, state, reg) { + if (!reg) continue; - mark_map_reg(&state->stack[i].spilled_ptr, 0, id, is_null); + mark_ptr_or_null_reg(state, reg, id, is_null); } } } @@ -3863,12 +4172,14 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env, /* detect if R == 0 where R is returned from bpf_map_lookup_elem() */ if (BPF_SRC(insn->code) == BPF_K && insn->imm == 0 && (opcode == BPF_JEQ || opcode == BPF_JNE) && - dst_reg->type == PTR_TO_MAP_VALUE_OR_NULL) { - /* Mark all identical map registers in each branch as either + reg_type_may_be_null(dst_reg->type)) { + /* Mark all identical registers in each branch as either * safe or unknown depending R == 0 or R != 0 conditional. */ - mark_map_regs(this_branch, insn->dst_reg, opcode == BPF_JNE); - mark_map_regs(other_branch, insn->dst_reg, opcode == BPF_JEQ); + mark_ptr_or_null_regs(this_branch, insn->dst_reg, + opcode == BPF_JNE); + mark_ptr_or_null_regs(other_branch, insn->dst_reg, + opcode == BPF_JEQ); } else if (!try_match_pkt_pointers(insn, dst_reg, ®s[insn->src_reg], this_branch, other_branch) && is_pointer_value(env, insn->dst_reg)) { @@ -3991,6 +4302,16 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn) if (err) return err; + /* Disallow usage of BPF_LD_[ABS|IND] with reference tracking, as + * gen_ld_abs() may terminate the program at runtime, leading to + * reference leak. + */ + err = check_reference_leak(env); + if (err) { + verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n"); + return err; + } + if (regs[BPF_REG_6].type != PTR_TO_CTX) { verbose(env, "at the time of BPF_LD_ABS|IND R6 != pointer to skb\n"); @@ -4406,6 +4727,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur, case CONST_PTR_TO_MAP: case PTR_TO_PACKET_END: case PTR_TO_FLOW_KEYS: + case PTR_TO_SOCKET: + case PTR_TO_SOCKET_OR_NULL: /* Only valid matches are exact, which memcmp() above * would have accepted */ @@ -4481,6 +4804,14 @@ static bool stacksafe(struct bpf_func_state *old, return true; } +static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur) +{ + if (old->acquired_refs != cur->acquired_refs) + return false; + return !memcmp(old->refs, cur->refs, + sizeof(*old->refs) * old->acquired_refs); +} + /* compare two verifier states * * all states stored in state_list are known to be valid, since @@ -4526,6 +4857,9 @@ static bool func_states_equal(struct bpf_func_state *old, if (!stacksafe(old, cur, idmap)) goto out_free; + + if (!refsafe(old, cur)) + goto out_free; ret = true; out_free: kfree(idmap); @@ -4683,6 +5017,37 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) return 0; } +/* Return true if it's OK to have the same insn return a different type. */ +static bool reg_type_mismatch_ok(enum bpf_reg_type type) +{ + switch (type) { + case PTR_TO_CTX: + case PTR_TO_SOCKET: + case PTR_TO_SOCKET_OR_NULL: + return false; + default: + return true; + } +} + +/* If an instruction was previously used with particular pointer types, then we + * need to be careful to avoid cases such as the below, where it may be ok + * for one branch accessing the pointer, but not ok for the other branch: + * + * R1 = sock_ptr + * goto X; + * ... + * R1 = some_other_valid_ptr; + * goto X; + * ... + * R2 = *(u32 *)(R1 + 0); + */ +static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev) +{ + return src != prev && (!reg_type_mismatch_ok(src) || + !reg_type_mismatch_ok(prev)); +} + static int do_check(struct bpf_verifier_env *env) { struct bpf_verifier_state *state; @@ -4776,6 +5141,7 @@ static int do_check(struct bpf_verifier_env *env) regs = cur_regs(env); env->insn_aux_data[insn_idx].seen = true; + if (class == BPF_ALU || class == BPF_ALU64) { err = check_alu_op(env, insn); if (err) @@ -4815,9 +5181,7 @@ static int do_check(struct bpf_verifier_env *env) */ *prev_src_type = src_reg_type; - } else if (src_reg_type != *prev_src_type && - (src_reg_type == PTR_TO_CTX || - *prev_src_type == PTR_TO_CTX)) { + } else if (reg_type_mismatch(src_reg_type, *prev_src_type)) { /* ABuser program is trying to use the same insn * dst_reg = *(u32*) (src_reg + off) * with different pointer types: @@ -4862,9 +5226,7 @@ static int do_check(struct bpf_verifier_env *env) if (*prev_dst_type == NOT_INIT) { *prev_dst_type = dst_reg_type; - } else if (dst_reg_type != *prev_dst_type && - (dst_reg_type == PTR_TO_CTX || - *prev_dst_type == PTR_TO_CTX)) { + } else if (reg_type_mismatch(dst_reg_type, *prev_dst_type)) { verbose(env, "same insn cannot be used with different pointers\n"); return -EINVAL; } @@ -4881,8 +5243,8 @@ static int do_check(struct bpf_verifier_env *env) return err; if (is_ctx_reg(env, insn->dst_reg)) { - verbose(env, "BPF_ST stores into R%d context is not allowed\n", - insn->dst_reg); + verbose(env, "BPF_ST stores into R%d %s is not allowed\n", + insn->dst_reg, reg_type_str[insn->dst_reg]); return -EACCES; } @@ -4944,6 +5306,10 @@ static int do_check(struct bpf_verifier_env *env) continue; } + err = check_reference_leak(env); + if (err) + return err; + /* eBPF calling convetion is such that R0 is used * to return the value from eBPF program. * Make sure that it's readable at this time @@ -5057,6 +5423,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, return 0; } +static bool bpf_map_is_cgroup_storage(struct bpf_map *map) +{ + return (map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE || + map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE); +} + /* look for pseudo eBPF instructions that access map FDs and * replace them with actual map pointers */ @@ -5147,10 +5519,9 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env) } env->used_maps[env->used_map_cnt++] = map; - if (map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE && + if (bpf_map_is_cgroup_storage(map) && bpf_cgroup_storage_assign(env->prog, map)) { - verbose(env, - "only one cgroup storage is allowed\n"); + verbose(env, "only one cgroup storage of each type is allowed\n"); fdput(f); return -EBUSY; } @@ -5179,11 +5550,15 @@ next_insn: /* drop refcnt of maps used by the rejected program */ static void release_maps(struct bpf_verifier_env *env) { + enum bpf_cgroup_storage_type stype; int i; - if (env->prog->aux->cgroup_storage) + for_each_cgroup_storage_type(stype) { + if (!env->prog->aux->cgroup_storage[stype]) + continue; bpf_cgroup_storage_release(env->prog, - env->prog->aux->cgroup_storage); + env->prog->aux->cgroup_storage[stype]); + } for (i = 0; i < env->used_map_cnt; i++) bpf_map_put(env->used_maps[i]); @@ -5281,8 +5656,10 @@ static void sanitize_dead_code(struct bpf_verifier_env *env) } } -/* convert load instructions that access fields of 'struct __sk_buff' - * into sequence of instructions that access fields of 'struct sk_buff' +/* convert load instructions that access fields of a context type into a + * sequence of instructions that access fields of the underlying structure: + * struct __sk_buff -> struct sk_buff + * struct bpf_sock_ops -> struct sock */ static int convert_ctx_accesses(struct bpf_verifier_env *env) { @@ -5311,12 +5688,14 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) } } - if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux)) + if (bpf_prog_is_dev_bound(env->prog->aux)) return 0; insn = env->prog->insnsi + delta; for (i = 0; i < insn_cnt; i++, insn++) { + bpf_convert_ctx_access_t convert_ctx_access; + if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || insn->code == (BPF_LDX | BPF_MEM | BPF_H) || insn->code == (BPF_LDX | BPF_MEM | BPF_W) || @@ -5358,8 +5737,18 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) continue; } - if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX) + switch (env->insn_aux_data[i + delta].ptr_type) { + case PTR_TO_CTX: + if (!ops->convert_ctx_access) + continue; + convert_ctx_access = ops->convert_ctx_access; + break; + case PTR_TO_SOCKET: + convert_ctx_access = bpf_sock_convert_ctx_access; + break; + default: continue; + } ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size; size = BPF_LDST_BYTES(insn); @@ -5391,8 +5780,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) } target_size = 0; - cnt = ops->convert_ctx_access(type, insn, insn_buf, env->prog, - &target_size); + cnt = convert_ctx_access(type, insn, insn_buf, env->prog, + &target_size); if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) || (ctx_field_size && !target_size)) { verbose(env, "bpf verifier is misconfigured\n"); @@ -5583,10 +5972,10 @@ static int fixup_call_args(struct bpf_verifier_env *env) struct bpf_insn *insn = prog->insnsi; int i, depth; #endif - int err; + int err = 0; - err = 0; - if (env->prog->jit_requested) { + if (env->prog->jit_requested && + !bpf_prog_is_dev_bound(env->prog->aux)) { err = jit_subprogs(env); if (err == 0) return 0; @@ -5924,6 +6313,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) env->cur_state = NULL; } + if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux)) + ret = bpf_prog_offload_finalize(env); + skip_full_check: while (!pop_stack(env, NULL, NULL)); free_states(env); diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 08d3d59dca17..aa22bcaec1dc 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6494,6 +6494,7 @@ static struct sk_buff *populate_skb(char *buf, int size) skb->queue_mapping = SKB_QUEUE_MAP; skb->vlan_tci = SKB_VLAN_TCI; skb->vlan_proto = htons(ETH_P_IP); + dev_net_set(&dev, &init_net); skb->dev = &dev; skb->dev->ifindex = SKB_DEV_IFINDEX; skb->dev->type = SKB_DEV_TYPE; diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index f4078830ea50..0c423b8cd75c 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -12,7 +12,7 @@ #include <linux/sched/signal.h> static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx, - struct bpf_cgroup_storage *storage) + struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) { u32 ret; @@ -28,13 +28,20 @@ static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx, static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time) { - struct bpf_cgroup_storage *storage = NULL; + struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 }; + enum bpf_cgroup_storage_type stype; u64 time_start, time_spent = 0; u32 ret = 0, i; - storage = bpf_cgroup_storage_alloc(prog); - if (IS_ERR(storage)) - return PTR_ERR(storage); + for_each_cgroup_storage_type(stype) { + storage[stype] = bpf_cgroup_storage_alloc(prog, stype); + if (IS_ERR(storage[stype])) { + storage[stype] = NULL; + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); + return -ENOMEM; + } + } if (!repeat) repeat = 1; @@ -53,7 +60,8 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time) do_div(time_spent, repeat); *time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent; - bpf_cgroup_storage_free(storage); + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storage[stype]); return ret; } diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 3144ef2bf136..4cc603dfc9ef 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -27,6 +27,7 @@ #include <linux/rtnetlink.h> #include <linux/sched/signal.h> #include <linux/net.h> +#include <net/xdp_sock.h> /* * Some useful ethtool_ops methods that're device independent. @@ -1662,8 +1663,10 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev, static noinline_for_stack int ethtool_set_channels(struct net_device *dev, void __user *useraddr) { - struct ethtool_channels channels, max = { .cmd = ETHTOOL_GCHANNELS }; + struct ethtool_channels channels, curr = { .cmd = ETHTOOL_GCHANNELS }; + u16 from_channel, to_channel; u32 max_rx_in_use = 0; + unsigned int i; if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels) return -EOPNOTSUPP; @@ -1671,13 +1674,13 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, if (copy_from_user(&channels, useraddr, sizeof(channels))) return -EFAULT; - dev->ethtool_ops->get_channels(dev, &max); + dev->ethtool_ops->get_channels(dev, &curr); /* ensure new counts are within the maximums */ - if ((channels.rx_count > max.max_rx) || - (channels.tx_count > max.max_tx) || - (channels.combined_count > max.max_combined) || - (channels.other_count > max.max_other)) + if (channels.rx_count > curr.max_rx || + channels.tx_count > curr.max_tx || + channels.combined_count > curr.max_combined || + channels.other_count > curr.max_other) return -EINVAL; /* ensure the new Rx count fits within the configured Rx flow @@ -1687,6 +1690,14 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, (channels.combined_count + channels.rx_count) <= max_rx_in_use) return -EINVAL; + /* Disabling channels, query zero-copy AF_XDP sockets */ + from_channel = channels.combined_count + + min(channels.rx_count, channels.tx_count); + to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count); + for (i = from_channel; i < to_channel; i++) + if (xdp_get_umem_from_qid(dev, i)) + return -EINVAL; + return dev->ethtool_ops->set_channels(dev, &channels); } diff --git a/net/core/filter.c b/net/core/filter.c index 72db8afb7cb6..4bbc6567fcb8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -58,13 +58,17 @@ #include <net/busy_poll.h> #include <net/tcp.h> #include <net/xfrm.h> +#include <net/udp.h> #include <linux/bpf_trace.h> #include <net/xdp_sock.h> #include <linux/inetdevice.h> +#include <net/inet_hashtables.h> +#include <net/inet6_hashtables.h> #include <net/ip_fib.h> #include <net/flow.h> #include <net/arp.h> #include <net/ipv6.h> +#include <net/net_namespace.h> #include <linux/seg6_local.h> #include <net/seg6.h> #include <net/seg6_local.h> @@ -4813,6 +4817,143 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = { }; #endif /* CONFIG_IPV6_SEG6_BPF */ +#ifdef CONFIG_INET +static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple, + struct sk_buff *skb, u8 family, u8 proto) +{ + int dif = skb->dev->ifindex; + bool refcounted = false; + struct sock *sk = NULL; + + if (family == AF_INET) { + __be32 src4 = tuple->ipv4.saddr; + __be32 dst4 = tuple->ipv4.daddr; + int sdif = inet_sdif(skb); + + if (proto == IPPROTO_TCP) + sk = __inet_lookup(net, &tcp_hashinfo, skb, 0, + src4, tuple->ipv4.sport, + dst4, tuple->ipv4.dport, + dif, sdif, &refcounted); + else + sk = __udp4_lib_lookup(net, src4, tuple->ipv4.sport, + dst4, tuple->ipv4.dport, + dif, sdif, &udp_table, skb); +#if IS_REACHABLE(CONFIG_IPV6) + } else { + struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr; + struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr; + int sdif = inet6_sdif(skb); + + if (proto == IPPROTO_TCP) + sk = __inet6_lookup(net, &tcp_hashinfo, skb, 0, + src6, tuple->ipv6.sport, + dst6, tuple->ipv6.dport, + dif, sdif, &refcounted); + else + sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport, + dst6, tuple->ipv6.dport, + dif, sdif, &udp_table, skb); +#endif + } + + if (unlikely(sk && !refcounted && !sock_flag(sk, SOCK_RCU_FREE))) { + WARN_ONCE(1, "Found non-RCU, unreferenced socket!"); + sk = NULL; + } + return sk; +} + +/* bpf_sk_lookup performs the core lookup for different types of sockets, + * taking a reference on the socket if it doesn't have the flag SOCK_RCU_FREE. + * Returns the socket as an 'unsigned long' to simplify the casting in the + * callers to satisfy BPF_CALL declarations. + */ +static unsigned long +bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len, + u8 proto, u64 netns_id, u64 flags) +{ + struct net *caller_net; + struct sock *sk = NULL; + u8 family = AF_UNSPEC; + struct net *net; + + family = len == sizeof(tuple->ipv4) ? AF_INET : AF_INET6; + if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags)) + goto out; + + if (skb->dev) + caller_net = dev_net(skb->dev); + else + caller_net = sock_net(skb->sk); + if (netns_id) { + net = get_net_ns_by_id(caller_net, netns_id); + if (unlikely(!net)) + goto out; + sk = sk_lookup(net, tuple, skb, family, proto); + put_net(net); + } else { + net = caller_net; + sk = sk_lookup(net, tuple, skb, family, proto); + } + + if (sk) + sk = sk_to_full_sk(sk); +out: + return (unsigned long) sk; +} + +BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb, + struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags) +{ + return bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP, netns_id, flags); +} + +static const struct bpf_func_proto bpf_sk_lookup_tcp_proto = { + .func = bpf_sk_lookup_tcp, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_PTR_TO_SOCKET_OR_NULL, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, + .arg5_type = ARG_ANYTHING, +}; + +BPF_CALL_5(bpf_sk_lookup_udp, struct sk_buff *, skb, + struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags) +{ + return bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP, netns_id, flags); +} + +static const struct bpf_func_proto bpf_sk_lookup_udp_proto = { + .func = bpf_sk_lookup_udp, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_PTR_TO_SOCKET_OR_NULL, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, + .arg5_type = ARG_ANYTHING, +}; + +BPF_CALL_1(bpf_sk_release, struct sock *, sk) +{ + if (!sock_flag(sk, SOCK_RCU_FREE)) + sock_gen_put(sk); + return 0; +} + +static const struct bpf_func_proto bpf_sk_release_proto = { + .func = bpf_sk_release, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_SOCKET, +}; +#endif /* CONFIG_INET */ + bool bpf_helper_changes_pkt_data(void *func) { if (func == bpf_skb_vlan_push || @@ -5019,6 +5160,14 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_skb_ancestor_cgroup_id: return &bpf_skb_ancestor_cgroup_id_proto; #endif +#ifdef CONFIG_INET + case BPF_FUNC_sk_lookup_tcp: + return &bpf_sk_lookup_tcp_proto; + case BPF_FUNC_sk_lookup_udp: + return &bpf_sk_lookup_udp_proto; + case BPF_FUNC_sk_release: + return &bpf_sk_release_proto; +#endif default: return bpf_base_func_proto(func_id); } @@ -5119,6 +5268,14 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_sk_redirect_hash_proto; case BPF_FUNC_get_local_storage: return &bpf_get_local_storage_proto; +#ifdef CONFIG_INET + case BPF_FUNC_sk_lookup_tcp: + return &bpf_sk_lookup_tcp_proto; + case BPF_FUNC_sk_lookup_udp: + return &bpf_sk_lookup_udp_proto; + case BPF_FUNC_sk_release: + return &bpf_sk_release_proto; +#endif default: return bpf_base_func_proto(func_id); } @@ -5394,23 +5551,29 @@ static bool __sock_filter_check_size(int off, int size, return size == size_default; } -static bool sock_filter_is_valid_access(int off, int size, - enum bpf_access_type type, - const struct bpf_prog *prog, - struct bpf_insn_access_aux *info) +bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, + struct bpf_insn_access_aux *info) { if (off < 0 || off >= sizeof(struct bpf_sock)) return false; if (off % size != 0) return false; - if (!__sock_filter_check_attach_type(off, type, - prog->expected_attach_type)) - return false; if (!__sock_filter_check_size(off, size, info)) return false; return true; } +static bool sock_filter_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + if (!bpf_sock_is_valid_access(off, size, type, info)) + return false; + return __sock_filter_check_attach_type(off, type, + prog->expected_attach_type); +} + static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write, const struct bpf_prog *prog, int drop_verdict) { @@ -6122,10 +6285,10 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, return insn - insn_buf; } -static u32 sock_filter_convert_ctx_access(enum bpf_access_type type, - const struct bpf_insn *si, - struct bpf_insn *insn_buf, - struct bpf_prog *prog, u32 *target_size) +u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, u32 *target_size) { struct bpf_insn *insn = insn_buf; int off; @@ -7037,7 +7200,7 @@ const struct bpf_prog_ops lwt_seg6local_prog_ops = { const struct bpf_verifier_ops cg_sock_verifier_ops = { .get_func_proto = sock_filter_func_proto, .is_valid_access = sock_filter_is_valid_access, - .convert_ctx_access = sock_filter_convert_ctx_access, + .convert_ctx_access = bpf_sock_convert_ctx_access, }; const struct bpf_prog_ops cg_sock_prog_ops = { diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 555427b3e0fe..a264cf2accd0 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -32,37 +32,49 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) { unsigned long flags; - if (xs->dev) { - spin_lock_irqsave(&umem->xsk_list_lock, flags); - list_del_rcu(&xs->list); - spin_unlock_irqrestore(&umem->xsk_list_lock, flags); - - if (umem->zc) - synchronize_net(); - } + spin_lock_irqsave(&umem->xsk_list_lock, flags); + list_del_rcu(&xs->list); + spin_unlock_irqrestore(&umem->xsk_list_lock, flags); } -int xdp_umem_query(struct net_device *dev, u16 queue_id) +/* The umem is stored both in the _rx struct and the _tx struct as we do + * not know if the device has more tx queues than rx, or the opposite. + * This might also change during run time. + */ +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem, + u16 queue_id) { - struct netdev_bpf bpf; + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].umem = umem; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].umem = umem; +} - ASSERT_RTNL(); +struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, + u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + return dev->_rx[queue_id].umem; + if (queue_id < dev->real_num_tx_queues) + return dev->_tx[queue_id].umem; - memset(&bpf, 0, sizeof(bpf)); - bpf.command = XDP_QUERY_XSK_UMEM; - bpf.xsk.queue_id = queue_id; + return NULL; +} - if (!dev->netdev_ops->ndo_bpf) - return 0; - return dev->netdev_ops->ndo_bpf(dev, &bpf) ?: !!bpf.xsk.umem; +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].umem = NULL; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].umem = NULL; } int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u32 queue_id, u16 flags) + u16 queue_id, u16 flags) { bool force_zc, force_copy; struct netdev_bpf bpf; - int err; + int err = 0; force_zc = flags & XDP_ZEROCOPY; force_copy = flags & XDP_COPY; @@ -70,17 +82,23 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (force_zc && force_copy) return -EINVAL; - if (force_copy) - return 0; + rtnl_lock(); + if (xdp_get_umem_from_qid(dev, queue_id)) { + err = -EBUSY; + goto out_rtnl_unlock; + } - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit) - return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */ + xdp_reg_umem_at_qid(dev, umem, queue_id); + umem->dev = dev; + umem->queue_id = queue_id; + if (force_copy) + /* For copy-mode, we are done. */ + goto out_rtnl_unlock; - rtnl_lock(); - err = xdp_umem_query(dev, queue_id); - if (err) { - err = err < 0 ? -EOPNOTSUPP : -EBUSY; - goto err_rtnl_unlock; + if (!dev->netdev_ops->ndo_bpf || + !dev->netdev_ops->ndo_xsk_async_xmit) { + err = -EOPNOTSUPP; + goto err_unreg_umem; } bpf.command = XDP_SETUP_XSK_UMEM; @@ -89,18 +107,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, err = dev->netdev_ops->ndo_bpf(dev, &bpf); if (err) - goto err_rtnl_unlock; + goto err_unreg_umem; rtnl_unlock(); dev_hold(dev); - umem->dev = dev; - umem->queue_id = queue_id; umem->zc = true; return 0; -err_rtnl_unlock: +err_unreg_umem: + xdp_clear_umem_at_qid(dev, queue_id); + if (!force_zc) + err = 0; /* fallback to copy mode */ +out_rtnl_unlock: rtnl_unlock(); - return force_zc ? err : 0; /* fail or fallback */ + return err; } static void xdp_umem_clear_dev(struct xdp_umem *umem) @@ -108,7 +128,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem) struct netdev_bpf bpf; int err; - if (umem->dev) { + if (umem->zc) { bpf.command = XDP_SETUP_XSK_UMEM; bpf.xsk.umem = NULL; bpf.xsk.queue_id = umem->queue_id; @@ -119,9 +139,17 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem) if (err) WARN(1, "failed to disable umem!\n"); + } + + if (umem->dev) { + rtnl_lock(); + xdp_clear_umem_at_qid(umem->dev, umem->queue_id); + rtnl_unlock(); + } + if (umem->zc) { dev_put(umem->dev); - umem->dev = NULL; + umem->zc = false; } } diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index c8be1ad3eb88..27603227601b 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -9,7 +9,7 @@ #include <net/xdp_sock.h> int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u32 queue_id, u16 flags); + u16 queue_id, u16 flags); bool xdp_umem_validate_queues(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 5a432dfee4ee..0577cd49aa72 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -355,12 +355,18 @@ static int xsk_release(struct socket *sock) local_bh_enable(); if (xs->dev) { + struct net_device *dev = xs->dev; + /* Wait for driver to stop using the xdp socket. */ - synchronize_net(); - dev_put(xs->dev); + xdp_del_sk_umem(xs->umem, xs); xs->dev = NULL; + synchronize_net(); + dev_put(dev); } + xskq_destroy(xs->rx); + xskq_destroy(xs->tx); + sock_orphan(sk); sock->sk = NULL; @@ -419,13 +425,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) } qid = sxdp->sxdp_queue_id; - - if ((xs->rx && qid >= dev->real_num_rx_queues) || - (xs->tx && qid >= dev->real_num_tx_queues)) { - err = -EINVAL; - goto out_unlock; - } - flags = sxdp->sxdp_flags; if (flags & XDP_SHARED_UMEM) { @@ -721,9 +720,6 @@ static void xsk_destruct(struct sock *sk) if (!sock_flag(sk, SOCK_DEAD)) return; - xskq_destroy(xs->rx); - xskq_destroy(xs->tx); - xdp_del_sk_umem(xs->umem, xs); xdp_put_umem(xs->umem); sk_refcnt_debug_dec(sk); diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c index 180f9d813bca..d7b68ef5ba79 100644 --- a/samples/bpf/test_cgrp2_attach2.c +++ b/samples/bpf/test_cgrp2_attach2.c @@ -209,7 +209,7 @@ static int map_fd = -1; static int prog_load_cnt(int verdict, int val) { - int cgroup_storage_fd; + int cgroup_storage_fd, percpu_cgroup_storage_fd; if (map_fd < 0) map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0); @@ -225,6 +225,14 @@ static int prog_load_cnt(int verdict, int val) return -1; } + percpu_cgroup_storage_fd = bpf_create_map( + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + sizeof(struct bpf_cgroup_storage_key), 8, 0, 0); + if (percpu_cgroup_storage_fd < 0) { + printf("failed to create map '%s'\n", strerror(errno)); + return -1; + } + struct bpf_insn prog[] = { BPF_MOV32_IMM(BPF_REG_0, 0), BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */ @@ -235,11 +243,20 @@ static int prog_load_cnt(int verdict, int val) BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */ BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */ + BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd), BPF_MOV64_IMM(BPF_REG_2, 0), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage), BPF_MOV64_IMM(BPF_REG_1, val), BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 0, 0), + + BPF_LD_MAP_FD(BPF_REG_1, percpu_cgroup_storage_fd), + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 0x1), + BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_3, 0), + BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */ BPF_EXIT_INSN(), }; diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c index 6c6b10f4c3ee..56466d010139 100644 --- a/samples/bpf/tracex3_user.c +++ b/samples/bpf/tracex3_user.c @@ -17,8 +17,6 @@ #include "bpf_load.h" #include "bpf_util.h" -#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) - #define SLOTS 100 static void clear_stats(int fd) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index e22fbe8b975f..6003e9598973 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -72,13 +72,15 @@ static const char * const map_type_name[] = { [BPF_MAP_TYPE_SOCKHASH] = "sockhash", [BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage", [BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray", + [BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE] = "percpu_cgroup_storage", }; static bool map_is_per_cpu(__u32 type) { return type == BPF_MAP_TYPE_PERCPU_HASH || type == BPF_MAP_TYPE_PERCPU_ARRAY || - type == BPF_MAP_TYPE_LRU_PERCPU_HASH; + type == BPF_MAP_TYPE_LRU_PERCPU_HASH || + type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE; } static bool map_is_map_of_maps(__u32 type) diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c index ed205ee57655..d441bb7035ca 100644 --- a/tools/bpf/bpftool/net.c +++ b/tools/bpf/bpftool/net.c @@ -69,7 +69,9 @@ static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb) snprintf(netinfo->devices[netinfo->used_len].devname, sizeof(netinfo->devices[netinfo->used_len].devname), "%s", - tb[IFLA_IFNAME] ? nla_getattr_str(tb[IFLA_IFNAME]) : ""); + tb[IFLA_IFNAME] + ? libbpf_nla_getattr_str(tb[IFLA_IFNAME]) + : ""); netinfo->used_len++; return do_xdp_dump(ifinfo, tb); @@ -83,7 +85,7 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb) if (tcinfo->is_qdisc) { /* skip clsact qdisc */ if (tb[TCA_KIND] && - strcmp(nla_data(tb[TCA_KIND]), "clsact") == 0) + strcmp(libbpf_nla_data(tb[TCA_KIND]), "clsact") == 0) return 0; if (info->tcm_handle == 0) return 0; @@ -101,7 +103,9 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb) snprintf(tcinfo->handle_array[tcinfo->used_len].kind, sizeof(tcinfo->handle_array[tcinfo->used_len].kind), "%s", - tb[TCA_KIND] ? nla_getattr_str(tb[TCA_KIND]) : "unknown"); + tb[TCA_KIND] + ? libbpf_nla_getattr_str(tb[TCA_KIND]) + : "unknown"); tcinfo->used_len++; return 0; @@ -127,14 +131,14 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid, tcinfo.array_len = 0; tcinfo.is_qdisc = false; - ret = nl_get_class(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg, - &tcinfo); + ret = libbpf_nl_get_class(sock, nl_pid, dev->ifindex, + dump_class_qdisc_nlmsg, &tcinfo); if (ret) goto out; tcinfo.is_qdisc = true; - ret = nl_get_qdisc(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg, - &tcinfo); + ret = libbpf_nl_get_qdisc(sock, nl_pid, dev->ifindex, + dump_class_qdisc_nlmsg, &tcinfo); if (ret) goto out; @@ -142,10 +146,9 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid, filter_info.ifindex = dev->ifindex; for (i = 0; i < tcinfo.used_len; i++) { filter_info.kind = tcinfo.handle_array[i].kind; - ret = nl_get_filter(sock, nl_pid, dev->ifindex, - tcinfo.handle_array[i].handle, - dump_filter_nlmsg, - &filter_info); + ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, + tcinfo.handle_array[i].handle, + dump_filter_nlmsg, &filter_info); if (ret) goto out; } @@ -153,22 +156,22 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid, /* root, ingress and egress handle */ handle = TC_H_ROOT; filter_info.kind = "root"; - ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, - dump_filter_nlmsg, &filter_info); + ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle, + dump_filter_nlmsg, &filter_info); if (ret) goto out; handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS); filter_info.kind = "clsact/ingress"; - ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, - dump_filter_nlmsg, &filter_info); + ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle, + dump_filter_nlmsg, &filter_info); if (ret) goto out; handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS); filter_info.kind = "clsact/egress"; - ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle, - dump_filter_nlmsg, &filter_info); + ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle, + dump_filter_nlmsg, &filter_info); if (ret) goto out; @@ -196,7 +199,7 @@ static int do_show(int argc, char **argv) usage(); } - sock = bpf_netlink_open(&nl_pid); + sock = libbpf_netlink_open(&nl_pid); if (sock < 0) { fprintf(stderr, "failed to open netlink sock\n"); return -1; @@ -211,7 +214,7 @@ static int do_show(int argc, char **argv) jsonw_start_array(json_wtr); NET_START_OBJECT; NET_START_ARRAY("xdp", "%s:\n"); - ret = nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array); + ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array); NET_END_ARRAY("\n"); if (!ret) { diff --git a/tools/bpf/bpftool/netlink_dumper.c b/tools/bpf/bpftool/netlink_dumper.c index 6f5e9cc6836c..4e9f4531269f 100644 --- a/tools/bpf/bpftool/netlink_dumper.c +++ b/tools/bpf/bpftool/netlink_dumper.c @@ -21,7 +21,7 @@ static void xdp_dump_prog_id(struct nlattr **tb, int attr, if (new_json_object) NET_START_OBJECT NET_DUMP_STR("mode", " %s", mode); - NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[attr])) + NET_DUMP_UINT("id", " id %u", libbpf_nla_getattr_u32(tb[attr])) if (new_json_object) NET_END_OBJECT } @@ -32,13 +32,13 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex, struct nlattr *tb[IFLA_XDP_MAX + 1]; unsigned char mode; - if (nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0) + if (libbpf_nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0) return -1; if (!tb[IFLA_XDP_ATTACHED]) return 0; - mode = nla_getattr_u8(tb[IFLA_XDP_ATTACHED]); + mode = libbpf_nla_getattr_u8(tb[IFLA_XDP_ATTACHED]); if (mode == XDP_ATTACHED_NONE) return 0; @@ -75,14 +75,14 @@ int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb) return 0; return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index, - nla_getattr_str(tb[IFLA_IFNAME])); + libbpf_nla_getattr_str(tb[IFLA_IFNAME])); } static int do_bpf_dump_one_act(struct nlattr *attr) { struct nlattr *tb[TCA_ACT_BPF_MAX + 1]; - if (nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0) + if (libbpf_nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0) return -LIBBPF_ERRNO__NLPARSE; if (!tb[TCA_ACT_BPF_PARMS]) @@ -91,10 +91,10 @@ static int do_bpf_dump_one_act(struct nlattr *attr) NET_START_OBJECT_NESTED2; if (tb[TCA_ACT_BPF_NAME]) NET_DUMP_STR("name", "%s", - nla_getattr_str(tb[TCA_ACT_BPF_NAME])); + libbpf_nla_getattr_str(tb[TCA_ACT_BPF_NAME])); if (tb[TCA_ACT_BPF_ID]) NET_DUMP_UINT("id", " id %u", - nla_getattr_u32(tb[TCA_ACT_BPF_ID])); + libbpf_nla_getattr_u32(tb[TCA_ACT_BPF_ID])); NET_END_OBJECT_NESTED; return 0; } @@ -106,10 +106,11 @@ static int do_dump_one_act(struct nlattr *attr) if (!attr) return 0; - if (nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0) + if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0) return -LIBBPF_ERRNO__NLPARSE; - if (tb[TCA_ACT_KIND] && strcmp(nla_data(tb[TCA_ACT_KIND]), "bpf") == 0) + if (tb[TCA_ACT_KIND] && + strcmp(libbpf_nla_data(tb[TCA_ACT_KIND]), "bpf") == 0) return do_bpf_dump_one_act(tb[TCA_ACT_OPTIONS]); return 0; @@ -120,7 +121,7 @@ static int do_bpf_act_dump(struct nlattr *attr) struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; int act, ret; - if (nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0) + if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0) return -LIBBPF_ERRNO__NLPARSE; NET_START_ARRAY("act", " %s ["); @@ -139,13 +140,15 @@ static int do_bpf_filter_dump(struct nlattr *attr) struct nlattr *tb[TCA_BPF_MAX + 1]; int ret; - if (nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0) + if (libbpf_nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0) return -LIBBPF_ERRNO__NLPARSE; if (tb[TCA_BPF_NAME]) - NET_DUMP_STR("name", " %s", nla_getattr_str(tb[TCA_BPF_NAME])); + NET_DUMP_STR("name", " %s", + libbpf_nla_getattr_str(tb[TCA_BPF_NAME])); if (tb[TCA_BPF_ID]) - NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[TCA_BPF_ID])); + NET_DUMP_UINT("id", " id %u", + libbpf_nla_getattr_u32(tb[TCA_BPF_ID])); if (tb[TCA_BPF_ACT]) { ret = do_bpf_act_dump(tb[TCA_BPF_ACT]); if (ret) @@ -160,7 +163,8 @@ int do_filter_dump(struct tcmsg *info, struct nlattr **tb, const char *kind, { int ret = 0; - if (tb[TCA_OPTIONS] && strcmp(nla_data(tb[TCA_KIND]), "bpf") == 0) { + if (tb[TCA_OPTIONS] && + strcmp(libbpf_nla_data(tb[TCA_KIND]), "bpf") == 0) { NET_START_OBJECT; if (devname[0] != '\0') NET_DUMP_STR("devname", "%s", devname); diff --git a/tools/bpf/bpftool/netlink_dumper.h b/tools/bpf/bpftool/netlink_dumper.h index 0788cfbbed0e..e3516b586a34 100644 --- a/tools/bpf/bpftool/netlink_dumper.h +++ b/tools/bpf/bpftool/netlink_dumper.h @@ -16,7 +16,7 @@ jsonw_name(json_wtr, name); \ jsonw_start_object(json_wtr); \ } else { \ - fprintf(stderr, "%s {", name); \ + fprintf(stdout, "%s {", name); \ } \ } @@ -25,7 +25,7 @@ if (json_output) \ jsonw_start_object(json_wtr); \ else \ - fprintf(stderr, "{"); \ + fprintf(stdout, "{"); \ } #define NET_END_OBJECT_NESTED \ @@ -33,7 +33,7 @@ if (json_output) \ jsonw_end_object(json_wtr); \ else \ - fprintf(stderr, "}"); \ + fprintf(stdout, "}"); \ } #define NET_END_OBJECT \ @@ -47,7 +47,7 @@ if (json_output) \ jsonw_end_object(json_wtr); \ else \ - fprintf(stderr, "\n"); \ + fprintf(stdout, "\n"); \ } #define NET_START_ARRAY(name, fmt_str) \ @@ -56,7 +56,7 @@ jsonw_name(json_wtr, name); \ jsonw_start_array(json_wtr); \ } else { \ - fprintf(stderr, fmt_str, name); \ + fprintf(stdout, fmt_str, name); \ } \ } @@ -65,7 +65,7 @@ if (json_output) \ jsonw_end_array(json_wtr); \ else \ - fprintf(stderr, "%s", endstr); \ + fprintf(stdout, "%s", endstr); \ } #define NET_DUMP_UINT(name, fmt_str, val) \ @@ -73,7 +73,7 @@ if (json_output) \ jsonw_uint_field(json_wtr, name, val); \ else \ - fprintf(stderr, fmt_str, val); \ + fprintf(stdout, fmt_str, val); \ } #define NET_DUMP_STR(name, fmt_str, str) \ @@ -81,7 +81,7 @@ if (json_output) \ jsonw_string_field(json_wtr, name, str);\ else \ - fprintf(stderr, fmt_str, str); \ + fprintf(stdout, fmt_str, str); \ } #define NET_DUMP_STR_ONLY(str) \ @@ -89,7 +89,7 @@ if (json_output) \ jsonw_string(json_wtr, str); \ else \ - fprintf(stderr, "%s ", str); \ + fprintf(stdout, "%s ", str); \ } #endif diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index aa5ccd2385ed..f9187b41dff6 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -127,6 +127,7 @@ enum bpf_map_type { BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, }; enum bpf_prog_type { @@ -2143,6 +2144,77 @@ union bpf_attr { * request in the skb. * Return * 0 on success, or a negative error in case of failure. + * + * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) + * Description + * Look for TCP socket matching *tuple*, optionally in a child + * network namespace *netns*. The return value must be checked, + * and if non-NULL, released via **bpf_sk_release**\ (). + * + * The *ctx* should point to the context of the program, such as + * the skb or socket (depending on the hook in use). This is used + * to determine the base network namespace for the lookup. + * + * *tuple_size* must be one of: + * + * **sizeof**\ (*tuple*\ **->ipv4**) + * Look for an IPv4 socket. + * **sizeof**\ (*tuple*\ **->ipv6**) + * Look for an IPv6 socket. + * + * If the *netns* is zero, then the socket lookup table in the + * netns associated with the *ctx* will be used. For the TC hooks, + * this in the netns of the device in the skb. For socket hooks, + * this in the netns of the socket. If *netns* is non-zero, then + * it specifies the ID of the netns relative to the netns + * associated with the *ctx*. + * + * All values for *flags* are reserved for future usage, and must + * be left at zero. + * + * This helper is available only if the kernel was compiled with + * **CONFIG_NET** configuration option. + * Return + * Pointer to *struct bpf_sock*, or NULL in case of failure. + * + * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) + * Description + * Look for UDP socket matching *tuple*, optionally in a child + * network namespace *netns*. The return value must be checked, + * and if non-NULL, released via **bpf_sk_release**\ (). + * + * The *ctx* should point to the context of the program, such as + * the skb or socket (depending on the hook in use). This is used + * to determine the base network namespace for the lookup. + * + * *tuple_size* must be one of: + * + * **sizeof**\ (*tuple*\ **->ipv4**) + * Look for an IPv4 socket. + * **sizeof**\ (*tuple*\ **->ipv6**) + * Look for an IPv6 socket. + * + * If the *netns* is zero, then the socket lookup table in the + * netns associated with the *ctx* will be used. For the TC hooks, + * this in the netns of the device in the skb. For socket hooks, + * this in the netns of the socket. If *netns* is non-zero, then + * it specifies the ID of the netns relative to the netns + * associated with the *ctx*. + * + * All values for *flags* are reserved for future usage, and must + * be left at zero. + * + * This helper is available only if the kernel was compiled with + * **CONFIG_NET** configuration option. + * Return + * Pointer to *struct bpf_sock*, or NULL in case of failure. + * + * int bpf_sk_release(struct bpf_sock *sk) + * Description + * Release the reference held by *sock*. *sock* must be a non-NULL + * pointer that was returned from bpf_sk_lookup_xxx\ (). + * Return + * 0 on success, or a negative error in case of failure. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2228,7 +2300,10 @@ union bpf_attr { FN(get_current_cgroup_id), \ FN(get_local_storage), \ FN(sk_select_reuseport), \ - FN(skb_ancestor_cgroup_id), + FN(skb_ancestor_cgroup_id), \ + FN(sk_lookup_tcp), \ + FN(sk_lookup_udp), \ + FN(sk_release), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call @@ -2398,6 +2473,23 @@ struct bpf_sock { */ }; +struct bpf_sock_tuple { + union { + struct { + __be32 saddr; + __be32 daddr; + __be16 sport; + __be16 dport; + } ipv4; + struct { + __be32 saddr[4]; + __be32 daddr[4]; + __be16 sport; + __be16 dport; + } ipv6; + }; +}; + #define XDP_PACKET_HEADROOM 256 /* User return codes for XDP prog type. diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index d49902e818b5..6ad27257fd67 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -1,4 +1,4 @@ -# SPDX-License-Identifier: GPL-2.0 +# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) # Most of this file is copied from tools/lib/traceevent/Makefile BPF_VERSION = 0 diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 3878a26a2071..d70a255cb05e 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* * common eBPF ELF operations. diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 6f38164b2618..87520a87a75f 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: LGPL-2.1 */ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ /* * common eBPF ELF operations. @@ -20,8 +20,8 @@ * You should have received a copy of the GNU Lesser General Public * License along with this program; if not, see <http://www.gnu.org/licenses> */ -#ifndef __BPF_BPF_H -#define __BPF_BPF_H +#ifndef __LIBBPF_BPF_H +#define __LIBBPF_BPF_H #include <linux/bpf.h> #include <stdbool.h> @@ -111,4 +111,4 @@ int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len, __u32 *prog_id, __u32 *fd_type, __u64 *probe_offset, __u64 *probe_addr); -#endif +#endif /* __LIBBPF_BPF_H */ diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index cf94b0770522..449591aa9900 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* Copyright (c) 2018 Facebook */ #include <stdlib.h> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 4897e0724d4e..6db5462bb2ef 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -1,8 +1,8 @@ -/* SPDX-License-Identifier: LGPL-2.1 */ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ /* Copyright (c) 2018 Facebook */ -#ifndef __BPF_BTF_H -#define __BPF_BTF_H +#ifndef __LIBBPF_BTF_H +#define __LIBBPF_BTF_H #include <linux/types.h> @@ -23,4 +23,4 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id); int btf__fd(const struct btf *btf); const char *btf__name_by_offset(const struct btf *btf, __u32 offset); -#endif +#endif /* __LIBBPF_BTF_H */ diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 4f8d43ae20d2..ceb918c14d80 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* * Common eBPF ELF object loading operations. @@ -7,19 +7,6 @@ * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Huawei Inc. * Copyright (C) 2017 Nicira, Inc. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; - * version 2.1 of the License (not later!) - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with this program; if not, see <http://www.gnu.org/licenses> */ #define _GNU_SOURCE @@ -228,7 +215,7 @@ struct bpf_object { }; #define obj_elf_valid(o) ((o)->efile.elf) -static void bpf_program__unload(struct bpf_program *prog) +void bpf_program__unload(struct bpf_program *prog) { int i; @@ -470,7 +457,8 @@ static int bpf_object__elf_init(struct bpf_object *obj) obj->efile.fd = open(obj->path, O_RDONLY); if (obj->efile.fd < 0) { char errmsg[STRERR_BUFSIZE]; - char *cp = str_error(errno, errmsg, sizeof(errmsg)); + char *cp = libbpf_strerror_r(errno, errmsg, + sizeof(errmsg)); pr_warning("failed to open %s: %s\n", obj->path, cp); return -errno; @@ -811,7 +799,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) data->d_size, name, idx); if (err) { char errmsg[STRERR_BUFSIZE]; - char *cp = str_error(-err, errmsg, sizeof(errmsg)); + char *cp = libbpf_strerror_r(-err, errmsg, + sizeof(errmsg)); pr_warning("failed to alloc program %s (%s): %s", name, obj->path, cp); @@ -1140,7 +1129,7 @@ bpf_object__create_maps(struct bpf_object *obj) *pfd = bpf_create_map_xattr(&create_attr); if (*pfd < 0 && create_attr.btf_key_type_id) { - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n", map->name, cp, errno); create_attr.btf_fd = 0; @@ -1155,7 +1144,7 @@ bpf_object__create_maps(struct bpf_object *obj) size_t j; err = *pfd; - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("failed to create map (name: '%s'): %s\n", map->name, cp); for (j = 0; j < i; j++) @@ -1339,7 +1328,7 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type, } ret = -LIBBPF_ERRNO__LOAD; - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("load bpf program failed: %s\n", cp); if (log_buf && log_buf[0] != '\0') { @@ -1375,9 +1364,9 @@ out: return ret; } -static int +int bpf_program__load(struct bpf_program *prog, - char *license, u32 kern_version) + char *license, __u32 kern_version) { int err = 0, fd, i; @@ -1655,7 +1644,7 @@ static int check_path(const char *path) dir = dirname(dname); if (statfs(dir, &st_fs)) { - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("failed to statfs %s: %s\n", dir, cp); err = -errno; } @@ -1691,7 +1680,7 @@ int bpf_program__pin_instance(struct bpf_program *prog, const char *path, } if (bpf_obj_pin(prog->instances.fds[instance], path)) { - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("failed to pin program: %s\n", cp); return -errno; } @@ -1709,7 +1698,7 @@ static int make_dir(const char *path) err = -errno; if (err) { - cp = str_error(-err, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(-err, errmsg, sizeof(errmsg)); pr_warning("failed to mkdir %s: %s\n", path, cp); } return err; @@ -1771,7 +1760,7 @@ int bpf_map__pin(struct bpf_map *map, const char *path) } if (bpf_obj_pin(map->fd, path)) { - cp = str_error(errno, errmsg, sizeof(errmsg)); + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warning("failed to pin map: %s\n", cp); return -errno; } @@ -2085,58 +2074,90 @@ void bpf_program__set_expected_attach_type(struct bpf_program *prog, prog->expected_attach_type = type; } -#define BPF_PROG_SEC_FULL(string, ptype, atype) \ - { string, sizeof(string) - 1, ptype, atype } +#define BPF_PROG_SEC_IMPL(string, ptype, eatype, atype) \ + { string, sizeof(string) - 1, ptype, eatype, atype } + +/* Programs that can NOT be attached. */ +#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, -EINVAL) -#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_FULL(string, ptype, 0) +/* Programs that can be attached. */ +#define BPF_APROG_SEC(string, ptype, atype) \ + BPF_PROG_SEC_IMPL(string, ptype, 0, atype) -#define BPF_S_PROG_SEC(string, ptype) \ - BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK, ptype) +/* Programs that must specify expected attach type at load time. */ +#define BPF_EAPROG_SEC(string, ptype, eatype) \ + BPF_PROG_SEC_IMPL(string, ptype, eatype, eatype) -#define BPF_SA_PROG_SEC(string, ptype) \ - BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, ptype) +/* Programs that can be attached but attach type can't be identified by section + * name. Kept for backward compatibility. + */ +#define BPF_APROG_COMPAT(string, ptype) BPF_PROG_SEC(string, ptype) static const struct { const char *sec; size_t len; enum bpf_prog_type prog_type; enum bpf_attach_type expected_attach_type; + enum bpf_attach_type attach_type; } section_names[] = { - BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), - BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE), - BPF_PROG_SEC("kretprobe/", BPF_PROG_TYPE_KPROBE), - BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS), - BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT), - BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT), - BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT), - BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), - BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), - BPF_PROG_SEC("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB), - BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK), - BPF_PROG_SEC("cgroup/dev", BPF_PROG_TYPE_CGROUP_DEVICE), - BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), - BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT), - BPF_PROG_SEC("lwt_xmit", BPF_PROG_TYPE_LWT_XMIT), - BPF_PROG_SEC("lwt_seg6local", BPF_PROG_TYPE_LWT_SEG6LOCAL), - BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS), - BPF_PROG_SEC("sk_skb", BPF_PROG_TYPE_SK_SKB), - BPF_PROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG), - BPF_PROG_SEC("lirc_mode2", BPF_PROG_TYPE_LIRC_MODE2), - BPF_PROG_SEC("flow_dissector", BPF_PROG_TYPE_FLOW_DISSECTOR), - BPF_SA_PROG_SEC("cgroup/bind4", BPF_CGROUP_INET4_BIND), - BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND), - BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT), - BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT), - BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG), - BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG), - BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND), - BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND), + BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), + BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE), + BPF_PROG_SEC("kretprobe/", BPF_PROG_TYPE_KPROBE), + BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS), + BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT), + BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT), + BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT), + BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), + BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), + BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), + BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT), + BPF_PROG_SEC("lwt_xmit", BPF_PROG_TYPE_LWT_XMIT), + BPF_PROG_SEC("lwt_seg6local", BPF_PROG_TYPE_LWT_SEG6LOCAL), + BPF_APROG_SEC("cgroup_skb/ingress", BPF_PROG_TYPE_CGROUP_SKB, + BPF_CGROUP_INET_INGRESS), + BPF_APROG_SEC("cgroup_skb/egress", BPF_PROG_TYPE_CGROUP_SKB, + BPF_CGROUP_INET_EGRESS), + BPF_APROG_COMPAT("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB), + BPF_APROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK, + BPF_CGROUP_INET_SOCK_CREATE), + BPF_EAPROG_SEC("cgroup/post_bind4", BPF_PROG_TYPE_CGROUP_SOCK, + BPF_CGROUP_INET4_POST_BIND), + BPF_EAPROG_SEC("cgroup/post_bind6", BPF_PROG_TYPE_CGROUP_SOCK, + BPF_CGROUP_INET6_POST_BIND), + BPF_APROG_SEC("cgroup/dev", BPF_PROG_TYPE_CGROUP_DEVICE, + BPF_CGROUP_DEVICE), + BPF_APROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS, + BPF_CGROUP_SOCK_OPS), + BPF_APROG_SEC("sk_skb/stream_parser", BPF_PROG_TYPE_SK_SKB, + BPF_SK_SKB_STREAM_PARSER), + BPF_APROG_SEC("sk_skb/stream_verdict", BPF_PROG_TYPE_SK_SKB, + BPF_SK_SKB_STREAM_VERDICT), + BPF_APROG_COMPAT("sk_skb", BPF_PROG_TYPE_SK_SKB), + BPF_APROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG, + BPF_SK_MSG_VERDICT), + BPF_APROG_SEC("lirc_mode2", BPF_PROG_TYPE_LIRC_MODE2, + BPF_LIRC_MODE2), + BPF_APROG_SEC("flow_dissector", BPF_PROG_TYPE_FLOW_DISSECTOR, + BPF_FLOW_DISSECTOR), + BPF_EAPROG_SEC("cgroup/bind4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_INET4_BIND), + BPF_EAPROG_SEC("cgroup/bind6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_INET6_BIND), + BPF_EAPROG_SEC("cgroup/connect4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_INET4_CONNECT), + BPF_EAPROG_SEC("cgroup/connect6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_INET6_CONNECT), + BPF_EAPROG_SEC("cgroup/sendmsg4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_UDP4_SENDMSG), + BPF_EAPROG_SEC("cgroup/sendmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + BPF_CGROUP_UDP6_SENDMSG), }; +#undef BPF_PROG_SEC_IMPL #undef BPF_PROG_SEC -#undef BPF_PROG_SEC_FULL -#undef BPF_S_PROG_SEC -#undef BPF_SA_PROG_SEC +#undef BPF_APROG_SEC +#undef BPF_EAPROG_SEC +#undef BPF_APROG_COMPAT int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type, enum bpf_attach_type *expected_attach_type) @@ -2156,6 +2177,25 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type, return -EINVAL; } +int libbpf_attach_type_by_name(const char *name, + enum bpf_attach_type *attach_type) +{ + int i; + + if (!name) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(section_names); i++) { + if (strncmp(name, section_names[i].sec, section_names[i].len)) + continue; + if (section_names[i].attach_type == -EINVAL) + return -EINVAL; + *attach_type = section_names[i].attach_type; + return 0; + } + return -EINVAL; +} + static int bpf_program__identify_section(struct bpf_program *prog, enum bpf_prog_type *prog_type, diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index e3b00e23e181..8af8d3663991 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: LGPL-2.1 */ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ /* * Common eBPF ELF object loading operations. @@ -6,22 +6,9 @@ * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org> * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Huawei Inc. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; - * version 2.1 of the License (not later!) - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with this program; if not, see <http://www.gnu.org/licenses> */ -#ifndef __BPF_LIBBPF_H -#define __BPF_LIBBPF_H +#ifndef __LIBBPF_LIBBPF_H +#define __LIBBPF_LIBBPF_H #include <stdio.h> #include <stdint.h> @@ -104,6 +91,8 @@ void *bpf_object__priv(struct bpf_object *prog); int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type, enum bpf_attach_type *expected_attach_type); +int libbpf_attach_type_by_name(const char *name, + enum bpf_attach_type *attach_type); /* Accessors of bpf_program */ struct bpf_program; @@ -126,10 +115,13 @@ void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex); const char *bpf_program__title(struct bpf_program *prog, bool needs_copy); +int bpf_program__load(struct bpf_program *prog, char *license, + __u32 kern_version); int bpf_program__fd(struct bpf_program *prog); int bpf_program__pin_instance(struct bpf_program *prog, const char *path, int instance); int bpf_program__pin(struct bpf_program *prog, const char *path); +void bpf_program__unload(struct bpf_program *prog); struct bpf_insn; @@ -299,18 +291,15 @@ int bpf_perf_event_read_simple(void *mem, unsigned long size, void **buf, size_t *buf_len, bpf_perf_event_print_t fn, void *priv); -struct nlmsghdr; struct nlattr; -typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb); -typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t, - void *cookie); -int bpf_netlink_open(unsigned int *nl_pid); -int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg, - void *cookie); -int nl_get_class(int sock, unsigned int nl_pid, int ifindex, - dump_nlmsg_t dump_class_nlmsg, void *cookie); -int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, - dump_nlmsg_t dump_qdisc_nlmsg, void *cookie); -int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, - dump_nlmsg_t dump_filter_nlmsg, void *cookie); -#endif +typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb); +int libbpf_netlink_open(unsigned int *nl_pid); +int libbpf_nl_get_link(int sock, unsigned int nl_pid, + libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie); +int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex, + libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie); +int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, + libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie); +int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, + libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie); +#endif /* __LIBBPF_LIBBPF_H */ diff --git a/tools/lib/bpf/libbpf_errno.c b/tools/lib/bpf/libbpf_errno.c index 2464ade3b326..d83b17f8435c 100644 --- a/tools/lib/bpf/libbpf_errno.c +++ b/tools/lib/bpf/libbpf_errno.c @@ -1,23 +1,10 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org> * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> * Copyright (C) 2015 Huawei Inc. * Copyright (C) 2017 Nicira, Inc. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; - * version 2.1 of the License (not later!) - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with this program; if not, see <http://www.gnu.org/licenses> */ #include <stdio.h> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index fde1d7bf8199..0ce67aea8f3b 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* Copyright (c) 2018 Facebook */ #include <stdlib.h> @@ -18,7 +18,10 @@ #define SOL_NETLINK 270 #endif -int bpf_netlink_open(__u32 *nl_pid) +typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t, + void *cookie); + +int libbpf_netlink_open(__u32 *nl_pid) { struct sockaddr_nl sa; socklen_t addrlen; @@ -62,7 +65,7 @@ cleanup: } static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, - __dump_nlmsg_t _fn, dump_nlmsg_t fn, + __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn, void *cookie) { bool multipart = true; @@ -100,7 +103,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, if (!err->error) continue; ret = err->error; - nla_dump_errormsg(nh); + libbpf_nla_dump_errormsg(nh); goto done; case NLMSG_DONE: return 0; @@ -130,7 +133,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags) } req; __u32 nl_pid; - sock = bpf_netlink_open(&nl_pid); + sock = libbpf_netlink_open(&nl_pid); if (sock < 0) return sock; @@ -178,8 +181,8 @@ cleanup: return ret; } -static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg, - void *cookie) +static int __dump_link_nlmsg(struct nlmsghdr *nlh, + libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie) { struct nlattr *tb[IFLA_MAX + 1], *attr; struct ifinfomsg *ifi = NLMSG_DATA(nlh); @@ -187,14 +190,14 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg, len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)); attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi))); - if (nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0) + if (libbpf_nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0) return -LIBBPF_ERRNO__NLPARSE; return dump_link_nlmsg(cookie, ifi, tb); } -int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg, - void *cookie) +int libbpf_nl_get_link(int sock, unsigned int nl_pid, + libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie) { struct { struct nlmsghdr nlh; @@ -216,7 +219,8 @@ int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg, } static int __dump_class_nlmsg(struct nlmsghdr *nlh, - dump_nlmsg_t dump_class_nlmsg, void *cookie) + libbpf_dump_nlmsg_t dump_class_nlmsg, + void *cookie) { struct nlattr *tb[TCA_MAX + 1], *attr; struct tcmsg *t = NLMSG_DATA(nlh); @@ -224,14 +228,14 @@ static int __dump_class_nlmsg(struct nlmsghdr *nlh, len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) + if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) return -LIBBPF_ERRNO__NLPARSE; return dump_class_nlmsg(cookie, t, tb); } -int nl_get_class(int sock, unsigned int nl_pid, int ifindex, - dump_nlmsg_t dump_class_nlmsg, void *cookie) +int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex, + libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie) { struct { struct nlmsghdr nlh; @@ -254,7 +258,8 @@ int nl_get_class(int sock, unsigned int nl_pid, int ifindex, } static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh, - dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) + libbpf_dump_nlmsg_t dump_qdisc_nlmsg, + void *cookie) { struct nlattr *tb[TCA_MAX + 1], *attr; struct tcmsg *t = NLMSG_DATA(nlh); @@ -262,14 +267,14 @@ static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh, len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) + if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) return -LIBBPF_ERRNO__NLPARSE; return dump_qdisc_nlmsg(cookie, t, tb); } -int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, - dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) +int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, + libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) { struct { struct nlmsghdr nlh; @@ -292,7 +297,8 @@ int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, } static int __dump_filter_nlmsg(struct nlmsghdr *nlh, - dump_nlmsg_t dump_filter_nlmsg, void *cookie) + libbpf_dump_nlmsg_t dump_filter_nlmsg, + void *cookie) { struct nlattr *tb[TCA_MAX + 1], *attr; struct tcmsg *t = NLMSG_DATA(nlh); @@ -300,14 +306,14 @@ static int __dump_filter_nlmsg(struct nlmsghdr *nlh, len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) + if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) return -LIBBPF_ERRNO__NLPARSE; return dump_filter_nlmsg(cookie, t, tb); } -int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, - dump_nlmsg_t dump_filter_nlmsg, void *cookie) +int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, + libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie) { struct { struct nlmsghdr nlh; diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c index 49f514119bdb..1e69c0c8d413 100644 --- a/tools/lib/bpf/nlattr.c +++ b/tools/lib/bpf/nlattr.c @@ -1,13 +1,8 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* * NETLINK Netlink attributes * - * This library is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation version 2.1 - * of the License. - * * Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch> */ @@ -17,13 +12,13 @@ #include <string.h> #include <stdio.h> -static uint16_t nla_attr_minlen[NLA_TYPE_MAX+1] = { - [NLA_U8] = sizeof(uint8_t), - [NLA_U16] = sizeof(uint16_t), - [NLA_U32] = sizeof(uint32_t), - [NLA_U64] = sizeof(uint64_t), - [NLA_STRING] = 1, - [NLA_FLAG] = 0, +static uint16_t nla_attr_minlen[LIBBPF_NLA_TYPE_MAX+1] = { + [LIBBPF_NLA_U8] = sizeof(uint8_t), + [LIBBPF_NLA_U16] = sizeof(uint16_t), + [LIBBPF_NLA_U32] = sizeof(uint32_t), + [LIBBPF_NLA_U64] = sizeof(uint64_t), + [LIBBPF_NLA_STRING] = 1, + [LIBBPF_NLA_FLAG] = 0, }; static struct nlattr *nla_next(const struct nlattr *nla, int *remaining) @@ -47,9 +42,9 @@ static int nla_type(const struct nlattr *nla) } static int validate_nla(struct nlattr *nla, int maxtype, - struct nla_policy *policy) + struct libbpf_nla_policy *policy) { - struct nla_policy *pt; + struct libbpf_nla_policy *pt; unsigned int minlen = 0; int type = nla_type(nla); @@ -58,23 +53,24 @@ static int validate_nla(struct nlattr *nla, int maxtype, pt = &policy[type]; - if (pt->type > NLA_TYPE_MAX) + if (pt->type > LIBBPF_NLA_TYPE_MAX) return 0; if (pt->minlen) minlen = pt->minlen; - else if (pt->type != NLA_UNSPEC) + else if (pt->type != LIBBPF_NLA_UNSPEC) minlen = nla_attr_minlen[pt->type]; - if (nla_len(nla) < minlen) + if (libbpf_nla_len(nla) < minlen) return -1; - if (pt->maxlen && nla_len(nla) > pt->maxlen) + if (pt->maxlen && libbpf_nla_len(nla) > pt->maxlen) return -1; - if (pt->type == NLA_STRING) { - char *data = nla_data(nla); - if (data[nla_len(nla) - 1] != '\0') + if (pt->type == LIBBPF_NLA_STRING) { + char *data = libbpf_nla_data(nla); + + if (data[libbpf_nla_len(nla) - 1] != '\0') return -1; } @@ -104,15 +100,15 @@ static inline int nlmsg_len(const struct nlmsghdr *nlh) * @see nla_validate * @return 0 on success or a negative error code. */ -int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len, - struct nla_policy *policy) +int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, + int len, struct libbpf_nla_policy *policy) { struct nlattr *nla; int rem, err; memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1)); - nla_for_each_attr(nla, head, len, rem) { + libbpf_nla_for_each_attr(nla, head, len, rem) { int type = nla_type(nla); if (type > maxtype) @@ -144,23 +140,25 @@ errout: * @arg policy Attribute validation policy. * * Feeds the stream of attributes nested into the specified attribute - * to nla_parse(). + * to libbpf_nla_parse(). * - * @see nla_parse + * @see libbpf_nla_parse * @return 0 on success or a negative error code. */ -int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, - struct nla_policy *policy) +int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype, + struct nlattr *nla, + struct libbpf_nla_policy *policy) { - return nla_parse(tb, maxtype, nla_data(nla), nla_len(nla), policy); + return libbpf_nla_parse(tb, maxtype, libbpf_nla_data(nla), + libbpf_nla_len(nla), policy); } /* dump netlink extended ack error message */ -int nla_dump_errormsg(struct nlmsghdr *nlh) +int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh) { - struct nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = { - [NLMSGERR_ATTR_MSG] = { .type = NLA_STRING }, - [NLMSGERR_ATTR_OFFS] = { .type = NLA_U32 }, + struct libbpf_nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = { + [NLMSGERR_ATTR_MSG] = { .type = LIBBPF_NLA_STRING }, + [NLMSGERR_ATTR_OFFS] = { .type = LIBBPF_NLA_U32 }, }; struct nlattr *tb[NLMSGERR_ATTR_MAX + 1], *attr; struct nlmsgerr *err; @@ -181,14 +179,15 @@ int nla_dump_errormsg(struct nlmsghdr *nlh) attr = (struct nlattr *) ((void *) err + hlen); alen = nlh->nlmsg_len - hlen; - if (nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, extack_policy) != 0) { + if (libbpf_nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, + extack_policy) != 0) { fprintf(stderr, "Failed to parse extended error attributes\n"); return 0; } if (tb[NLMSGERR_ATTR_MSG]) - errmsg = (char *) nla_data(tb[NLMSGERR_ATTR_MSG]); + errmsg = (char *) libbpf_nla_data(tb[NLMSGERR_ATTR_MSG]); fprintf(stderr, "Kernel error message: %s\n", errmsg); diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h index a6e2396bce7c..6cc3ac91690f 100644 --- a/tools/lib/bpf/nlattr.h +++ b/tools/lib/bpf/nlattr.h @@ -1,18 +1,13 @@ -/* SPDX-License-Identifier: LGPL-2.1 */ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ /* * NETLINK Netlink attributes * - * This library is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation version 2.1 - * of the License. - * * Copyright (c) 2003-2013 Thomas Graf <tgraf@suug.ch> */ -#ifndef __NLATTR_H -#define __NLATTR_H +#ifndef __LIBBPF_NLATTR_H +#define __LIBBPF_NLATTR_H #include <stdint.h> #include <linux/netlink.h> @@ -23,19 +18,19 @@ * Standard attribute types to specify validation policy */ enum { - NLA_UNSPEC, /**< Unspecified type, binary data chunk */ - NLA_U8, /**< 8 bit integer */ - NLA_U16, /**< 16 bit integer */ - NLA_U32, /**< 32 bit integer */ - NLA_U64, /**< 64 bit integer */ - NLA_STRING, /**< NUL terminated character string */ - NLA_FLAG, /**< Flag */ - NLA_MSECS, /**< Micro seconds (64bit) */ - NLA_NESTED, /**< Nested attributes */ - __NLA_TYPE_MAX, + LIBBPF_NLA_UNSPEC, /**< Unspecified type, binary data chunk */ + LIBBPF_NLA_U8, /**< 8 bit integer */ + LIBBPF_NLA_U16, /**< 16 bit integer */ + LIBBPF_NLA_U32, /**< 32 bit integer */ + LIBBPF_NLA_U64, /**< 64 bit integer */ + LIBBPF_NLA_STRING, /**< NUL terminated character string */ + LIBBPF_NLA_FLAG, /**< Flag */ + LIBBPF_NLA_MSECS, /**< Micro seconds (64bit) */ + LIBBPF_NLA_NESTED, /**< Nested attributes */ + __LIBBPF_NLA_TYPE_MAX, }; -#define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1) +#define LIBBPF_NLA_TYPE_MAX (__LIBBPF_NLA_TYPE_MAX - 1) /** * @ingroup attr @@ -43,8 +38,8 @@ enum { * * See section @core_doc{core_attr_parse,Attribute Parsing} for more details. */ -struct nla_policy { - /** Type of attribute or NLA_UNSPEC */ +struct libbpf_nla_policy { + /** Type of attribute or LIBBPF_NLA_UNSPEC */ uint16_t type; /** Minimal length of payload required */ @@ -62,49 +57,50 @@ struct nla_policy { * @arg len length of attribute stream * @arg rem initialized to len, holds bytes currently remaining in stream */ -#define nla_for_each_attr(pos, head, len, rem) \ +#define libbpf_nla_for_each_attr(pos, head, len, rem) \ for (pos = head, rem = len; \ nla_ok(pos, rem); \ pos = nla_next(pos, &(rem))) /** - * nla_data - head of payload + * libbpf_nla_data - head of payload * @nla: netlink attribute */ -static inline void *nla_data(const struct nlattr *nla) +static inline void *libbpf_nla_data(const struct nlattr *nla) { return (char *) nla + NLA_HDRLEN; } -static inline uint8_t nla_getattr_u8(const struct nlattr *nla) +static inline uint8_t libbpf_nla_getattr_u8(const struct nlattr *nla) { - return *(uint8_t *)nla_data(nla); + return *(uint8_t *)libbpf_nla_data(nla); } -static inline uint32_t nla_getattr_u32(const struct nlattr *nla) +static inline uint32_t libbpf_nla_getattr_u32(const struct nlattr *nla) { - return *(uint32_t *)nla_data(nla); + return *(uint32_t *)libbpf_nla_data(nla); } -static inline const char *nla_getattr_str(const struct nlattr *nla) +static inline const char *libbpf_nla_getattr_str(const struct nlattr *nla) { - return (const char *)nla_data(nla); + return (const char *)libbpf_nla_data(nla); } /** - * nla_len - length of payload + * libbpf_nla_len - length of payload * @nla: netlink attribute */ -static inline int nla_len(const struct nlattr *nla) +static inline int libbpf_nla_len(const struct nlattr *nla) { return nla->nla_len - NLA_HDRLEN; } -int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len, - struct nla_policy *policy); -int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla, - struct nla_policy *policy); +int libbpf_nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, + int len, struct libbpf_nla_policy *policy); +int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype, + struct nlattr *nla, + struct libbpf_nla_policy *policy); -int nla_dump_errormsg(struct nlmsghdr *nlh); +int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh); -#endif /* __NLATTR_H */ +#endif /* __LIBBPF_NLATTR_H */ diff --git a/tools/lib/bpf/str_error.c b/tools/lib/bpf/str_error.c index b8798114a357..00e48ac5b806 100644 --- a/tools/lib/bpf/str_error.c +++ b/tools/lib/bpf/str_error.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: LGPL-2.1 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) #undef _GNU_SOURCE #include <string.h> #include <stdio.h> @@ -9,7 +9,7 @@ * libc, while checking strerror_r() return to avoid having to check this in * all places calling it. */ -char *str_error(int err, char *dst, int len) +char *libbpf_strerror_r(int err, char *dst, int len) { int ret = strerror_r(err, dst, len); if (ret) diff --git a/tools/lib/bpf/str_error.h b/tools/lib/bpf/str_error.h index 355b1db571d1..a139334d57b6 100644 --- a/tools/lib/bpf/str_error.h +++ b/tools/lib/bpf/str_error.h @@ -1,6 +1,6 @@ -// SPDX-License-Identifier: LGPL-2.1 -#ifndef BPF_STR_ERROR -#define BPF_STR_ERROR +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __LIBBPF_STR_ERROR_H +#define __LIBBPF_STR_ERROR_H -char *str_error(int err, char *dst, int len); -#endif // BPF_STR_ERROR +char *libbpf_strerror_r(int err, char *dst, int len); +#endif /* __LIBBPF_STR_ERROR_H */ diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index fd3851d5c079..1381ab81099c 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -23,7 +23,8 @@ $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \ test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \ - test_socket_cookie test_cgroup_storage test_select_reuseport + test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \ + test_netcnt TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \ test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \ @@ -35,7 +36,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \ test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \ get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \ - test_skb_cgroup_id_kern.o bpf_flow.o + test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_sk_lookup_kern.o # Order correspond to 'make run_tests' order TEST_PROGS := test_kmod.sh \ @@ -72,6 +73,7 @@ $(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c $(OUTPUT)/test_progs: trace_helpers.c $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c +$(OUTPUT)/test_netcnt: cgroup_helpers.c .PHONY: force diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index e4be7730222d..1d407b3494f9 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -143,6 +143,18 @@ static unsigned long long (*bpf_skb_cgroup_id)(void *ctx) = (void *) BPF_FUNC_skb_cgroup_id; static unsigned long long (*bpf_skb_ancestor_cgroup_id)(void *ctx, int level) = (void *) BPF_FUNC_skb_ancestor_cgroup_id; +static struct bpf_sock *(*bpf_sk_lookup_tcp)(void *ctx, + struct bpf_sock_tuple *tuple, + int size, unsigned int netns_id, + unsigned long long flags) = + (void *) BPF_FUNC_sk_lookup_tcp; +static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx, + struct bpf_sock_tuple *tuple, + int size, unsigned int netns_id, + unsigned long long flags) = + (void *) BPF_FUNC_sk_lookup_udp; +static int (*bpf_sk_release)(struct bpf_sock *sk) = + (void *) BPF_FUNC_sk_release; /* llvm builtin functions that eBPF C program may use to * emit BPF_LD_ABS and BPF_LD_IND instructions diff --git a/tools/testing/selftests/bpf/netcnt_common.h b/tools/testing/selftests/bpf/netcnt_common.h new file mode 100644 index 000000000000..81084c1c2c23 --- /dev/null +++ b/tools/testing/selftests/bpf/netcnt_common.h @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef __NETCNT_COMMON_H +#define __NETCNT_COMMON_H + +#include <linux/types.h> + +#define MAX_PERCPU_PACKETS 32 + +struct percpu_net_cnt { + __u64 packets; + __u64 bytes; + + __u64 prev_ts; + + __u64 prev_packets; + __u64 prev_bytes; +}; + +struct net_cnt { + __u64 packets; + __u64 bytes; +}; + +#endif diff --git a/tools/testing/selftests/bpf/netcnt_prog.c b/tools/testing/selftests/bpf/netcnt_prog.c new file mode 100644 index 000000000000..1198abca1360 --- /dev/null +++ b/tools/testing/selftests/bpf/netcnt_prog.c @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <linux/version.h> + +#include "bpf_helpers.h" +#include "netcnt_common.h" + +#define MAX_BPS (3 * 1024 * 1024) + +#define REFRESH_TIME_NS 100000000 +#define NS_PER_SEC 1000000000 + +struct bpf_map_def SEC("maps") percpu_netcnt = { + .type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + .key_size = sizeof(struct bpf_cgroup_storage_key), + .value_size = sizeof(struct percpu_net_cnt), +}; + +struct bpf_map_def SEC("maps") netcnt = { + .type = BPF_MAP_TYPE_CGROUP_STORAGE, + .key_size = sizeof(struct bpf_cgroup_storage_key), + .value_size = sizeof(struct net_cnt), +}; + +SEC("cgroup/skb") +int bpf_nextcnt(struct __sk_buff *skb) +{ + struct percpu_net_cnt *percpu_cnt; + char fmt[] = "%d %llu %llu\n"; + struct net_cnt *cnt; + __u64 ts, dt; + int ret; + + cnt = bpf_get_local_storage(&netcnt, 0); + percpu_cnt = bpf_get_local_storage(&percpu_netcnt, 0); + + percpu_cnt->packets++; + percpu_cnt->bytes += skb->len; + + if (percpu_cnt->packets > MAX_PERCPU_PACKETS) { + __sync_fetch_and_add(&cnt->packets, + percpu_cnt->packets); + percpu_cnt->packets = 0; + + __sync_fetch_and_add(&cnt->bytes, + percpu_cnt->bytes); + percpu_cnt->bytes = 0; + } + + ts = bpf_ktime_get_ns(); + dt = ts - percpu_cnt->prev_ts; + + dt *= MAX_BPS; + dt /= NS_PER_SEC; + + if (cnt->bytes + percpu_cnt->bytes - percpu_cnt->prev_bytes < dt) + ret = 1; + else + ret = 0; + + if (dt > REFRESH_TIME_NS) { + percpu_cnt->prev_ts = ts; + percpu_cnt->prev_packets = cnt->packets; + percpu_cnt->prev_bytes = cnt->bytes; + } + + return !!ret; +} + +char _license[] SEC("license") = "GPL"; +__u32 _version SEC("version") = LINUX_VERSION_CODE; diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c b/tools/testing/selftests/bpf/test_cgroup_storage.c index 4e196e3bfecf..f44834155f25 100644 --- a/tools/testing/selftests/bpf/test_cgroup_storage.c +++ b/tools/testing/selftests/bpf/test_cgroup_storage.c @@ -4,6 +4,7 @@ #include <linux/filter.h> #include <stdio.h> #include <stdlib.h> +#include <sys/sysinfo.h> #include "bpf_rlimit.h" #include "cgroup_helpers.h" @@ -15,6 +16,14 @@ char bpf_log_buf[BPF_LOG_BUF_SIZE]; int main(int argc, char **argv) { struct bpf_insn prog[] = { + BPF_LD_MAP_FD(BPF_REG_1, 0), /* percpu map fd */ + BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */ + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 0x1), + BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_3, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), /* map fd */ BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, @@ -28,9 +37,18 @@ int main(int argc, char **argv) }; size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); int error = EXIT_FAILURE; - int map_fd, prog_fd, cgroup_fd; + int map_fd, percpu_map_fd, prog_fd, cgroup_fd; struct bpf_cgroup_storage_key key; unsigned long long value; + unsigned long long *percpu_value; + int cpu, nproc; + + nproc = get_nprocs_conf(); + percpu_value = malloc(sizeof(*percpu_value) * nproc); + if (!percpu_value) { + printf("Not enough memory for per-cpu area (%d cpus)\n", nproc); + goto err; + } map_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, sizeof(key), sizeof(value), 0, 0); @@ -39,7 +57,15 @@ int main(int argc, char **argv) goto out; } - prog[0].imm = map_fd; + percpu_map_fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, + sizeof(key), sizeof(value), 0, 0); + if (percpu_map_fd < 0) { + printf("Failed to create map: %s\n", strerror(errno)); + goto out; + } + + prog[0].imm = percpu_map_fd; + prog[7].imm = map_fd; prog_fd = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB, prog, insns_cnt, "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE); @@ -77,7 +103,15 @@ int main(int argc, char **argv) } if (bpf_map_lookup_elem(map_fd, &key, &value)) { - printf("Failed to lookup cgroup storage\n"); + printf("Failed to lookup cgroup storage 0\n"); + goto err; + } + + for (cpu = 0; cpu < nproc; cpu++) + percpu_value[cpu] = 1000; + + if (bpf_map_update_elem(percpu_map_fd, &key, percpu_value, 0)) { + printf("Failed to update the data in the cgroup storage\n"); goto err; } @@ -120,11 +154,31 @@ int main(int argc, char **argv) goto err; } + /* Check the final value of the counter in the percpu local storage */ + + for (cpu = 0; cpu < nproc; cpu++) + percpu_value[cpu] = 0; + + if (bpf_map_lookup_elem(percpu_map_fd, &key, percpu_value)) { + printf("Failed to lookup the per-cpu cgroup storage\n"); + goto err; + } + + value = 0; + for (cpu = 0; cpu < nproc; cpu++) + value += percpu_value[cpu]; + + if (value != nproc * 1000 + 6) { + printf("Unexpected data in the per-cpu cgroup storage\n"); + goto err; + } + error = 0; printf("test_cgroup_storage:PASS\n"); err: cleanup_cgroup_environment(); + free(percpu_value); out: return error; diff --git a/tools/testing/selftests/bpf/test_netcnt.c b/tools/testing/selftests/bpf/test_netcnt.c new file mode 100644 index 000000000000..7887df693399 --- /dev/null +++ b/tools/testing/selftests/bpf/test_netcnt.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <errno.h> +#include <assert.h> +#include <sys/sysinfo.h> +#include <sys/time.h> + +#include <linux/bpf.h> +#include <bpf/bpf.h> +#include <bpf/libbpf.h> + +#include "cgroup_helpers.h" +#include "bpf_rlimit.h" +#include "netcnt_common.h" + +#define BPF_PROG "./netcnt_prog.o" +#define TEST_CGROUP "/test-network-counters/" + +static int bpf_find_map(const char *test, struct bpf_object *obj, + const char *name) +{ + struct bpf_map *map; + + map = bpf_object__find_map_by_name(obj, name); + if (!map) { + printf("%s:FAIL:map '%s' not found\n", test, name); + return -1; + } + return bpf_map__fd(map); +} + +int main(int argc, char **argv) +{ + struct percpu_net_cnt *percpu_netcnt; + struct bpf_cgroup_storage_key key; + int map_fd, percpu_map_fd; + int error = EXIT_FAILURE; + struct net_cnt netcnt; + struct bpf_object *obj; + int prog_fd, cgroup_fd; + unsigned long packets; + unsigned long bytes; + int cpu, nproc; + __u32 prog_cnt; + + nproc = get_nprocs_conf(); + percpu_netcnt = malloc(sizeof(*percpu_netcnt) * nproc); + if (!percpu_netcnt) { + printf("Not enough memory for per-cpu area (%d cpus)\n", nproc); + goto err; + } + + if (bpf_prog_load(BPF_PROG, BPF_PROG_TYPE_CGROUP_SKB, + &obj, &prog_fd)) { + printf("Failed to load bpf program\n"); + goto out; + } + + if (setup_cgroup_environment()) { + printf("Failed to load bpf program\n"); + goto err; + } + + /* Create a cgroup, get fd, and join it */ + cgroup_fd = create_and_get_cgroup(TEST_CGROUP); + if (!cgroup_fd) { + printf("Failed to create test cgroup\n"); + goto err; + } + + if (join_cgroup(TEST_CGROUP)) { + printf("Failed to join cgroup\n"); + goto err; + } + + /* Attach bpf program */ + if (bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0)) { + printf("Failed to attach bpf program"); + goto err; + } + + assert(system("ping localhost -6 -c 10000 -f -q > /dev/null") == 0); + + if (bpf_prog_query(cgroup_fd, BPF_CGROUP_INET_EGRESS, 0, NULL, NULL, + &prog_cnt)) { + printf("Failed to query attached programs"); + goto err; + } + + map_fd = bpf_find_map(__func__, obj, "netcnt"); + if (map_fd < 0) { + printf("Failed to find bpf map with net counters"); + goto err; + } + + percpu_map_fd = bpf_find_map(__func__, obj, "percpu_netcnt"); + if (percpu_map_fd < 0) { + printf("Failed to find bpf map with percpu net counters"); + goto err; + } + + if (bpf_map_get_next_key(map_fd, NULL, &key)) { + printf("Failed to get key in cgroup storage\n"); + goto err; + } + + if (bpf_map_lookup_elem(map_fd, &key, &netcnt)) { + printf("Failed to lookup cgroup storage\n"); + goto err; + } + + if (bpf_map_lookup_elem(percpu_map_fd, &key, &percpu_netcnt[0])) { + printf("Failed to lookup percpu cgroup storage\n"); + goto err; + } + + /* Some packets can be still in per-cpu cache, but not more than + * MAX_PERCPU_PACKETS. + */ + packets = netcnt.packets; + bytes = netcnt.bytes; + for (cpu = 0; cpu < nproc; cpu++) { + if (percpu_netcnt[cpu].packets > MAX_PERCPU_PACKETS) { + printf("Unexpected percpu value: %llu\n", + percpu_netcnt[cpu].packets); + goto err; + } + + packets += percpu_netcnt[cpu].packets; + bytes += percpu_netcnt[cpu].bytes; + } + + /* No packets should be lost */ + if (packets != 10000) { + printf("Unexpected packet count: %lu\n", packets); + goto err; + } + + /* Let's check that bytes counter matches the number of packets + * multiplied by the size of ipv6 ICMP packet. + */ + if (bytes != packets * 104) { + printf("Unexpected bytes count: %lu\n", bytes); + goto err; + } + + error = 0; + printf("test_netcnt:PASS\n"); + +err: + cleanup_cgroup_environment(); + free(percpu_netcnt); + +out: + return error; +} diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c index 63a671803ed6..e8becca9c521 100644 --- a/tools/testing/selftests/bpf/test_progs.c +++ b/tools/testing/selftests/bpf/test_progs.c @@ -1698,6 +1698,43 @@ static void test_task_fd_query_tp(void) "sys_enter_read"); } +static void test_reference_tracking() +{ + const char *file = "./test_sk_lookup_kern.o"; + struct bpf_object *obj; + struct bpf_program *prog; + __u32 duration; + int err = 0; + + obj = bpf_object__open(file); + if (IS_ERR(obj)) { + error_cnt++; + return; + } + + bpf_object__for_each_program(prog, obj) { + const char *title; + + /* Ignore .text sections */ + title = bpf_program__title(prog, false); + if (strstr(title, ".text") != NULL) + continue; + + bpf_program__set_type(prog, BPF_PROG_TYPE_SCHED_CLS); + + /* Expect verifier failure if test name has 'fail' */ + if (strstr(title, "fail") != NULL) { + libbpf_set_print(NULL, NULL, NULL); + err = !bpf_program__load(prog, "GPL", 0); + libbpf_set_print(printf, printf, NULL); + } else { + err = bpf_program__load(prog, "GPL", 0); + } + CHECK(err, title, "\n"); + } + bpf_object__close(obj); +} + int main(void) { jit_enabled = is_jit_enabled(); @@ -1719,6 +1756,7 @@ int main(void) test_get_stack_raw_tp(); test_task_fd_query_rawtp(); test_task_fd_query_tp(); + test_reference_tracking(); printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt); return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS; diff --git a/tools/testing/selftests/bpf/test_section_names.c b/tools/testing/selftests/bpf/test_section_names.c new file mode 100644 index 000000000000..7c4f41572b1c --- /dev/null +++ b/tools/testing/selftests/bpf/test_section_names.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Facebook + +#include <err.h> +#include <bpf/libbpf.h> + +#include "bpf_util.h" + +struct sec_name_test { + const char sec_name[32]; + struct { + int rc; + enum bpf_prog_type prog_type; + enum bpf_attach_type expected_attach_type; + } expected_load; + struct { + int rc; + enum bpf_attach_type attach_type; + } expected_attach; +}; + +static struct sec_name_test tests[] = { + {"InvAliD", {-EINVAL, 0, 0}, {-EINVAL, 0} }, + {"cgroup", {-EINVAL, 0, 0}, {-EINVAL, 0} }, + {"socket", {0, BPF_PROG_TYPE_SOCKET_FILTER, 0}, {-EINVAL, 0} }, + {"kprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} }, + {"kretprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} }, + {"classifier", {0, BPF_PROG_TYPE_SCHED_CLS, 0}, {-EINVAL, 0} }, + {"action", {0, BPF_PROG_TYPE_SCHED_ACT, 0}, {-EINVAL, 0} }, + {"tracepoint/", {0, BPF_PROG_TYPE_TRACEPOINT, 0}, {-EINVAL, 0} }, + { + "raw_tracepoint/", + {0, BPF_PROG_TYPE_RAW_TRACEPOINT, 0}, + {-EINVAL, 0}, + }, + {"xdp", {0, BPF_PROG_TYPE_XDP, 0}, {-EINVAL, 0} }, + {"perf_event", {0, BPF_PROG_TYPE_PERF_EVENT, 0}, {-EINVAL, 0} }, + {"lwt_in", {0, BPF_PROG_TYPE_LWT_IN, 0}, {-EINVAL, 0} }, + {"lwt_out", {0, BPF_PROG_TYPE_LWT_OUT, 0}, {-EINVAL, 0} }, + {"lwt_xmit", {0, BPF_PROG_TYPE_LWT_XMIT, 0}, {-EINVAL, 0} }, + {"lwt_seg6local", {0, BPF_PROG_TYPE_LWT_SEG6LOCAL, 0}, {-EINVAL, 0} }, + { + "cgroup_skb/ingress", + {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, + {0, BPF_CGROUP_INET_INGRESS}, + }, + { + "cgroup_skb/egress", + {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, + {0, BPF_CGROUP_INET_EGRESS}, + }, + {"cgroup/skb", {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, {-EINVAL, 0} }, + { + "cgroup/sock", + {0, BPF_PROG_TYPE_CGROUP_SOCK, 0}, + {0, BPF_CGROUP_INET_SOCK_CREATE}, + }, + { + "cgroup/post_bind4", + {0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET4_POST_BIND}, + {0, BPF_CGROUP_INET4_POST_BIND}, + }, + { + "cgroup/post_bind6", + {0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET6_POST_BIND}, + {0, BPF_CGROUP_INET6_POST_BIND}, + }, + { + "cgroup/dev", + {0, BPF_PROG_TYPE_CGROUP_DEVICE, 0}, + {0, BPF_CGROUP_DEVICE}, + }, + {"sockops", {0, BPF_PROG_TYPE_SOCK_OPS, 0}, {0, BPF_CGROUP_SOCK_OPS} }, + { + "sk_skb/stream_parser", + {0, BPF_PROG_TYPE_SK_SKB, 0}, + {0, BPF_SK_SKB_STREAM_PARSER}, + }, + { + "sk_skb/stream_verdict", + {0, BPF_PROG_TYPE_SK_SKB, 0}, + {0, BPF_SK_SKB_STREAM_VERDICT}, + }, + {"sk_skb", {0, BPF_PROG_TYPE_SK_SKB, 0}, {-EINVAL, 0} }, + {"sk_msg", {0, BPF_PROG_TYPE_SK_MSG, 0}, {0, BPF_SK_MSG_VERDICT} }, + {"lirc_mode2", {0, BPF_PROG_TYPE_LIRC_MODE2, 0}, {0, BPF_LIRC_MODE2} }, + { + "flow_dissector", + {0, BPF_PROG_TYPE_FLOW_DISSECTOR, 0}, + {0, BPF_FLOW_DISSECTOR}, + }, + { + "cgroup/bind4", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_BIND}, + {0, BPF_CGROUP_INET4_BIND}, + }, + { + "cgroup/bind6", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_BIND}, + {0, BPF_CGROUP_INET6_BIND}, + }, + { + "cgroup/connect4", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_CONNECT}, + {0, BPF_CGROUP_INET4_CONNECT}, + }, + { + "cgroup/connect6", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_CONNECT}, + {0, BPF_CGROUP_INET6_CONNECT}, + }, + { + "cgroup/sendmsg4", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_SENDMSG}, + {0, BPF_CGROUP_UDP4_SENDMSG}, + }, + { + "cgroup/sendmsg6", + {0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG}, + {0, BPF_CGROUP_UDP6_SENDMSG}, + }, +}; + +static int test_prog_type_by_name(const struct sec_name_test *test) +{ + enum bpf_attach_type expected_attach_type; + enum bpf_prog_type prog_type; + int rc; + + rc = libbpf_prog_type_by_name(test->sec_name, &prog_type, + &expected_attach_type); + + if (rc != test->expected_load.rc) { + warnx("prog: unexpected rc=%d for %s", rc, test->sec_name); + return -1; + } + + if (rc) + return 0; + + if (prog_type != test->expected_load.prog_type) { + warnx("prog: unexpected prog_type=%d for %s", prog_type, + test->sec_name); + return -1; + } + + if (expected_attach_type != test->expected_load.expected_attach_type) { + warnx("prog: unexpected expected_attach_type=%d for %s", + expected_attach_type, test->sec_name); + return -1; + } + + return 0; +} + +static int test_attach_type_by_name(const struct sec_name_test *test) +{ + enum bpf_attach_type attach_type; + int rc; + + rc = libbpf_attach_type_by_name(test->sec_name, &attach_type); + + if (rc != test->expected_attach.rc) { + warnx("attach: unexpected rc=%d for %s", rc, test->sec_name); + return -1; + } + + if (rc) + return 0; + + if (attach_type != test->expected_attach.attach_type) { + warnx("attach: unexpected attach_type=%d for %s", attach_type, + test->sec_name); + return -1; + } + + return 0; +} + +static int run_test_case(const struct sec_name_test *test) +{ + if (test_prog_type_by_name(test)) + return -1; + if (test_attach_type_by_name(test)) + return -1; + return 0; +} + +static int run_tests(void) +{ + int passes = 0; + int fails = 0; + int i; + + for (i = 0; i < ARRAY_SIZE(tests); ++i) { + if (run_test_case(&tests[i])) + ++fails; + else + ++passes; + } + printf("Summary: %d PASSED, %d FAILED\n", passes, fails); + return fails ? -1 : 0; +} + +int main(int argc, char **argv) +{ + return run_tests(); +} diff --git a/tools/testing/selftests/bpf/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/test_sk_lookup_kern.c new file mode 100644 index 000000000000..b745bdc08c2b --- /dev/null +++ b/tools/testing/selftests/bpf/test_sk_lookup_kern.c @@ -0,0 +1,180 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io + +#include <stddef.h> +#include <stdbool.h> +#include <string.h> +#include <linux/bpf.h> +#include <linux/if_ether.h> +#include <linux/in.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <linux/pkt_cls.h> +#include <linux/tcp.h> +#include <sys/socket.h> +#include "bpf_helpers.h" +#include "bpf_endian.h" + +int _version SEC("version") = 1; +char _license[] SEC("license") = "GPL"; + +/* Fill 'tuple' with L3 info, and attempt to find L4. On fail, return NULL. */ +static struct bpf_sock_tuple *get_tuple(void *data, __u64 nh_off, + void *data_end, __u16 eth_proto, + bool *ipv4) +{ + struct bpf_sock_tuple *result; + __u8 proto = 0; + __u64 ihl_len; + + if (eth_proto == bpf_htons(ETH_P_IP)) { + struct iphdr *iph = (struct iphdr *)(data + nh_off); + + if (iph + 1 > data_end) + return NULL; + ihl_len = iph->ihl * 4; + proto = iph->protocol; + *ipv4 = true; + result = (struct bpf_sock_tuple *)&iph->saddr; + } else if (eth_proto == bpf_htons(ETH_P_IPV6)) { + struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + nh_off); + + if (ip6h + 1 > data_end) + return NULL; + ihl_len = sizeof(*ip6h); + proto = ip6h->nexthdr; + *ipv4 = true; + result = (struct bpf_sock_tuple *)&ip6h->saddr; + } + + if (data + nh_off + ihl_len > data_end || proto != IPPROTO_TCP) + return NULL; + + return result; +} + +SEC("sk_lookup_success") +int bpf_sk_lookup_test0(struct __sk_buff *skb) +{ + void *data_end = (void *)(long)skb->data_end; + void *data = (void *)(long)skb->data; + struct ethhdr *eth = (struct ethhdr *)(data); + struct bpf_sock_tuple *tuple; + struct bpf_sock *sk; + size_t tuple_len; + bool ipv4; + + if (eth + 1 > data_end) + return TC_ACT_SHOT; + + tuple = get_tuple(data, sizeof(*eth), data_end, eth->h_proto, &ipv4); + if (!tuple || tuple + sizeof *tuple > data_end) + return TC_ACT_SHOT; + + tuple_len = ipv4 ? sizeof(tuple->ipv4) : sizeof(tuple->ipv6); + sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, 0, 0); + if (sk) + bpf_sk_release(sk); + return sk ? TC_ACT_OK : TC_ACT_UNSPEC; +} + +SEC("sk_lookup_success_simple") +int bpf_sk_lookup_test1(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + if (sk) + bpf_sk_release(sk); + return 0; +} + +SEC("fail_use_after_free") +int bpf_sk_lookup_uaf(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + __u32 family = 0; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + if (sk) { + bpf_sk_release(sk); + family = sk->family; + } + return family; +} + +SEC("fail_modify_sk_pointer") +int bpf_sk_lookup_modptr(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + __u32 family; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + if (sk) { + sk += 1; + bpf_sk_release(sk); + } + return 0; +} + +SEC("fail_modify_sk_or_null_pointer") +int bpf_sk_lookup_modptr_or_null(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + __u32 family; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + sk += 1; + if (sk) + bpf_sk_release(sk); + return 0; +} + +SEC("fail_no_release") +int bpf_sk_lookup_test2(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + + bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + return 0; +} + +SEC("fail_release_twice") +int bpf_sk_lookup_test3(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + bpf_sk_release(sk); + bpf_sk_release(sk); + return 0; +} + +SEC("fail_release_unchecked") +int bpf_sk_lookup_test4(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + struct bpf_sock *sk; + + sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); + bpf_sk_release(sk); + return 0; +} + +void lookup_no_release(struct __sk_buff *skb) +{ + struct bpf_sock_tuple tuple = {}; + bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0); +} + +SEC("fail_no_release_subcall") +int bpf_sk_lookup_test5(struct __sk_buff *skb) +{ + lookup_no_release(skb); + return 0; +} diff --git a/tools/testing/selftests/bpf/test_socket_cookie.c b/tools/testing/selftests/bpf/test_socket_cookie.c index 68e108e4687a..b6c2c605d8c0 100644 --- a/tools/testing/selftests/bpf/test_socket_cookie.c +++ b/tools/testing/selftests/bpf/test_socket_cookie.c @@ -158,11 +158,7 @@ static int run_test(int cgfd) bpf_object__for_each_program(prog, pobj) { prog_name = bpf_program__title(prog, /*needs_copy*/ false); - if (strcmp(prog_name, "cgroup/connect6") == 0) { - attach_type = BPF_CGROUP_INET6_CONNECT; - } else if (strcmp(prog_name, "sockops") == 0) { - attach_type = BPF_CGROUP_SOCK_OPS; - } else { + if (libbpf_attach_type_by_name(prog_name, &attach_type)) { log_err("Unexpected prog: %s", prog_name); goto err; } diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 67c412d19c09..bc9cd8537467 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -3,6 +3,7 @@ * * Copyright (c) 2014 PLUMgrid, http://plumgrid.com * Copyright (c) 2017 Facebook + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -68,6 +69,7 @@ struct bpf_test { int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; int fixup_cgroup_storage[MAX_FIXUPS]; + int fixup_percpu_cgroup_storage[MAX_FIXUPS]; const char *errstr; const char *errstr_unpriv; uint32_t retval; @@ -177,6 +179,24 @@ static void bpf_fill_rand_ld_dw(struct bpf_test *self) self->retval = (uint32_t)res; } +/* BPF_SK_LOOKUP contains 13 instructions, if you need to fix up maps */ +#define BPF_SK_LOOKUP \ + /* struct bpf_sock_tuple tuple = {} */ \ + BPF_MOV64_IMM(BPF_REG_2, 0), \ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), \ + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -16), \ + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -24), \ + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -32), \ + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -40), \ + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -48), \ + /* sk = sk_lookup_tcp(ctx, &tuple, sizeof tuple, 0, 0) */ \ + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -48), \ + BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_sock_tuple)), \ + BPF_MOV64_IMM(BPF_REG_4, 0), \ + BPF_MOV64_IMM(BPF_REG_5, 0), \ + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp) + static struct bpf_test tests[] = { { "add+sub+mul", @@ -2707,6 +2727,137 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, { + "unpriv: spill/fill of different pointers stx - ctx and sock", + .insns = { + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), + /* u64 foo; */ + /* void *target = &foo; */ + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), + /* if (skb == NULL) *target = sock; */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), + /* else *target = skb; */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), + /* struct __sk_buff *skb = *target; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), + /* skb->mark = 42; */ + BPF_MOV64_IMM(BPF_REG_3, 42), + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, + offsetof(struct __sk_buff, mark)), + /* if (sk) bpf_sk_release(sk) */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "type=ctx expected=sock", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "unpriv: spill/fill of different pointers stx - leak sock", + .insns = { + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), + /* u64 foo; */ + /* void *target = &foo; */ + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), + /* if (skb == NULL) *target = sock; */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), + /* else *target = skb; */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), + /* struct __sk_buff *skb = *target; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), + /* skb->mark = 42; */ + BPF_MOV64_IMM(BPF_REG_3, 42), + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, + offsetof(struct __sk_buff, mark)), + BPF_EXIT_INSN(), + }, + .result = REJECT, + //.errstr = "same insn cannot be used with different pointers", + .errstr = "Unreleased reference", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "unpriv: spill/fill of different pointers stx - sock and ctx (read)", + .insns = { + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), + /* u64 foo; */ + /* void *target = &foo; */ + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), + /* if (skb) *target = skb */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), + /* else *target = sock */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), + /* struct bpf_sock *sk = *target; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), + /* if (sk) u32 foo = sk->mark; bpf_sk_release(sk); */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct bpf_sock, mark)), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "same insn cannot be used with different pointers", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "unpriv: spill/fill of different pointers stx - sock and ctx (write)", + .insns = { + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), + /* u64 foo; */ + /* void *target = &foo; */ + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), + /* if (skb) *target = skb */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), + /* else *target = sock */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), + /* struct bpf_sock *sk = *target; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), + /* if (sk) sk->mark = 42; bpf_sk_release(sk); */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 3), + BPF_MOV64_IMM(BPF_REG_3, 42), + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, + offsetof(struct bpf_sock, mark)), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + //.errstr = "same insn cannot be used with different pointers", + .errstr = "cannot write into socket", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { "unpriv: spill/fill of different pointers ldx", .insns = { BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), @@ -3275,7 +3426,7 @@ static struct bpf_test tests[] = { BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0), BPF_EXIT_INSN(), }, - .errstr = "BPF_ST stores into R1 context is not allowed", + .errstr = "BPF_ST stores into R1 inv is not allowed", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, @@ -3287,7 +3438,7 @@ static struct bpf_test tests[] = { BPF_REG_0, offsetof(struct __sk_buff, mark), 0), BPF_EXIT_INSN(), }, - .errstr = "BPF_XADD stores into R1 context is not allowed", + .errstr = "BPF_XADD stores into R1 inv is not allowed", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, @@ -3637,7 +3788,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", + .errstr = "R3 pointer arithmetic on pkt_end", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, @@ -4676,7 +4827,7 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_CGROUP_SKB, }, { - "invalid per-cgroup storage access 3", + "invalid cgroup storage access 3", .insns = { BPF_MOV64_IMM(BPF_REG_2, 0), BPF_LD_MAP_FD(BPF_REG_1, 0), @@ -4744,6 +4895,121 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_CGROUP_SKB, }, { + "valid per-cpu cgroup storage access", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1), + BPF_EXIT_INSN(), + }, + .fixup_percpu_cgroup_storage = { 1 }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 1", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 1 }, + .result = REJECT, + .errstr = "cannot pass map_type 1 into func bpf_get_local_storage", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 2", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_LD_MAP_FD(BPF_REG_1, 1), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "fd 1 is not pointing to valid bpf_map", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 3", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 256), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_percpu_cgroup_storage = { 1 }, + .result = REJECT, + .errstr = "invalid access to map value, value_size=64 off=256 size=4", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 4", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, -2), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1), + BPF_EXIT_INSN(), + }, + .fixup_cgroup_storage = { 1 }, + .result = REJECT, + .errstr = "invalid access to map value, value_size=64 off=-2 size=4", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 5", + .insns = { + BPF_MOV64_IMM(BPF_REG_2, 7), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1), + BPF_EXIT_INSN(), + }, + .fixup_percpu_cgroup_storage = { 1 }, + .result = REJECT, + .errstr = "get_local_storage() doesn't support non-zero flags", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { + "invalid per-cpu cgroup storage access 6", + .insns = { + BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_get_local_storage), + BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1), + BPF_EXIT_INSN(), + }, + .fixup_percpu_cgroup_storage = { 1 }, + .result = REJECT, + .errstr = "get_local_storage() doesn't support non-zero flags", + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + }, + { "multiple registers share map_lookup_elem result", .insns = { BPF_MOV64_IMM(BPF_REG_1, 10), @@ -4780,7 +5046,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", + .errstr = "R4 pointer arithmetic on map_value_or_null", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -4801,7 +5067,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", + .errstr = "R4 pointer arithmetic on map_value_or_null", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -4822,7 +5088,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", + .errstr = "R4 pointer arithmetic on map_value_or_null", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -5150,7 +5416,7 @@ static struct bpf_test tests[] = { .errstr_unpriv = "R2 leaks addr into mem", .result_unpriv = REJECT, .result = REJECT, - .errstr = "BPF_XADD stores into R1 context is not allowed", + .errstr = "BPF_XADD stores into R1 inv is not allowed", }, { "leak pointer into ctx 2", @@ -5165,7 +5431,7 @@ static struct bpf_test tests[] = { .errstr_unpriv = "R10 leaks addr into mem", .result_unpriv = REJECT, .result = REJECT, - .errstr = "BPF_XADD stores into R1 context is not allowed", + .errstr = "BPF_XADD stores into R1 inv is not allowed", }, { "leak pointer into ctx 3", @@ -7137,7 +7403,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map_in_map = { 3 }, - .errstr = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited", + .errstr = "R1 pointer arithmetic on map_ptr prohibited", .result = REJECT, }, { @@ -8811,7 +9077,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", + .errstr = "R3 pointer arithmetic on pkt_end", .result = REJECT, .prog_type = BPF_PROG_TYPE_XDP, }, @@ -8830,7 +9096,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", + .errstr = "R3 pointer arithmetic on pkt_end", .result = REJECT, .prog_type = BPF_PROG_TYPE_XDP, }, @@ -12114,7 +12380,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .result = REJECT, - .errstr = "BPF_XADD stores into R2 packet", + .errstr = "BPF_XADD stores into R2 ctx", .prog_type = BPF_PROG_TYPE_XDP, }, { @@ -12442,6 +12708,214 @@ static struct bpf_test tests[] = { .result = ACCEPT, }, { + "reference tracking: leak potential reference", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), /* leak reference */ + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: leak potential reference on stack", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: leak potential reference on stack 2", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_ST_MEM(BPF_DW, BPF_REG_4, 0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: zero potential reference", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_IMM(BPF_REG_0, 0), /* leak reference */ + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: copy and zero potential references", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_7, 0), /* leak reference */ + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: release reference without check", + .insns = { + BPF_SK_LOOKUP, + /* reference in r0 may be NULL */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_2, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "type=sock_or_null expected=sock", + .result = REJECT, + }, + { + "reference tracking: release reference", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: release reference 2", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: release reference twice", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "type=inv expected=sock", + .result = REJECT, + }, + { + "reference tracking: release reference twice inside branch", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* goto end */ + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "type=inv expected=sock", + .result = REJECT, + }, + { + "reference tracking: alloc, check, free in one subbranch", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16), + /* if (offsetof(skb, mark) > data_len) exit; */ + BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1), + BPF_EXIT_INSN(), + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2, + offsetof(struct __sk_buff, mark)), + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 1), /* mark == 0? */ + /* Leak reference in R0 */ + BPF_EXIT_INSN(), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking: alloc, check, free in both subbranches", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16), + /* if (offsetof(skb, mark) > data_len) exit; */ + BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1), + BPF_EXIT_INSN(), + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2, + offsetof(struct __sk_buff, mark)), + BPF_SK_LOOKUP, + BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 4), /* mark == 0? */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking in call: free reference in subprog", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */ + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { "pass modified ctx pointer to helper, 1", .insns = { BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -612), @@ -12511,6 +12985,407 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, + { + "reference tracking in call: free reference in subprog and outside", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */ + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "type=inv expected=sock", + .result = REJECT, + }, + { + "reference tracking in call: alloc & leak reference in subprog", + .insns = { + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_MOV64_REG(BPF_REG_6, BPF_REG_4), + BPF_SK_LOOKUP, + /* spill unchecked sk_ptr into stack of caller */ + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking in call: alloc in subprog, release outside", + .insns = { + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_SK_LOOKUP, + BPF_EXIT_INSN(), /* return sk */ + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .retval = POINTER_VALUE, + .result = ACCEPT, + }, + { + "reference tracking in call: sk_ptr leak into caller stack", + .insns = { + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), + BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5), + /* spill unchecked sk_ptr into stack of caller */ + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0), + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), + BPF_EXIT_INSN(), + + /* subprog 2 */ + BPF_SK_LOOKUP, + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "Unreleased reference", + .result = REJECT, + }, + { + "reference tracking in call: sk_ptr spill into caller stack", + .insns = { + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + + /* subprog 1 */ + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), + BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 8), + /* spill unchecked sk_ptr into stack of caller */ + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0), + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), + /* now the sk_ptr is verified, free the reference */ + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_4, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + + /* subprog 2 */ + BPF_SK_LOOKUP, + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: allow LD_ABS", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LD_ABS(BPF_B, 0), + BPF_LD_ABS(BPF_H, 0), + BPF_LD_ABS(BPF_W, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: forbid LD_ABS while holding reference", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_SK_LOOKUP, + BPF_LD_ABS(BPF_B, 0), + BPF_LD_ABS(BPF_H, 0), + BPF_LD_ABS(BPF_W, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "BPF_LD_[ABS|IND] cannot be mixed with socket references", + .result = REJECT, + }, + { + "reference tracking: allow LD_IND", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_MOV64_IMM(BPF_REG_7, 1), + BPF_LD_IND(BPF_W, BPF_REG_7, -0x200000), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_7), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + .retval = 1, + }, + { + "reference tracking: forbid LD_IND while holding reference", + .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_4, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_7, 1), + BPF_LD_IND(BPF_W, BPF_REG_7, -0x200000), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_7), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_4), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "BPF_LD_[ABS|IND] cannot be mixed with socket references", + .result = REJECT, + }, + { + "reference tracking: check reference or tail call", + .insns = { + BPF_MOV64_REG(BPF_REG_7, BPF_REG_1), + BPF_SK_LOOKUP, + /* if (sk) bpf_sk_release() */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 7), + /* bpf_tail_call() */ + BPF_MOV64_IMM(BPF_REG_3, 2), + BPF_LD_MAP_FD(BPF_REG_2, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_7), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_tail_call), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .fixup_prog1 = { 17 }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: release reference then tail call", + .insns = { + BPF_MOV64_REG(BPF_REG_7, BPF_REG_1), + BPF_SK_LOOKUP, + /* if (sk) bpf_sk_release() */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + /* bpf_tail_call() */ + BPF_MOV64_IMM(BPF_REG_3, 2), + BPF_LD_MAP_FD(BPF_REG_2, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_7), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_tail_call), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_prog1 = { 18 }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: leak possible reference over tail call", + .insns = { + BPF_MOV64_REG(BPF_REG_7, BPF_REG_1), + /* Look up socket and store in REG_6 */ + BPF_SK_LOOKUP, + /* bpf_tail_call() */ + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_MOV64_IMM(BPF_REG_3, 2), + BPF_LD_MAP_FD(BPF_REG_2, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_7), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_tail_call), + BPF_MOV64_IMM(BPF_REG_0, 0), + /* if (sk) bpf_sk_release() */ + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .fixup_prog1 = { 16 }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "tail_call would lead to reference leak", + .result = REJECT, + }, + { + "reference tracking: leak checked reference over tail call", + .insns = { + BPF_MOV64_REG(BPF_REG_7, BPF_REG_1), + /* Look up socket and store in REG_6 */ + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + /* if (!sk) goto end */ + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7), + /* bpf_tail_call() */ + BPF_MOV64_IMM(BPF_REG_3, 0), + BPF_LD_MAP_FD(BPF_REG_2, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_7), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_tail_call), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .fixup_prog1 = { 17 }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "tail_call would lead to reference leak", + .result = REJECT, + }, + { + "reference tracking: mangle and release sock_or_null", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "R1 pointer arithmetic on sock_or_null prohibited", + .result = REJECT, + }, + { + "reference tracking: mangle and release sock", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "R1 pointer arithmetic on sock prohibited", + .result = REJECT, + }, + { + "reference tracking: access member", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_0, 4), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, + { + "reference tracking: write to member", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_LD_IMM64(BPF_REG_2, 42), + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_2, + offsetof(struct bpf_sock, mark)), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LD_IMM64(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "cannot write into socket", + .result = REJECT, + }, + { + "reference tracking: invalid 64-bit access of member", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), + BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "invalid bpf_sock access off=0 size=8", + .result = REJECT, + }, + { + "reference tracking: access after release", + .insns = { + BPF_SK_LOOKUP, + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .errstr = "!read_ok", + .result = REJECT, + }, + { + "reference tracking: direct access for lookup", + .insns = { + /* Check that the packet is at least 64B long */ + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 64), + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 9), + /* sk = sk_lookup_tcp(ctx, skb->data, ...) */ + BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_sock_tuple)), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_MOV64_IMM(BPF_REG_5, 0), + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_0, 4), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), + BPF_EMIT_CALL(BPF_FUNC_sk_release), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = ACCEPT, + }, }; static int probe_filter_length(const struct bpf_insn *fp) @@ -12536,18 +13411,18 @@ static int create_map(uint32_t type, uint32_t size_key, return fd; } -static int create_prog_dummy1(void) +static int create_prog_dummy1(enum bpf_map_type prog_type) { struct bpf_insn prog[] = { BPF_MOV64_IMM(BPF_REG_0, 42), BPF_EXIT_INSN(), }; - return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, + return bpf_load_program(prog_type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0); } -static int create_prog_dummy2(int mfd, int idx) +static int create_prog_dummy2(enum bpf_map_type prog_type, int mfd, int idx) { struct bpf_insn prog[] = { BPF_MOV64_IMM(BPF_REG_3, idx), @@ -12558,11 +13433,12 @@ static int create_prog_dummy2(int mfd, int idx) BPF_EXIT_INSN(), }; - return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, + return bpf_load_program(prog_type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0); } -static int create_prog_array(uint32_t max_elem, int p1key) +static int create_prog_array(enum bpf_map_type prog_type, uint32_t max_elem, + int p1key) { int p2key = 1; int mfd, p1fd, p2fd; @@ -12574,8 +13450,8 @@ static int create_prog_array(uint32_t max_elem, int p1key) return -1; } - p1fd = create_prog_dummy1(); - p2fd = create_prog_dummy2(mfd, p2key); + p1fd = create_prog_dummy1(prog_type); + p2fd = create_prog_dummy2(prog_type, mfd, p2key); if (p1fd < 0 || p2fd < 0) goto out; if (bpf_map_update_elem(mfd, &p1key, &p1fd, BPF_ANY) < 0) @@ -12615,23 +13491,25 @@ static int create_map_in_map(void) return outer_map_fd; } -static int create_cgroup_storage(void) +static int create_cgroup_storage(bool percpu) { + enum bpf_map_type type = percpu ? BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE : + BPF_MAP_TYPE_CGROUP_STORAGE; int fd; - fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, - sizeof(struct bpf_cgroup_storage_key), + fd = bpf_create_map(type, sizeof(struct bpf_cgroup_storage_key), TEST_DATA_LEN, 0, 0); if (fd < 0) - printf("Failed to create array '%s'!\n", strerror(errno)); + printf("Failed to create cgroup storage '%s'!\n", + strerror(errno)); return fd; } static char bpf_vlog[UINT_MAX >> 8]; -static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog, - int *map_fds) +static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type, + struct bpf_insn *prog, int *map_fds) { int *fixup_map1 = test->fixup_map1; int *fixup_map2 = test->fixup_map2; @@ -12641,6 +13519,7 @@ static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog, int *fixup_prog2 = test->fixup_prog2; int *fixup_map_in_map = test->fixup_map_in_map; int *fixup_cgroup_storage = test->fixup_cgroup_storage; + int *fixup_percpu_cgroup_storage = test->fixup_percpu_cgroup_storage; if (test->fill_helper) test->fill_helper(test); @@ -12686,7 +13565,7 @@ static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog, } if (*fixup_prog1) { - map_fds[4] = create_prog_array(4, 0); + map_fds[4] = create_prog_array(prog_type, 4, 0); do { prog[*fixup_prog1].imm = map_fds[4]; fixup_prog1++; @@ -12694,7 +13573,7 @@ static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog, } if (*fixup_prog2) { - map_fds[5] = create_prog_array(8, 7); + map_fds[5] = create_prog_array(prog_type, 8, 7); do { prog[*fixup_prog2].imm = map_fds[5]; fixup_prog2++; @@ -12710,12 +13589,20 @@ static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog, } if (*fixup_cgroup_storage) { - map_fds[7] = create_cgroup_storage(); + map_fds[7] = create_cgroup_storage(false); do { prog[*fixup_cgroup_storage].imm = map_fds[7]; fixup_cgroup_storage++; } while (*fixup_cgroup_storage); } + + if (*fixup_percpu_cgroup_storage) { + map_fds[8] = create_cgroup_storage(true); + do { + prog[*fixup_percpu_cgroup_storage].imm = map_fds[8]; + fixup_percpu_cgroup_storage++; + } while (*fixup_percpu_cgroup_storage); + } } static void do_test_single(struct bpf_test *test, bool unpriv, @@ -12732,11 +13619,13 @@ static void do_test_single(struct bpf_test *test, bool unpriv, for (i = 0; i < MAX_NR_MAPS; i++) map_fds[i] = -1; - do_test_fixup(test, prog, map_fds); + if (!prog_type) + prog_type = BPF_PROG_TYPE_SOCKET_FILTER; + do_test_fixup(test, prog_type, prog, map_fds); prog_len = probe_filter_length(prog); - fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER, - prog, prog_len, test->flags & F_LOAD_WITH_STRICT_ALIGNMENT, + fd_prog = bpf_verify_program(prog_type, prog, prog_len, + test->flags & F_LOAD_WITH_STRICT_ALIGNMENT, "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1); expected_ret = unpriv && test->result_unpriv != UNDEF ? |