summaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_minisocks.c
diff options
context:
space:
mode:
authorKuniyuki Iwashima <kuniyu@amazon.co.jp>2021-06-12 21:32:20 +0900
committerDaniel Borkmann <daniel@iogearbox.net>2021-06-15 18:01:06 +0200
commitd4f2c86b2b7e2e606e0868b38c8c6c49cc193a8e (patch)
tree8d45fc1c5bf3ba6c1726e3212dc54ed8e488abaa /net/ipv4/tcp_minisocks.c
parentc905dee62232db583b50fe214080b98db623151e (diff)
downloadlinux-d4f2c86b2b7e2e606e0868b38c8c6c49cc193a8e.tar.gz
linux-d4f2c86b2b7e2e606e0868b38c8c6c49cc193a8e.tar.bz2
linux-d4f2c86b2b7e2e606e0868b38c8c6c49cc193a8e.zip
tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK.
This patch also changes the code to call reuseport_migrate_sock() and inet_reqsk_clone(), but unlike the other cases, we do not call inet_reqsk_clone() right after reuseport_migrate_sock(). Currently, in the receive path for TCP_NEW_SYN_RECV sockets, its listener has three kinds of refcnt: (A) for listener itself (B) carried by reuqest_sock (C) sock_hold() in tcp_v[46]_rcv() While processing the req, (A) may disappear by close(listener). Also, (B) can disappear by accept(listener) once we put the req into the accept queue. So, we have to hold another refcnt (C) for the listener to prevent use-after-free. For socket migration, we call reuseport_migrate_sock() to select a listener with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv(). This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv(). Thus we have to take another refcnt (B) for the newly cloned request_sock. In inet_csk_complete_hashdance(), we hold the count (B), clone the req, and try to put the new req into the accept queue. By migrating req after winning the "own_req" race, we can avoid such a worst situation: CPU 1 looks up req1 CPU 2 looks up req1, unhashes it, then CPU 1 loses the race CPU 3 looks up req2, unhashes it, then CPU 2 loses the race ... Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210612123224.12525-8-kuniyu@amazon.co.jp
Diffstat (limited to 'net/ipv4/tcp_minisocks.c')
-rw-r--r--net/ipv4/tcp_minisocks.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 7513ba45553d..f258a4c0da71 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -775,8 +775,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
goto listen_overflow;
if (own_req && rsk_drop_req(req)) {
- reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
- inet_csk_reqsk_queue_drop_and_put(sk, req);
+ reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
+ inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
return child;
}