tcp: fix incorrect undo caused by DSACK of TLP retransmit

Loss recovery undo_retrans bookkeeping had a long-standing bug where a DSACK from a spurious TLP retransmit packet could cause an erroneous undo of a fast recovery or RTO recovery that repaired a single really-lost packet (in a sequence range outside that of the TLP retransmit). Basically, because the loss recovery state machine didn't account for the fact that it sent a TLP retransmit, the DSACK for the TLP retransmit could erroneously be implicitly be interpreted as corresponding to the normal fast recovery or RTO recovery retransmit that plugged a real hole, thus resulting in an improper undo. For example, consider the following buggy scenario where there is a real packet loss but the congestion control response is improperly undone because of this bug: + send packets P1, P2, P3, P4 + P1 is really lost + send TLP retransmit of P4 + receive SACK for original P2, P3, P4 + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1 + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!) + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole) The fix: when we initialize undo machinery in tcp_init_undo(), if there is a TLP retransmit in flight, then increment tp->undo_retrans so that we make sure that we receive a DSACK corresponding to the TLP retransmit, as well as DSACKs for all later normal retransmits, before triggering a loss recovery undo. Note that we also have to move the line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO we remember the tp->tlp_high_seq value until tcp_init_undo() and clear it only afterward. Also note that the bug dates back to the original 2013 TLP implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)"). However, this patch will only compile and work correctly with kernels that have tp->tlp_retrans, which was added only in v5.8 in 2020 in commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight"). So we associate this fix with that later commit. Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Kevin Yang <yyd@google.com> Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
author: Neal Cardwell <ncardwell@google.com> 2024-07-03 13:12:46 -0400
committer: Jakub Kicinski <kuba@kernel.org> 2024-07-05 18:03:44 -0700
commit: 0ec986ed7bab6801faed1440e8839dcc710331ff (patch)
tree: 882f3c5efcfcc613884ae08c934492fa296c0b22 /net/ipv4/tcp_timer.c
parent: 842c361b24294d7c7be2bd65323f8aef8877d7cb (diff)
download: linux-0ec986ed7bab6801faed1440e8839dcc710331ff.tar.gz
linux-0ec986ed7bab6801faed1440e8839dcc710331ff.tar.bz2
linux-0ec986ed7bab6801faed1440e8839dcc710331ff.zip
1 files changed, 0 insertions, 2 deletions
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 5bfd76a31af6..db9d826560e5 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -536,8 +536,6 @@ void tcp_retransmit_timer(struct sock *sk)
 	if (WARN_ON_ONCE(!skb))
 		return;
 
-	tp->tlp_high_seq = 0;
-
 	if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
 	    !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
 		/* Receiver dastardly shrinks window. Our retransmits
author	Neal Cardwell <ncardwell@google.com>	2024-07-03 13:12:46 -0400
committer	Jakub Kicinski <kuba@kernel.org>	2024-07-05 18:03:44 -0700
commit	0ec986ed7bab6801faed1440e8839dcc710331ff (patch)
tree	882f3c5efcfcc613884ae08c934492fa296c0b22 /net/ipv4/tcp_timer.c
parent	842c361b24294d7c7be2bd65323f8aef8877d7cb (diff)
download	linux-0ec986ed7bab6801faed1440e8839dcc710331ff.tar.gz linux-0ec986ed7bab6801faed1440e8839dcc710331ff.tar.bz2 linux-0ec986ed7bab6801faed1440e8839dcc710331ff.zip