tcp: when scheduling TLP, time of RTO should account for current ACK

[ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ]

Fix the TLP scheduling logic so that when scheduling a TLP probe, we
ensure that the estimated time at which an RTO would fire accounts for
the fact that ACKs indicating forward progress should push back RTO
times.

After the following fix:

df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")

we had an unintentional behavior change in the following kind of
scenario: suppose the RTT variance has been very low recently. Then
suppose we send out a flight of N packets and our RTT is 100ms:

t=0: send a flight of N packets
t=100ms: receive an ACK for N-1 packets

The response before df92c8394e6e was:
  -> schedule a TLP for now + RTO_interval

The response after df92c8394e6e is:
  -> schedule a TLP for t=0 + RTO_interval

Since RTO_interval = srtt + RTT_variance, this means that we have
scheduled a TLP timer at a point in the future that only accounts for
RTT_variance. If the RTT_variance term is small, this means that the
timer fires soon.

Before df92c8394e6e this would not happen, because in that code, when
we receive an ACK for a prefix of the flight, we did:

    1) Near the top of tcp_ack(), switch from TLP timer to RTO
       at write_queue_head->packet_tx_time + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                   tcp_rearm_rto(sk);

    2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
            if (flag & FLAG_ACKED) {
                   tcp_rearm_rto(sk);

    3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
       to TLP at now + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                   tcp_schedule_loss_probe(sk);

In df92c8394e6e we removed that 3-phase dance, and instead directly
set the TLP timer once: we set the TLP timer in cases like this to
write_queue_head->packet_tx_time + RTO_interval. So if the RTT variance
is small, then this means that this is setting the TLP timer to fire
quite soon. This means if the ACK for the tail of the flight takes
longer than an RTT to arrive (often due to delayed ACKs), then the TLP
timer fires too quickly.

Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
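
The timing scenario in the commit message can be illustrated with a small standalone model. This is only a sketch and not the kernel code: the struct tlp_model type, the rto_delta_ms() helper, and the 5ms variance term are hypothetical names and values chosen for illustration, and the advancing_rto flag is an assumed reading of the boolean argument passed in the diff. Times are plain milliseconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one connection's timing state (milliseconds). */
struct tlp_model {
	long head_tx_time_ms;	/* send time of the head of the write queue */
	long srtt_ms;		/* smoothed RTT */
	long rttvar_ms;		/* RTT variance term */
};

/* RTO_interval = srtt + RTT_variance, as stated in the commit message. */
static long rto_interval_ms(const struct tlp_model *m)
{
	assert(m->srtt_ms >= 0 && m->rttvar_ms >= 0);
	return m->srtt_ms + m->rttvar_ms;
}

/*
 * How far in the future, measured from now_ms, the RTO is estimated to fire.
 *
 * advancing_rto == true:  the current ACK shows forward progress, so the
 *                         RTO is treated as restarting at now_ms.
 * advancing_rto == false: the RTO is anchored at the send time of the head
 *                         of the write queue.
 */
static long rto_delta_ms(const struct tlp_model *m, long now_ms,
			 bool advancing_rto)
{
	if (advancing_rto)
		return rto_interval_ms(m);
	return m->head_tx_time_ms + rto_interval_ms(m) - now_ms;
}
```

With a flight sent at t=0, srtt of 100ms, and an assumed variance term of 5ms, an ACK arriving at t=100ms gives rto_delta_ms(&m, 100, false) == 5 (the timer fires almost immediately) and rto_delta_ms(&m, 100, true) == 105 (the timer is pushed back by a full RTO_interval).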
author: Neal Cardwell <ncardwell@google.com> 2017-11-17 21:06:14 -0500
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2017-12-17 15:07:58 +0100
commit: 241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87 (patch)
tree: 6dedd41de9638628111d09adc4e84a02cac444b4 /net/ipv4/tcp_input.c
parent: 616bada6fd46b77351a4033392035de7875c48b6 (diff)
download: linux-stable-241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87.tar.gz
linux-stable-241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87.tar.bz2
linux-stable-241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87.zip
1 files changed, 1 insertions, 1 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4b10e79210aa..c5447b9f8517 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3021,7 +3021,7 @@ void tcp_rearm_rto(struct sock *sk)
 /* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */
 static void tcp_set_xmit_timer(struct sock *sk)
 {
-	if (!tcp_schedule_loss_probe(sk))
+	if (!tcp_schedule_loss_probe(sk, true))
 		tcp_rearm_rto(sk);
 }