author Eric Dumazet <edumazet@google.com> 2023-08-17 18:23:53 +0000
committer Jakub Kicinski <kuba@kernel.org> 2023-08-18 19:29:36 -0700
commit 726e9e8b94b92d69004af17a68f9f37ffc0358b9
tree 42016eb3bb7ca8be1126b7aa324b7f6a169982ba /net/ipv4
parent fc720399ffd9e3cc556dc48773f3cde1d28fc20d
tcp: refine skb->ooo_okay setting
Enabling BIG TCP on a low end platform apparently increased
chances of getting flows locked on one busy TX queue.

A similar problem was handled in commit 9b462d02d6dd
("tcp: TCP Small Queues and strange attractors"),
but the strategy worked for either bulk flows,
or 'large enough' RPC.

BIG TCP changed how large RPC needed to be to enable
the work around: If RPC fits in a single skb, TSQ never triggers.

Root cause for the problem is a busy TX queue, with delayed
TX completions.

This patch changes how we set skb->ooo_okay to detect
the case TX completion was not done, but incoming ACK
already was processed and emptied rtx queue.

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
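The change in the skb->ooo_okay test can be illustrated outside the kernel with a small userspace model. This is only a sketch under stated assumptions: struct flow_model, its fields, MODEL_MIN_TRUESIZE and the three helper functions are hypothetical names invented for illustration, and the constant is a placeholder rather than the real value of SKB_TRUESIZE(1). Only the shape of the two boolean tests is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed placeholder for the kernel's SKB_TRUESIZE(1); the real value
 * depends on struct sizes and alignment on the target architecture.
 */
#define MODEL_MIN_TRUESIZE 1024u

/* Hypothetical, simplified view of the per-socket state the test reads. */
struct flow_model {
	size_t wmem_alloc;    /* bytes still charged to the socket by TX skbs not yet freed */
	size_t rtx_queue_len; /* skbs sent but not yet ACKed */
};

/* Test before the patch: only pending TX memory is considered. */
static bool ooo_okay_old(const struct flow_model *f)
{
	return f->wmem_alloc < MODEL_MIN_TRUESIZE;
}

/* Test after the patch: an empty rtx queue also allows a queue switch. */
static bool ooo_okay_new(const struct flow_model *f)
{
	return f->wmem_alloc < MODEL_MIN_TRUESIZE || f->rtx_queue_len == 0;
}

static void model_self_check(void)
{
	/* One 64 KB skb was sent and ACKed, but its TX completion is still pending. */
	struct flow_model delayed = { .wmem_alloc = 65536, .rtx_queue_len = 0 };
	/* Same skb, still unacknowledged: switching queues could reorder packets. */
	struct flow_model in_flight = { .wmem_alloc = 65536, .rtx_queue_len = 1 };
	/* Everything completed and ACKed. */
	struct flow_model idle = { .wmem_alloc = 0, .rtx_queue_len = 0 };

	assert(!ooo_okay_old(&delayed) && ooo_okay_new(&delayed));
	assert(!ooo_okay_old(&in_flight) && !ooo_okay_new(&in_flight));
	assert(ooo_okay_old(&idle) && ooo_okay_new(&idle));
}
```

In this model the "delayed" case is the one the commit message describes: the old test keeps the flow pinned to its queue because memory is still charged to the socket, while the new test sees that nothing is awaiting an ACK and lets the next packet pick another queue.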
Diffstat (limited to 'net/ipv4')
-rw-r--r-- net/ipv4/tcp_output.c 21
1 files changed, 14 insertions, 7 deletions
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 769a558159ee..e6b4fbd642f7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1301,14 +1301,21 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 	}
 	tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
-	/* if no packet is in qdisc/device queue, then allow XPS to select
-	 * another queue. We can be called from tcp_tsq_handler()
-	 * which holds one reference to sk.
-	 *
-	 * TODO: Ideally, in-flight pure ACK packets should not matter here.
-	 * One way to get this would be to set skb->truesize = 2 on them.
+	/* We set skb->ooo_okay to one if this packet can select
+	 * a different TX queue than prior packets of this flow,
+	 * to avoid self inflicted reorders.
+	 * The 'other' queue decision is based on current cpu number
+	 * if XPS is enabled, or sk->sk_txhash otherwise.
+	 * We can switch to another (and better) queue if:
+	 * 1) No packet with payload is in qdisc/device queues.
+	 *    Delays in TX completion can defeat the test
+	 *    even if packets were already sent.
+	 * 2) Or rtx queue is empty.
+	 *    This mitigates above case if ACK packets for
+	 *    all prior packets were already processed.
 	 */
-	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1);
+	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1) ||
+			tcp_rtx_queue_empty(sk);
 	/* If we had to use memory reserve to allocate this skb,
 	 * this might cause drops if packet is looped back :