tcp: free batches of packets in tcp_prune_ofo_queue()
authorEric Dumazet <edumazet@google.com>
Mon, 23 Jul 2018 16:28:17 +0000 (09:28 -0700)
committerDavid S. Miller <davem@davemloft.net>
Mon, 23 Jul 2018 19:01:36 +0000 (12:01 -0700)
Juha-Matti Tilli reported that malicious peers could inject tiny
packets in out_of_order_queue, forcing very expensive calls
to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
every incoming packet. out_of_order_queue rb-tree can contain
thousands of nodes, iterating over all of them is not nice.

Before linux-4.9, we would have pruned all packets in ofo_queue
in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.

Since we plan to increase tcp_rmem[2] in the future to cope with
modern BDP, can not revert to the old behavior, without great pain.

Strategy taken in this patch is to purge ~12.5 % of the queue capacity.

Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/tcp_input.c

index 6bade06..64e45b2 100644 (file)
@@ -4942,6 +4942,7 @@ new_range:
  * 2) not add too big latencies if thousands of packets sit there.
  *    (But if application shrinks SO_RCVBUF, we could still end up
  *     freeing whole queue here)
+ * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
  *
  * Return true if queue has shrunk.
  */
@@ -4949,20 +4950,26 @@ static bool tcp_prune_ofo_queue(struct sock *sk)
 {
        struct tcp_sock *tp = tcp_sk(sk);
        struct rb_node *node, *prev;
+       int goal;
 
        if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
                return false;
 
        NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+       goal = sk->sk_rcvbuf >> 3;
        node = &tp->ooo_last_skb->rbnode;
        do {
                prev = rb_prev(node);
                rb_erase(node, &tp->out_of_order_queue);
+               goal -= rb_to_skb(node)->truesize;
                tcp_drop(sk, rb_to_skb(node));
-               sk_mem_reclaim(sk);
-               if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
-                   !tcp_under_memory_pressure(sk))
-                       break;
+               if (!prev || goal <= 0) {
+                       sk_mem_reclaim(sk);
+                       if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+                           !tcp_under_memory_pressure(sk))
+                               break;
+                       goal = sk->sk_rcvbuf >> 3;
+               }
                node = prev;
        } while (node);
        tp->ooo_last_skb = rb_to_skb(prev);