  1. Jan 08, 2021
    • Merge branch 'net-fix-netfilter-defrag-ip-tunnel-pmtu-blackhole' · 704a0f85
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      net: fix netfilter defrag/ip tunnel pmtu blackhole
      
      Christian Perle reported a PMTU blackhole due to unexpected interaction
      between the ip defragmentation that comes with connection tracking and
      ip tunnels.
      
      Unfortunately setting 'nopmtudisc' on the tunnel breaks the test
      scenario even without netfilter.
      
      Christian's setup looks like this:
           +--------+       +---------+       +--------+
           |Router A|-------|Wanrouter|-------|Router B|
           |        |.IPIP..|         |..IPIP.|        |
           +--------+       +---------+       +--------+
                /             mtu 1400           \
               /                                  \
       +--------+                                  +--------+
       |Client A|                                  |Client B|
       +--------+                                  +--------+
      
      MTU is 1500 everywhere, except on the links from Router A to Wanrouter
      and from Wanrouter to Router B, which the diagram marks as mtu 1400.
      
      Router A and Router B use IPIP tunnel interfaces to tunnel traffic
      between Client A and Client B over WAN.
      
      Client A sends a 1400 byte UDP datagram to Client B.
      This packet gets encapsulated in the IPIP tunnel.
      
      This works: the packet is received on Client B.
      
      When conntrack (or anything else that forces ip defragmentation) is
      enabled on Router A, the packet gets dropped on Router A after
      encapsulation because it exceeds the link MTU.

      Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse:
      no packets pass even in the no-netfilter scenario.
      
      Patch one is a reproducer script for selftest infra.
      
      Patch two is a fix for the 'nopmtudisc' behaviour so that ip_tunnel sends
      an ICMP error to Client A.  This allows a 'nopmtudisc' tunnel to forward
      the UDP datagrams.
      
      Patch three enables ip refragmentation for all reassembled packets, just
      like ipv6.
      ====================
      
      Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ip: always refragment ip defragmented packets · bb4cc1a1
      Florian Westphal authored
      Conntrack reassembly records the largest fragment size seen in IPCB.
      However, when this gets forwarded/transmitted, fragmentation will only
      be forced if one of the fragmented packets had the DF bit set.
      
      In that case, a flag in IPCB will force fragmentation even if the
      MTU is large enough.
      
      This should work fine, but it breaks with ip tunnels.
      Consider a client that sends a UDP datagram of size X to another host.

      The client fragments the datagram, so two packets, of size Y and Z, are
      sent. The DF bit is not set on any of these packets.

      Middlebox netfilter reassembles those packets back into a single size-X
      packet before the routing decision.

      The packet-size-vs-mtu checks in ip_forward are irrelevant, because the DF
      bit isn't set.  At output time, ip refragmentation is skipped as well
      because X is still smaller than the mtu of the output device.

      If the transmit device is an ip tunnel, the packet size increases to
      X + overhead.
      
      Also, the tunnel might be configured to force the DF bit on the outer header.

      In this case, the packet will be dropped (exceeds MTU) and an ICMP error is
      generated back to the sender.

      But the sender already respects the announced MTU: all the packets that
      it sent did fit the announced mtu.
      
      Force refragmentation as per original sizes unconditionally so ip tunnel
      will encapsulate the fragments instead.
      
      The only other solution I see is to place ip refragmentation in
      the ip_tunnel code to handle this case.
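
The decision described above can be sketched as a small model. This is a simplified illustration built only from the commit message, not kernel code: the class, field and function names are hypothetical, the sizes are made-up example values, and the 20-byte overhead assumes an IPIP tunnel whose outer IPv4 header carries no options.

```python
from dataclasses import dataclass

IPIP_OVERHEAD = 20  # assumption: outer IPv4 header without options


@dataclass
class Reassembled:
    """Toy stand-in for a defragmented packet and the state recorded for it."""
    length: int         # size X of the reassembled packet
    frag_max_size: int  # largest fragment seen during reassembly
    df_seen: bool       # True if any fragment carried the DF bit


def refragments_before(pkt: Reassembled, mtu: int) -> bool:
    # Behaviour before the change, as described: refragment only if the
    # packet is too large or one of the fragments had DF set.
    return pkt.length > mtu or pkt.df_seen


def refragments_after(pkt: Reassembled, mtu: int) -> bool:
    # Behaviour after the change, as described: every reassembled packet
    # is refragmented according to the original fragment sizes.
    return pkt.length > mtu or pkt.frag_max_size > 0


def fits_link_after_ipip(size: int, link_mtu: int) -> bool:
    # Encapsulation adds the tunnel overhead on top of the inner packet.
    return size + IPIP_OVERHEAD <= link_mtu


# Illustrative values only: X = 1400, largest original fragment = 1000.
pkt = Reassembled(length=1400, frag_max_size=1000, df_seen=False)
output_dev_mtu, link_mtu = 1480, 1400

print(refragments_before(pkt, output_dev_mtu))
print(refragments_after(pkt, output_dev_mtu))
print(fits_link_after_ipip(pkt.length, link_mtu))
print(fits_link_after_ipip(pkt.frag_max_size, link_mtu))
```

In this model the reassembled packet passes the output-device check but no longer fits the link once encapsulated, whereas a fragment of the original size still fits.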
      
      Fixes: d6b915e2 ("ip_fragment: don't forward defragmented DF packet")
      Reported-by: Christian Perle <christian.perle@secunet.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fix pmtu check in nopmtudisc mode · 50c66167
      Florian Westphal authored
      For some reason ip_tunnel insists on setting the DF bit anyway when the
      inner header has the DF bit set, EVEN if the tunnel was configured with
      'nopmtudisc'.
      
      This means that the script added in the previous commit
      cannot be made to work by adding the 'nopmtudisc' flag to the
      ip tunnel configuration. Doing so breaks connectivity even for the
      without-conntrack/netfilter scenario.
      
      When nopmtudisc is set, the tunnel will skip the mtu check, so no
      ICMP error is sent to the client. Then, because the inner header has DF
      set, the outer header gets added with the DF bit set as well.

      The IP stack then sends an error to itself because the packet exceeds
      the device MTU.
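
The failure mode can be modelled in a few lines. This is a sketch under stated assumptions, not a transcription of ip_tunnel.c: the function names and return strings are hypothetical, `tunnel_df=False` stands in for a tunnel configured with 'nopmtudisc', the 20-byte overhead assumes an outer IPv4 header without options, and the "after" variant reflects one reading of the fix (the PMTU check uses the DF value that will actually end up on the outer header).

```python
IPIP_OVERHEAD = 20  # assumption: outer IPv4 header without options


def xmit_before(inner_len: int, inner_df: bool, tunnel_df: bool, link_mtu: int) -> str:
    """Model of the broken path: the PMTU check only sees the tunnel's own DF setting."""
    too_big = inner_len + IPIP_OVERHEAD > link_mtu
    if tunnel_df and too_big:
        return "icmp error to client"
    # The inner DF bit is copied to the outer header only after the check.
    outer_df = tunnel_df or inner_df
    if outer_df and too_big:
        return "dropped locally"
    return "transmitted"


def xmit_after(inner_len: int, inner_df: bool, tunnel_df: bool, link_mtu: int) -> str:
    """Model of the fixed path: the check uses the DF value of the outer header."""
    too_big = inner_len + IPIP_OVERHEAD > link_mtu
    outer_df = tunnel_df or inner_df
    if outer_df and too_big:
        return "icmp error to client"
    return "transmitted"


# Inner packet with DF set, 'nopmtudisc' tunnel, link MTU of 1400 (example values).
print(xmit_before(1400, inner_df=True, tunnel_df=False, link_mtu=1400))
print(xmit_after(1400, inner_df=True, tunnel_df=False, link_mtu=1400))
```

With the client informed through the ICMP error, it can lower its packet size, which is what lets a 'nopmtudisc' tunnel forward the datagrams.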
      
      Fixes: 23a3647b ("ip_tunnels: Use skb-len to PMTU check.")
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking · 9e7a67de
      Florian Westphal authored
      Convert Christian's bug description into a reproducer.
      
      Cc: Shuah Khan <shuah@kernel.org>
      Reported-by: Christian Perle <christian.perle@secunet.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • docs: octeontx2: tune rst markup · f3562f5e
      Lukas Bulwahn authored
      Commit 80b94148 ("docs: octeontx2: Add Documentation for NPA health
      reporters") added new documentation with improper formatting for rst, and
      caused a few new warnings for make htmldocs in octeontx2.rst:169--202.
      
      Tune markup and formatting for better presentation in the HTML view.
      
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: George Cherian <george.cherian@marvell.com>
      Link: https://lore.kernel.org/r/20210106161735.21751-1-lukas.bulwahn@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: selftests: add test for changing routes with PTMU exceptions · 5316a7c0
      Sean Tranchetti authored
      Adds 2 new tests to the PMTU script: pmtu_ipv4/6_route_change.
      
      These tests explicitly test for a recently discovered problem in the
      IPv6 routing framework where PMTU exceptions were not properly released
      when replacing a route via "ip route change ...".
      
      After creating PMTU exceptions, the route from device A to R1 will be
      replaced with a new route, then device A will be deleted. If the PMTU
      exceptions were properly cleaned up by the kernel, this device deletion
      will succeed. Otherwise, the unregistration of the device will stall, and
      messages such as the following will be logged in dmesg:
      
      unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4
      
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/1609892546-11389-2-git-send-email-stranche@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipv6: fib: flush exceptions when purging route · d8f5c296
      Sean Tranchetti authored
      Route removal is handled by two code paths. The main removal path is via
      fib6_del_route() which will handle purging any PMTU exceptions from the
      cache, removing all per-cpu copies of the DST entry used by the route, and
      releasing the fib6_info struct.
      
      The second removal location is in fib6_add_rt2node(), during a route
      replacement operation. This path also calls fib6_purge_rt() to handle
      cleaning up the per-cpu copies of the DST entries and releasing the
      fib6_info associated with the older route, but it does not flush any PMTU
      exceptions that the older route had. Since the older route is removed from
      the tree during the replacement, we lose any way of accessing it again.
      
      As these lingering DSTs and the fib6_info struct are holding references to
      the underlying netdevice struct as well, unregistering that device from the
      kernel can never complete.
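
The reference leak can be pictured with a toy reference-counting model. This is a deliberately simplified sketch, not the kernel's data structures: the `Device` and `Route` classes, the counts, and the `flush_exceptions` switch are all hypothetical, and each cached entry is assumed to hold exactly one device reference.

```python
class Device:
    """Toy netdevice that can only be unregistered once nobody references it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 0

    def can_unregister(self) -> bool:
        return self.refcount == 0


class Route:
    """Toy route: one reference for the route itself plus one per cached DST."""

    def __init__(self, dev: Device, percpu: int, exceptions: int) -> None:
        self.dev = dev
        self.own = 1
        self.percpu = percpu
        self.exceptions = exceptions
        dev.refcount += self.own + percpu + exceptions

    def purge(self, flush_exceptions: bool) -> None:
        # Both removal paths drop the route's own reference and the per-cpu copies.
        released = self.own + self.percpu
        self.own = self.percpu = 0
        # Only a path that also flushes the PMTU exceptions drops those references.
        if flush_exceptions:
            released += self.exceptions
            self.exceptions = 0
        self.dev.refcount -= released


def replace_route(flush_exceptions: bool) -> Device:
    dev = Device("veth_A-R1")
    old = Route(dev, percpu=2, exceptions=3)
    old.purge(flush_exceptions)  # the old route leaves the tree here
    return dev


leaky = replace_route(flush_exceptions=False)
fixed = replace_route(flush_exceptions=True)
print(leaky.refcount, leaky.can_unregister())
print(fixed.refcount, fixed.can_unregister())
```

In the model, skipping the flush leaves the exception references in place with no route left to reach them, so the device can never reach a zero count.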
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'linux-can-fixes-for-5.11-20210107' of... · c8c748fb
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-01-07
      
      The first patch is by me for the m_can driver and removes an erroneous
      m_can_clk_stop() from the driver's unregister function.
      
      The second patch targets the tcan4x5x driver, is by me, and fixes the bit
      timing constant parameters.
      
      The next two patches are by me, target the mcp251xfd driver, and fix a race
      condition in the optimized TEF path (which was added in net-next for v5.11).
      The similar code in the RX path is changed to look the same, although it
      doesn't suffer from the race condition.
      
      A patch by Lad Prabhakar updates the description and help text for the rcar CAN
      driver to reflect all supported SoCs.
      
      In the last patch Sriram Dash transfers the maintainership of the m_can driver
      to Pankaj Sharma.
      
      * tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        MAINTAINERS: Update MCAN MMIO device driver maintainer
        can: rcar: Kconfig: update help description for CAN_RCAR config
        can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver
        can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
        can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
        can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
      ====================
      
      Link: https://lore.kernel.org/r/20210107103451.183477-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Jan 07, 2021
  3. Jan 06, 2021