  1. Nov 06, 2020
    • net/mlx5e: Use spin_lock_bh for async_icosq_lock · f42139ba
      Maxim Mikityanskiy authored
      async_icosq_lock may be taken from softirq and non-softirq contexts. It
      requires protection with spin_lock_bh, otherwise a softirq may be
      triggered in the middle of the critical section, and it may deadlock if
      it tries to take the same lock. This patch fixes such a scenario by
      using spin_lock_bh to disable softirqs on that CPU while inside the
      critical section.
      
      Fixes: 8d94b590 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Protect encap route dev from concurrent release · 78c906e4
      Vlad Buslov authored
      In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be an arbitrary net
      device and not necessarily an mlx5 eswitch port representor. As such, in
      order to ensure that route_dev is not destroyed concurrently, the code needs
      to either explicitly take a reference to the device before releasing the
      reference to the rtable instance or ensure that the caller holds the rtnl
      lock. The first approach is chosen as a fix since the rtnl lock dependency
      was intentionally removed from the mlx5 TC layer.

      To prevent unprotected usage of route_dev in encap code, take a reference to
      the device before releasing rt. Don't save a direct pointer to the device in
      the mlx5_encap_entry structure and use ifindex instead. Modify users of the
      route_dev pointer to properly obtain the net device instance from its
      ifindex.
      
      Fixes: 61086f39 ("net/mlx5e: Protect encap hash table with mutex")
      Fixes: 6707f74b ("net/mlx5e: Update hw flows when encap source mac changed")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix modify header actions memory leak · e68e28b4
      Maor Dickman authored
      Modify header actions are allocated while parsing tc actions and only
      freed during flow creation. However, on the error flow the allocated
      memory is never freed.

      Fix this by calling dealloc_mod_hdr_actions in the error flows of
      __mlx5e_add_fdb_flow and mlx5e_add_nic_flow.
      
      Fixes: d7e75a32 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
      Fixes: 2f4fe4ca ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. Nov 03, 2020
  3. Nov 01, 2020
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 859191b2
      Jakub Kicinski authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Incorrect netlink report logic in flowtable and genID.
      
      2) Add a selftest to check that wireguard passes the right sk
         to ip_route_me_harder, from Jason A. Donenfeld.
      
      3) Pass the actual sk to ip_route_me_harder(), also from Jason.
      
      4) Missing expression validation of updates via nft --check.
      
      5) Update byte and packet counters regardless of whether they
         match, from Stefano Brivio.
      ====================
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags · 20149e9e
      wenxu authored
      Tunnel devices such as vxlan, bareudp and geneve in lwt mode set the
      outer DF bit based only on TUNNEL_DONT_FRAGMENT.
      This was also the behavior for gre devices before switching to
      ip_md_tunnel_xmit in commit 962924fa ("ip_gre: Refactor collect
      metatdata mode tunnel xmit to ip_md_tunnel_xmit").

      When ip_gre in lwt mode started transmitting with ip_md_tunnel_xmit, the
      rule changed, creating a discrepancy in how different tunnels handle DF.
      So ip_md_tunnel_xmit should follow the same rule as the other tunnels.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • cadence: force nonlinear buffers to be cloned · 403dc167
      Mark Deneen authored
      In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
      a second device with usb ethernet (r8152) sending ICMP packets.  If the packet
      was larger than about 220 bytes, the SAMA5 device would "oops" with the
      following trace:
      
      kernel BUG at net/core/skbuff.c:1863!
      Internal error: Oops - BUG: 0 [#1] ARM
      Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii option usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel ohci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
      CPU: 0 PID: 0 Comm: swapper Tainted: G           O      5.9.1-prox2+ #1
      Hardware name: Atmel SAMA5
      PC is at skb_put+0x3c/0x50
      LR is at macb_start_xmit+0x134/0xad0 [macb]
      pc : [<c05258cc>]    lr : [<bf0ea5b8>]    psr: 20070113
      sp : c0d01a60  ip : c07232c0  fp : c4250000
      r10: c0d03cc8  r9 : 00000000  r8 : c0d038c0
      r7 : 00000000  r6 : 00000008  r5 : c59b66c0  r4 : 0000002a
      r3 : 8f659eff  r2 : c59e9eea  r1 : 00000001  r0 : c59b66c0
      Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 10c53c7d  Table: 2640c059  DAC: 00000051
      Process swapper (pid: 0, stack limit = 0x75002d81)
      
      <snipped stack>
      
      [<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
      [<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
      [<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
      [<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
      [<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
      [<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
      [<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
      [<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
      [<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
      [<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
      [<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
      [<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
      [<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
      [<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
      [<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
      [<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
      [<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
      [<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
      [<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
      [<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
      Exception stack(0xc0d01ef0 to 0xc0d01f38)
      1ee0:                                     00000000 0000003d 0c31f383 c0d0fa00
      1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
      1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
      [<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
      [<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
      [<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
      [<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
      [<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
      Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
      ---[ end trace 146c8a334115490c ]---
      
      The solution was to force nonlinear buffers to be cloned.  This was previously
      reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
      but never formally submitted as a patch.
      
      This is the third revision, hopefully the formatting is correct this time!
      
      Suggested-by: Klaus Doth <krnl@doth.eu>
      Fixes: 653e92a9 ("net: macb: add support for padding and fcs computation")
      Signed-off-by: Mark Deneen <mdeneen@saucontech.com>
      Link: https://lore.kernel.org/r/20201030155814.622831-1-mdeneen@saucontech.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ipv6-reply-icmp-error-if-fragment-doesn-t-contain-all-headers' · 72a41f95
      Jakub Kicinski authored
      Hangbin Liu says:
      
      ====================
      IPv6: reply ICMP error if fragment doesn't contain all headers
      
      When our engineer ran the latest IPv6 Core Conformance test, test v6LC.1.3.6:
      First Fragment Doesn’t Contain All Headers [1] failed. The purpose of the
      test is to verify that the node (Linux, for example) properly processes IPv6
      packets that don’t include all the headers through the Upper-Layer header.
      
      Based on RFC 8200, Section 4.5 Fragment Header
      
        -  If the first fragment does not include all headers through an
           Upper-Layer header, then that fragment should be discarded and
           an ICMP Parameter Problem, Code 3, message should be sent to
           the source of the fragment, with the Pointer field set to zero.
      
      The first patch adds a definition for ICMPv6 Parameter Problem, code 3.
      The second patch adds a check on the first fragment to make sure the
      Upper-Layer header exists.
      
      [1] Page 68, v6LC.1.3.6: First Fragment Doesn’t Contain All Headers part A, B,
      C and D at https://ipv6ready.org/docs/Core_Conformance_5_0_0.pdf
      [2] My reproducer:
      
      import sys, os
      from scapy.all import *
      
      def send_frag_dst_opt(src_ip6, dst_ip6):
          ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
      
          frag_1 = IPv6ExtHdrFragment(nh = 60, m = 1)
          dst_opt = IPv6ExtHdrDestOpt(nh = 58)
      
          frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
          icmp_echo = ICMPv6EchoRequest(seq = 1)
      
          pkt_1 = ip6/frag_1/dst_opt
          pkt_2 = ip6/frag_2/icmp_echo
      
          send(pkt_1)
          send(pkt_2)
      
      def send_frag_route_opt(src_ip6, dst_ip6):
          ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
      
          frag_1 = IPv6ExtHdrFragment(nh = 43, m = 1)
          route_opt = IPv6ExtHdrRouting(nh = 58)
      
          frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
          icmp_echo = ICMPv6EchoRequest(seq = 2)
      
          pkt_1 = ip6/frag_1/route_opt
          pkt_2 = ip6/frag_2/icmp_echo
      
          send(pkt_1)
          send(pkt_2)
      
      if __name__ == '__main__':
          src = sys.argv[1]
          dst = sys.argv[2]
          conf.iface = sys.argv[3]
          send_frag_dst_opt(src, dst)
          send_frag_route_opt(src, dst)
      ====================
      
      Link: https://lore.kernel.org/r/20201027123313.3717941-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • IPv6: reply ICMP error if the first fragment don't include all headers · 2efdaaaf
      Hangbin Liu authored
      
      
      Based on RFC 8200, Section 4.5 Fragment Header:
      
        -  If the first fragment does not include all headers through an
           Upper-Layer header, then that fragment should be discarded and
           an ICMP Parameter Problem, Code 3, message should be sent to
           the source of the fragment, with the Pointer field set to zero.
      
      Checking each packet header in the IPv6 fast path would have a performance
      impact, so I put the check in ipv6_frag_rcv().

      As the packet may carry any kind of L4 protocol, I only check the header
      length of some common protocols and handle the others with
      (offset + 1) > skb->len. Also use !(frag_off & htons(IP6_OFFSET)) to catch
      atomic fragments (fragmented packets with only one fragment).

      When sending the ICMP error message, if the first truncated fragment is an
      ICMP message, icmp6_send() will bail out because is_ineligible() returns
      true. So I added a check in is_ineligible() so that a fragment with nexthdr
      ICMP but no ICMP header returns false.
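
The check described above can be modeled in user space. The following is a minimal Python sketch, not the kernel code: the constants IP6_OFFSET (0xFFF8) and IP6_MF (0x0001) mirror the kernel's masks for the fragment header's offset/flags field, while the function names and the reduction of the length test to the "(offset + 1) > length" fallback are illustrative assumptions.

```python
# User-space model of the first-fragment test; all names are illustrative.
IP6_OFFSET = 0xFFF8  # upper 13 bits: fragment offset in 8-byte units
IP6_MF = 0x0001      # "more fragments" flag


def is_first_fragment(frag_off: int) -> bool:
    """True when the fragment offset is zero (value in host byte order).

    This also covers atomic fragments, whose offset is zero and whose
    M flag is clear.
    """
    return (frag_off & IP6_OFFSET) == 0


def headers_incomplete(upper_hdr_offset: int, packet_len: int) -> bool:
    """Fallback rule from the commit text: (offset + 1) > skb->len.

    upper_hdr_offset is where the upper-layer header would start; if not
    even one byte of it fits in the packet, the header chain is incomplete.
    """
    return upper_hdr_offset + 1 > packet_len


def should_send_param_problem(frag_off: int, upper_hdr_offset: int,
                              packet_len: int) -> bool:
    """Only a first fragment with a truncated header chain triggers the error."""
    return is_first_fragment(frag_off) and headers_incomplete(
        upper_hdr_offset, packet_len)
```

A non-first fragment never triggers the error in this model, regardless of its length, because only the first fragment is expected to carry the full header chain.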
      
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ICMPv6: Add ICMPv6 Parameter Problem, code 3 definition · b59e286b
      Hangbin Liu authored
      
      
      Based on RFC7112, Section 6:
      
         IANA has added the following "Type 4 - Parameter Problem" message to
         the "Internet Control Message Protocol version 6 (ICMPv6) Parameters"
         registry:
      
            CODE     NAME/DESCRIPTION
             3       IPv6 First Fragment has incomplete IPv6 Header Chain
      
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: atm: fix update of position index in lec_seq_next · 2f71e006
      Colin Ian King authored
      
      
      The position index in lec_seq_next is not updated when the next
      entry is fetched and no more entries are available. This causes
      seq_file to report the following error:
      
      "seq_file: buggy .next function lec_seq_next [lec] did not update
       position index"
      
      Fix this by always updating the position index.
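
The contract behind that warning can be illustrated with a small model. This is a Python sketch rather than kernel code: `seq_next`, `walk` and their arguments are hypothetical stand-ins for a seq_file `.next` callback and its caller, and the point shown is only that the position index advances even when no further entry is returned.

```python
from typing import List, Optional, Tuple


def seq_next(entries: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Return (next entry or None, updated position).

    The position is advanced unconditionally, including when the end of
    the list has been reached.
    """
    pos += 1  # always update the position index
    if pos < len(entries):
        return entries[pos], pos
    return None, pos  # end of sequence, but pos still moved forward


def walk(entries: List[str]) -> List[str]:
    """Iterate the way a reader would: take the first item, then call next."""
    out = []
    pos = 0
    item = entries[pos] if entries else None
    while item is not None:
        out.append(item)
        item, pos = seq_next(entries, pos)
    return out
```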
      
      [ Note: this is an ancient 2002 bug, the sha is from the
        tglx/history repo ]
      
      Fixes: 4aea2cbff417 ("[ATM]: Move lan seq_file ops to lec.c [1/3]")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. Oct 31, 2020
    • netfilter: ipset: Update byte and packet counters regardless of whether they match · 7d10e62c
      Stefano Brivio authored
      In ip_set_match_extensions(), for sets with counters, we take care of
      updating counters themselves by calling ip_set_update_counter(), and of
      checking if the given comparison and values match, by calling
      ip_set_match_counter() if needed.
      
      However, if a given comparison on counters doesn't match the configured
      values, that doesn't mean the set entry itself isn't matching.
      
      This fix restores the behaviour we had before commit 4750005a
      ("netfilter: ipset: Fix "don't update counters" mode when counters used
      at the matching"), without reintroducing the issue fixed there: back
      then, mtype_data_match() first updated counters in any case, and then
      took care of matching on counters.
      
      Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
      ip_set_update_counter() will anyway skip counter updates if desired.
      
      The issue observed is illustrated by this reproducer:
      
        ipset create c hash:ip counters
        ipset add c 192.0.2.1
        iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
      
      If we now send packets from 192.0.2.1, the byte and packet counters
      for the entry as shown by 'ipset list' are always zero, and, no
      matter how many bytes we send, the rule will never match, because
      the counters themselves are not updated.
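
The intended ordering can be sketched as follows. This is a simplified Python model, not the ipset implementation: the `Counter` class, the `skip_update` argument (standing in for IPSET_FLAG_SKIP_COUNTER_UPDATE) and the `bytes_gt` comparison are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    nbytes: int = 0
    packets: int = 0


def match_extensions(counter: Counter, pkt_len: int, bytes_gt: int,
                     skip_update: bool = False) -> bool:
    """Update the counters first, then evaluate the counter comparison.

    Updating before comparing means a failed comparison on one packet
    does not stop the counters from growing for later packets.
    """
    if not skip_update:
        counter.nbytes += pkt_len
        counter.packets += 1
    return counter.nbytes > bytes_gt


# Mirror the reproducer: the rule matches only once more than 800 bytes
# have been counted for the entry.
entry = Counter()
results = [match_extensions(entry, 500, bytes_gt=800) for _ in range(3)]
```

In this model the first 500-byte packet does not match, but it is still counted, so the following packets do match. If the update were skipped whenever the comparison failed, the counters would stay at zero, which is the symptom the reproducer shows.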
      
      Reported-by: Mithil Mhatre <mmhatre@redhat.com>
      Fixes: 4750005a ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: stmmac: Fix channel lock initialization · 2b94f526
      Marek Szyprowski authored
      Commit 0366f7e0 ("net: stmmac: add ethtool support for get/set
      channels") refactored channel initialization, but during that operation,
      the spinlock initialization got lost. Fix this. This fixes the following
      lockdep warning:
      
      meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 1 PID: 331 Comm: kworker/1:2H Not tainted 5.9.0-rc3+ #1858
      Hardware name: Hardkernel ODROID-N2 (DT)
      Workqueue: kblockd blk_mq_run_work_fn
      Call trace:
       dump_backtrace+0x0/0x1d0
       show_stack+0x14/0x20
       dump_stack+0xe8/0x154
       register_lock_class+0x58c/0x590
       __lock_acquire+0x7c/0x1790
       lock_acquire+0xf4/0x440
       _raw_spin_lock_irqsave+0x80/0xb0
       stmmac_tx_timer+0x4c/0xb0 [stmmac]
       call_timer_fn+0xc4/0x3e8
       run_timer_softirq+0x2b8/0x6c0
       efi_header_end+0x114/0x5f8
       irq_exit+0x104/0x110
       __handle_domain_irq+0x60/0xb8
       gic_handle_irq+0x58/0xb0
       el1_irq+0xbc/0x180
       _raw_spin_unlock_irqrestore+0x48/0x90
       mmc_blk_rw_wait+0x70/0x160
       mmc_blk_mq_issue_rq+0x510/0x830
       mmc_mq_queue_rq+0x13c/0x278
       blk_mq_dispatch_rq_list+0x2a0/0x698
       __blk_mq_do_dispatch_sched+0x254/0x288
       __blk_mq_sched_dispatch_requests+0x190/0x1d8
       blk_mq_sched_dispatch_requests+0x34/0x70
       __blk_mq_run_hw_queue+0xcc/0x148
       blk_mq_run_work_fn+0x20/0x28
       process_one_work+0x2a8/0x718
       worker_thread+0x48/0x460
       kthread+0x134/0x160
       ret_from_fork+0x10/0x1c
      
      Fixes: 0366f7e0 ("net: stmmac: add ethtool support for get/set channels")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Link: https://lore.kernel.org/r/20201029185011.4749-1-m.szyprowski@samsung.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • stmmac: intel: Fix kernel panic on pci probe · 785ff20b
      Wong Vee Khee authored
      The commit "stmmac: intel: Adding ref clock 1us tic for LPI cntr"
      introduced a regression which leads to a kernel panic during loading
      of the dwmac_intel module.

      Move the code block to after the PCI resources are obtained.
      
      Fixes: b4c5f83a ("stmmac: intel: Adding ref clock 1us tic for LPI cntr")
      Cc: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
      Link: https://lore.kernel.org/r/20201029093228.1741-1-vee.khee.wong@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gianfar: Account for Tx PTP timestamp in the skb headroom · d6a076d6
      Claudiu Manoil authored
      
      
      When PTP timestamping is enabled on Tx, the controller
      inserts the Tx timestamp at the beginning of the frame
      buffer, between SFD and the L2 frame header. This means
      that the skb provided by the stack is required to have
      enough headroom otherwise a new skb needs to be created
      by the driver to accommodate the timestamp inserted by h/w.
      Up until now the driver was relying on the second option,
      using skb_realloc_headroom() to create a new skb to accommodate
      PTP frames. Turns out that this method is not reliable, as
      reallocation of skbs for PTP frames along with the required
      overhead (skb_set_owner_w, consume_skb) is causing random
      crashes in subsequent skb_*() calls, when multiple concurrent
      TCP streams are run at the same time on the same device
      (as seen in James' report).
      Note that these crashes don't occur with a single TCP stream,
      nor with multiple concurrent UDP streams, but only when multiple
      TCP streams are run concurrently with the PTP packet flow
      (doing skb reallocation).
      This patch enforces the first method, by requesting enough
      headroom from the stack to accommodate PTP frames, and so avoiding
      skb_realloc_headroom() & co, and the crashes no longer occur.
      There's no reason not to set needed_headroom to a large enough
      value to accommodate PTP frames, so in this regard this patch
      is a fix.
      
      Reported-by: James Jurack <james.jurack@ametek.com>
      Fixes: bee9e58c ("gianfar:don't add FCB length to hard_header_len")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Link: https://lore.kernel.org/r/20201020173605.1173-1-claudiu.manoil@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP · d145c903
      Claudiu Manoil authored
      When PTP timestamping is enabled on Tx, the controller
      inserts the Tx timestamp at the beginning of the frame
      buffer, between SFD and the L2 frame header.  This means
      that the skb provided by the stack is required to have
      enough headroom otherwise a new skb needs to be created
      by the driver to accommodate the timestamp inserted by h/w.
      Up until now the driver was relying on skb_realloc_headroom()
      to create new skbs to accommodate PTP frames.  Turns out that
      this method is not reliable in this context at least, as
      skb_realloc_headroom() for PTP frames can cause random crashes,
      mostly in subsequent skb_*() calls, when multiple concurrent
      TCP streams are run at the same time with the PTP flow
      on the same device (as seen in James' report).  I also noticed
      that when the system is loaded by sending multiple TCP streams,
      the driver receives cloned skbs in large numbers.
      skb_cow_head() instead proves to be stable in this scenario,
      and not only handles cloned skbs too but it's also more efficient
      and widely used in other drivers.
      The commit introducing skb_realloc_headroom in the driver
      goes back to 2009, commit 93c1285c
      ("gianfar: reallocate skb when headroom is not enough for fcb").
      For practical purposes I'm referencing a newer commit (from 2012)
      that brings the code to its current structure (and fixes the PTP
      case).
      
      Fixes: 9c4886e5 ("gianfar: Fix invalid TX frames returned on error queue when time stamping")
      Reported-by: James Jurack <james.jurack@ametek.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Link: https://lore.kernel.org/r/20201029081057.8506-1-claudiu.manoil@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. Oct 30, 2020
    • net: fec: fix MDIO probing for some FEC hardware blocks · 1e6114f5
      Greg Ungerer authored
      Some (apparently older) versions of the FEC hardware block do not like
      the MMFR register being cleared to avoid generation of MII events at
      initialization time. The action of clearing this register results in no
      future MII events being generated at all on the problem block. This means
      the probing of the MDIO bus will find no PHYs.
      
      Create a quirk that can be checked at the FEC's MII init time so that
      the right thing is done. The quirk is set as appropriate for the FEC
      hardware blocks that are known to need this.
      
      Fixes: f166f890 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO")
      Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
      Acked-by: Fugang Duan <fugand.duan@nxp.com>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
      Link: https://lore.kernel.org/r/20201028052232.1315167-1-gerg@linux-m68k.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ip6_tunnel: set inner ipproto before ip6_tnl_encap · 9e7c5b39
      Alexander Ovechkin authored
      ip6_tnl_encap assigns to proto the transport protocol that
      encapsulates the inner packet, but set_inner_ipproto must be passed
      the protocol of that inner packet.
      
      Calling set_inner_ipproto after ip6_tnl_encap might break gso.
      For example, in case of encapsulating ipv6 packet in fou6 packet, inner_ipproto
      would be set to IPPROTO_UDP instead of IPPROTO_IPV6. This would lead to
      incorrect calling sequence of gso functions:
      ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> udp6_ufo_fragment
      instead of:
      ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> ip6ip6_gso_segment
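
To see why the recorded inner protocol matters, consider this toy dispatch model. It is a Python sketch only: the handler table and the `inner_segment_handler` and `gso_chain` functions are hypothetical, and the handler names are simply those quoted in the call sequences above. The protocol numbers are the IANA values for UDP (17) and IPv6 encapsulation (41).

```python
IPPROTO_UDP = 17
IPPROTO_IPV6 = 41

# Hypothetical lookup table: inner protocol -> segmentation handler name.
INNER_HANDLERS = {
    IPPROTO_UDP: "udp6_ufo_fragment",
    IPPROTO_IPV6: "ip6ip6_gso_segment",
}


def inner_segment_handler(inner_ipproto: int) -> str:
    """Pick the handler that would segment the inner packet."""
    return INNER_HANDLERS[inner_ipproto]


def gso_chain(inner_ipproto: int) -> list:
    """Build the call sequence for an IPv6 packet encapsulated in fou6."""
    return [
        "ipv6_gso_segment",
        "udp6_ufo_fragment",
        "skb_udp_tunnel_segment",
        inner_segment_handler(inner_ipproto),
    ]
```

With the inner protocol recorded as IPPROTO_IPV6 the last step of the chain is ip6ip6_gso_segment; recording IPPROTO_UDP instead sends the inner packet back to udp6_ufo_fragment, which is the incorrect sequence described above.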
      
      Fixes: 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
      Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
      Link: https://lore.kernel.org/r/20201029171012.20904-1-ovov@yandex-team.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: nf_tables: missing validation from the abort path · c0391b6a
      Pablo Neira Ayuso authored
      If userspace does not include the trailing end of batch message, then
      nfnetlink aborts the transaction. This makes it possible to check that
      ruleset updates trigger no errors.
      
      After this patch, invoking this command from the prerouting chain:
      
       # nft -c add rule x y fib saddr . oif type local
      
      fails since oif is not supported there.
      
      This patch fixes the lack of rule validation from the abort/check path
      to catch configuration errors such as the one above.
      
      Fixes: a654de8f ("netfilter: nf_tables: fix chain dependency validation")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: use actual socket sk rather than skb sk when routing harder · 46d6c5ae
      Jason A. Donenfeld authored
      If netfilter changes the packet mark when mangling, the packet is
      rerouted using the route_me_harder set of functions. Prior to this
      commit, there's one big difference between route_me_harder and the
      ordinary initial routing functions, described in the comment above
      __ip_queue_xmit():
      
         /* Note: skb->sk can be different from sk, in case of tunnels */
         int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
      
      That function goes on to correctly make use of sk->sk_bound_dev_if,
      rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
      tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
      It will make some transformations to that packet, and then it will send
      the encapsulated packet out of a *new* socket. That new socket will
      basically always have a different sk_bound_dev_if (otherwise there'd be
      a routing loop). So for the purposes of routing the encapsulated packet,
      the routing information as it pertains to the socket should come from
      that socket's sk, rather than the packet's original skb->sk. For that
      reason __ip_queue_xmit() and related functions all do the right thing.
      
      One might argue that all tunnels should just call skb_orphan(skb) before
      transmitting the encapsulated packet into the new socket. But tunnels do
      *not* do this -- and this is wisely avoided in skb_scrub_packet() too --
      because features like TSQ rely on skb->destructor() being called when
      that buffer space is truly available again. Calling skb_orphan(skb) too
      early would result in buffers filling up unnecessarily and accounting
      info being all wrong. Instead, additional routing must take into account
      the new sk, just as __ip_queue_xmit() notes.
      
      So, this commit addresses the problem by fishing the correct sk out of
      state->sk -- it's already set properly in the call to nf_hook() in
      __ip_local_out(), which receives the sk as part of its normal
      functionality. So we make sure to plumb state->sk through the various
      route_me_harder functions, and then make correct use of it following the
      example of __ip_queue_xmit().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • wireguard: selftests: check that route_me_harder packets use the right sk · af8afcf1
      Jason A. Donenfeld authored
      
      
      If netfilter changes the packet mark, the packet is rerouted. The
      ip_route_me_harder family of functions fails to use the right sk, opting
      to instead use skb->sk, resulting in a routing loop when used with
      tunnels. With the next change fixing this issue in netfilter, test for
      the relevant condition inside our test suite, since wireguard was where
      the bug was discovered.
      
      Reported-by: Chen Minqiang <ptpt52@gmail.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nftables: fix netlink report logic in flowtable and genid · dceababa
      Pablo Neira Ayuso authored
      The netlink report should be sent regardless of the available listeners.
      
      Fixes: 84d7fce6 ("netfilter: nf_tables: export rule-set generation ID")
      Fixes: 3b49e2e9 ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      dceababa
    • mac80211: don't require VHT elements for HE on 2.4 GHz · c2f46814
      Johannes Berg authored
      After the previous similar bugfix there was another bug here:
      if no VHT elements were found, we also disabled HE. Fix this to
      disable HE only on the 5 GHz band; on 6 GHz it was already not
      disabled, and on 2.4 GHz there need (should) not be any VHT.
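
      Illustrative sketch (not part of the commit): the per-band rule
      described above, reduced to a standalone predicate. The enum and
      function names are hypothetical and are not mac80211 identifiers.

```c
#include <stdbool.h>

/* Hypothetical band identifiers for this sketch (not mac80211's enum). */
enum sketch_band { SKETCH_BAND_2GHZ, SKETCH_BAND_5GHZ, SKETCH_BAND_6GHZ };

/*
 * Return true if HE should be disabled because no VHT elements were found.
 * Per the commit text, only the 5 GHz band ties HE to VHT elements:
 * 2.4 GHz needs no VHT, and 6 GHz was already exempt.
 */
bool he_disabled_for_missing_vht(enum sketch_band band, bool have_vht_elems)
{
	return band == SKETCH_BAND_5GHZ && !have_vht_elems;
}
```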
      
      Fixes: 57fa5e85 ("mac80211: determine chandef from HE 6 GHz operation")
      Link: https://lore.kernel.org/r/20201013140156.535a2fc6192f.Id6e5e525a60ac18d245d86f4015f1b271fce6ee6@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      c2f46814
    • cfg80211: regulatory: Fix inconsistent format argument · db18d20d
      Ye Bin authored
      Fix the following warning:
      [net/wireless/reg.c:3619]: (warning) %d in format string (no. 2)
      requires 'int' but the argument type is 'unsigned int'.
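
      Illustrative sketch (not part of the commit): the usual remedy for
      this class of warning is to use %u for an 'unsigned int' argument.
      The function below is a hypothetical userspace example, not the
      code in net/wireless/reg.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format an unsigned counter. Using %u matches the 'unsigned int'
 * argument; %d would expect 'int' and trigger the warning quoted above.
 */
int format_count(char *buf, size_t len, unsigned int count)
{
	return snprintf(buf, len, "count=%u", count);
}
```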
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20201009070215.63695-1-yebin10@huawei.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      db18d20d
    • mac80211: fix kernel-doc markups · b1e8eb11
      Mauro Carvalho Chehab authored
      Some identifiers have different names between their prototypes
      and the kernel-doc markup.
      
      Others need to be fixed, as kernel-doc markups should use this format:
              identifier - description
      
      In the specific case of __sta_info_flush(), add documentation
      for sta_info_flush(), as this one is the one used outside
      sta_info.c.
      
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Link: https://lore.kernel.org/r/978d35eef2dc76e21c81931804e4eaefbd6d635e.1603469755.git.mchehab+huawei@kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      b1e8eb11
    • mac80211: always wind down STA state · dcd479e1
      Johannes Berg authored
      When (for example) an IBSS station is pre-moved to AUTHORIZED
      before it's inserted, and then the insertion fails, we don't
      clean up the fast RX/TX states that might already have been
      created, since we don't go through all the state transitions
      again on the way down.
      
      Do that, if it hasn't been done already, when the station is
      freed. I considered only freeing the fast TX/RX state there,
      but we might add more state so it's more robust to wind down
      the state properly.
      
      Note that we warn if the station was ever inserted; it should
      have been properly cleaned up in that case, and the driver
      will probably not like things happening out of order.
      
      Reported-by: <syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      dcd479e1
    • cfg80211: initialize wdev data earlier · 9bdaf3b9
      Johannes Berg authored
      There's a race condition in the netdev registration in that
      NETDEV_REGISTER actually happens after the netdev is available,
      and so if we initialize things only there, we might get called
      with an uninitialized wdev through nl80211 - not using a wdev
      but using a netdev interface index.
      
      I found this while looking into a syzbot report, but it doesn't
      really seem to be related, and unfortunately there's no repro
      for it (yet). I can't (yet) explain how it managed to get into
      cfg80211_release_pmsr() from nl80211_netlink_notify() without
      the wdev having been initialized, as the latter only iterates
      the wdevs that are linked into the rdev, which even without the
      change here happened after init.
      
      However, looking at this, it seems fairly clear that the init
      needs to be done earlier, otherwise we might even re-init on a
      netns move, when data might still be pending.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20201009135821.fdcbba3aad65.Ie9201d91dbcb7da32318812effdc1561aeaf4cdc@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      9bdaf3b9
    • mac80211: fix use of skb payload instead of header · 14f46c1e
      Johannes Berg authored
      When ieee80211_skb_resize() is called from ieee80211_build_hdr()
      the skb has no 802.11 header yet; in fact, it consists only of the
      payload as the ethernet frame is removed. As such, we're using
      the payload data for ieee80211_is_mgmt(), which is of course
      completely wrong. This didn't really hurt us because these are
      always data frames, so we could only have added more tailroom
      than we needed if we determined it was a management frame and
      sdata->crypto_tx_tailroom_needed_cnt was false.
      
      However, syzbot found that of course there need not be any payload,
      so we're using at best uninitialized memory for the check.
      
      Fix this to pass explicitly the kind of frame that we have instead
      of checking there, by replacing the "bool may_encrypt" argument
      with an argument that can carry the three possible states - it's
      not going to be encrypted, it's a management frame, or it's a data
      frame (and then we check sdata->crypto_tx_tailroom_needed_cnt).
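
      Illustrative sketch (not part of the commit): a three-state argument
      replacing "bool may_encrypt", as described above. All names are
      hypothetical; the patch's actual identifiers may differ. Reserving
      tailroom for management frames regardless of the counter is an
      assumption drawn from the description of the old behaviour.

```c
#include <stdbool.h>

/* Hypothetical three-state replacement for "bool may_encrypt". */
enum sketch_frame_kind {
	SKETCH_FRAME_UNENCRYPTED, /* frame is not going to be encrypted */
	SKETCH_FRAME_MGMT,        /* management frame */
	SKETCH_FRAME_DATA,        /* data frame */
};

/*
 * Decide whether encryption tailroom must be reserved. The caller states
 * the frame kind explicitly, so the payload is never inspected.
 * tailroom_needed_cnt stands in for sdata->crypto_tx_tailroom_needed_cnt.
 */
bool needs_crypto_tailroom(enum sketch_frame_kind kind, int tailroom_needed_cnt)
{
	switch (kind) {
	case SKETCH_FRAME_UNENCRYPTED:
		return false;
	case SKETCH_FRAME_MGMT:
		return true; /* assumed: independent of the counter */
	case SKETCH_FRAME_DATA:
		return tailroom_needed_cnt != 0;
	}
	return false;
}
```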
      
      Reported-by: <syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      14f46c1e
    • mac80211: fix regression where EAPOL frames were sent in plaintext · 804fc6a2
      Mathy Vanhoef authored
      When sending EAPOL frames via NL80211 they are treated as injected
      frames in mac80211. Due to commit 1df2bdba ("mac80211: never drop
      injected frames even if normally not allowed") these injected frames
      were not assigned a sta context in the function ieee80211_tx_dequeue,
      causing certain wireless network cards to always send EAPOL frames in
      plaintext. This may cause compatibility issues with some clients or
      APs, which for instance can cause the group key handshake to fail and
      in turn would cause the station to get disconnected.
      
      This commit fixes this regression by assigning a sta context in
      ieee80211_tx_dequeue to injected frames as well.
      
      Note that sending EAPOL frames in plaintext is not a security issue
      since they contain their own encryption and authentication protection.
      
      Cc: stable@vger.kernel.org
      Fixes: 1df2bdba ("mac80211: never drop injected frames even if normally not allowed")
      Reported-by: Thomas Deutschmann <whissi@gentoo.org>
      Tested-by: Christian Hesse <list@eworm.de>
      Tested-by: Thomas Deutschmann <whissi@gentoo.org>
      Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
      Link: https://lore.kernel.org/r/20201019160113.350912-1-Mathy.Vanhoef@kuleuven.be
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      804fc6a2
    • Merge tag 'fallthrough-fixes-clang-5.10-rc2' of... · 07e08873
      Linus Torvalds authored
      Merge tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull fallthrough fix from Gustavo A. R. Silva:
       "This fixes a ton of fall-through warnings when building with Clang
        12.0.0 and -Wimplicit-fallthrough"
      
      * tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
        include: jhash/signal: Fix fall-through warnings for Clang
      07e08873
    • Merge tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 934291ff
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release regressions:
      
         - r8169: fix forced threading conflicting with other shared
           interrupts; we tried to fix the use of raise_softirq_irqoff from an
           IRQ handler on RT by forcing hard irqs, but this driver shares
           legacy PCI IRQs so drop the _irqoff() instead
      
         - tipc: fix memory leak caused by a recent syzbot report fix to
           tipc_buf_append()
      
        Current release - bugs in new features:
      
         - devlink: Unlock on error in dumpit() and fix some error codes
      
         - net/smc: fix null pointer dereference in smc_listen_decline()
      
        Previous release - regressions:
      
         - tcp: Prevent low rmem stalls with SO_RCVLOWAT.
      
         - net: protect tcf_block_unbind with block lock
      
         - ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
           to only send legal frames to the hypervisor was too strict
      
         - net: hns3: Clear the CMDQ registers before unmapping BAR region;
           incorrect cleanup order was leading to a crash
      
         - bnxt_en - handful of fixes to fixes:
            - Send HWRM_FUNC_RESET fw command unconditionally, even if there
              are PCIe errors being reported
            - Check abort error state in bnxt_open_nic().
            - Invoke cancel_delayed_work_sync() for PFs also.
            - Fix regression in workqueue cleanup logic in bnxt_remove_one().
      
         - mlxsw: Only advertise link modes supported by both driver and
           device, after removal of 56G support from the driver 56G was not
           cleared from advertised modes
      
         - net/smc: fix suppressed return code
      
        Previous release - always broken:
      
         - netem: fix zero division in tabledist, caused by integer overflow
      
         - bnxt_en: Re-write PCI BARs after PCI fatal error.
      
         - cxgb4: set up filter action after rewrites
      
         - net: ipa: command payloads already mapped
      
        Misc:
      
         - s390/ism: fix incorrect system EID, it's okay to change since it
           was added in current release
      
         - vsock: use ns_capable_noaudit() on socket create to suppress false
           positive audit messages"
      
      * tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
        r8169: fix issue with forced threading in combination with shared interrupts
        netem: fix zero division in tabledist
        ibmvnic: fix ibmvnic_set_mac
        mptcp: add missing memory scheduling in the rx path
        tipc: fix memory leak caused by tipc_buf_append()
        gtp: fix an use-before-init in gtp_newlink()
        net: protect tcf_block_unbind with block lock
        ibmveth: Fix use of ibmveth in a bridge.
        net/sched: act_mpls: Add softdep on mpls_gso.ko
        ravb: Fix bit fields checking in ravb_hwtstamp_get()
        devlink: Unlock on error in dumpit()
        devlink: Fix some error codes
        chelsio/chtls: fix memory leaks in CPL handlers
        chelsio/chtls: fix deadlock issue
        net: hns3: Clear the CMDQ registers before unmapping BAR region
        bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
        bnxt_en: Check abort error state in bnxt_open_nic().
        bnxt_en: Re-write PCI BARs after PCI fatal error.
        bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
        bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
        ...
      934291ff
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · b9c0f4bd
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "The good news is people are testing rc1 in the RDMA world - the bad
        news is testing of the for-next area is not as good as I had hoped, as
        we really should have caught at least the rdma_connect_locked() issue
        before now.
      
        Notable merge window regressions that didn't get caught/fixed in time
        for rc1:
      
         - Fix in kernel users of rxe, they were broken by the rapid fix to
           undo the uABI breakage in rxe from another patch
      
         - EFA userspace needs to read the GID table but was broken with the
           new GID table logic
      
         - Fix user triggerable deadlock in mlx5 using devlink reload
      
         - Fix deadlock in several ULPs using rdma_connect from the CM handler
           callbacks
      
         - Memory leak in qedr"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/qedr: Fix memory leak in iWARP CM
        RDMA: Add rdma_connect_locked()
        RDMA/uverbs: Fix false error in query gid IOCTL
        RDMA/mlx5: Fix devlink deadlock on net namespace deletion
        RDMA/rxe: Fix small problem in network_type patch
      b9c0f4bd