  1. Nov 09, 2023
    • net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP · f1a3b283
      Eric Dumazet authored
      syzbot was able to trigger the following report by providing
      a too small TCA_FQ_WEIGHTS attribute [1].
      
      The fix is to use NLA_POLICY_EXACT_LEN() to ensure user space
      provided correct sizes.
      
      Apply the same fix to TCA_FQ_PRIOMAP.
      
      [1]
      BUG: KMSAN: uninit-value in fq_load_weights net/sched/sch_fq.c:960 [inline]
      BUG: KMSAN: uninit-value in fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
      fq_load_weights net/sched/sch_fq.c:960 [inline]
      fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
      fq_init+0x68e/0x780 net/sched/sch_fq.c:1159
      qdisc_create+0x12f3/0x1be0 net/sched/sch_api.c:1326
      tc_modify_qdisc+0x11ef/0x2c20
      rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6558
      netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2545
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6576
      netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
      netlink_unicast+0xf47/0x1250 net/netlink/af_netlink.c:1368
      netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
      __sys_sendmsg net/socket.c:2671 [inline]
      __do_sys_sendmsg net/socket.c:2680 [inline]
      __se_sys_sendmsg net/socket.c:2678 [inline]
      __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
      slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
      slab_alloc_node mm/slub.c:3478 [inline]
      kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
      __alloc_skb+0x318/0x740 net/core/skbuff.c:651
      alloc_skb include/linux/skbuff.h:1286 [inline]
      netlink_alloc_large_skb net/netlink/af_netlink.c:1214 [inline]
      netlink_sendmsg+0xb34/0x13d0 net/netlink/af_netlink.c:1885
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
      __sys_sendmsg net/socket.c:2671 [inline]
      __do_sys_sendmsg net/socket.c:2680 [inline]
      __se_sys_sendmsg net/socket.c:2678 [inline]
      __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      CPU: 1 PID: 5001 Comm: syz-executor300 Not tainted 6.6.0-syzkaller-12401-g8f6f76a6a29f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
      
      Fixes: 29f834aa ("net_sched: sch_fq: add 3 bands and WRR scheduling")
      Fixes: 49e7265f ("net_sched: sch_fq: add TCA_FQ_WEIGHTS attribute")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20231107160440.1992526-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f1a3b283
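The exact-length idea behind this fix can be illustrated with a small userspace sketch. It is not the kernel code: the commit says the real fix uses NLA_POLICY_EXACT_LEN() in the attribute policy, whereas `struct demo_attr`, `DEMO_BANDS`, and `demo_load_weights()` below are hypothetical stand-ins chosen only to show why a payload shorter than expected must be rejected before it is read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BANDS 3 /* assumption: one 32-bit weight per band */

/* Hypothetical stand-in for a received attribute: length plus payload. */
struct demo_attr {
	size_t payload_len;  /* number of valid payload bytes */
	const void *payload; /* caller-provided data */
};

/*
 * Copy the weights only when the payload is exactly the expected size.
 * A shorter payload would otherwise make the reader consume bytes the
 * sender never initialized; a longer one is rejected as malformed.
 */
static bool demo_load_weights(const struct demo_attr *attr,
			      int32_t weights[DEMO_BANDS])
{
	if (attr->payload_len != DEMO_BANDS * sizeof(int32_t))
		return false;

	memcpy(weights, attr->payload, DEMO_BANDS * sizeof(int32_t));
	return true;
}
```

A caller would pass the received attribute and act on the weights only when the function returns true; a 4-byte payload, for instance, is refused rather than partially read.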
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 09699f19
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-11-06 (i40e)
      
      This series contains updates to i40e driver only.
      
      Ivan Vecera resolves a couple issues with devlink; removing a call to
      devlink_port_type_clear() and ensuring devlink port is unregistered
      after the net device.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        i40e: Fix devlink port unregistering
        i40e: Do not call devlink_port_type_clear()
      ====================
      
      Link: https://lore.kernel.org/r/20231107003600.653796-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      09699f19
    • net: kcm: fill in MODULE_DESCRIPTION() · 31356547
      Jakub Kicinski authored
      W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
      
      Link: https://lore.kernel.org/r/20231108020305.537293-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      31356547
    • Merge tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 0613736e
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Add missing netfilter modules description to fix W=1, from Florian Westphal.
      
      2) Fix catch-all element GC with timeout when used with the pipapo set
         backend; this remained broken after I tried to fix it this summer
         and after another recent attempt to fix it.
      
      3) Add missing IPVS modules descriptions to fix W=1, also from Florian.
      
      4) xt_recent allocated a too small buffer to store an IPv4-mapped IPv6
         address which can be parsed by in6_pton(), from Maciej Zenczykowski.
         Broken for many releases.
      
      5) Skip IPv4-mapped IPv6, IPv4-compat IPv6, site/link local scoped IPv6
         addresses to set up IPv6 NAT redirect, also from Florian. This has
         been broken since 2012.
      
      * tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
        netfilter: xt_recent: fix (increase) ipv6 literal buffer length
        ipvs: add missing module descriptions
        netfilter: nf_tables: remove catchall element in GC sync path
        netfilter: add missing module descriptions
      ====================
      
      Link: https://lore.kernel.org/r/20231108155802.84617-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0613736e
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 942b8b38
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-11-08
      
      We've added 16 non-merge commits during the last 6 day(s) which contain
      a total of 30 files changed, 341 insertions(+), 130 deletions(-).
      
      The main changes are:
      
      1) Fix a BPF verifier issue in precision tracking for BPF_ALU | BPF_TO_BE |
         BPF_END where the source register was incorrectly marked as precise,
         from Shung-Hsi Yu.
      
      2) Fix a concurrency issue in bpf_timer where the former could still have
         been alive after an application releases or unpins the map, from Hou Tao.
      
      3) Fix a BPF verifier issue where immediates are incorrectly cast to u32
         before being spilled and therefore losing sign information, from Hao Sun.
      
      4) Fix a misplaced BPF_TRACE_ITER in check_css_task_iter_allowlist which
         incorrectly compared bpf_prog_type with bpf_attach_type, from Chuyi Zhou.
      
      5) Add __bpf_hook_{start,end} as well as __bpf_kfunc_{start,end}_defs macros,
         migrate all BPF-related __diag callsites over to it, and add a new
         __diag_ignore_all for -Wmissing-declarations to the macros to address
         recent build warnings, from Dave Marchevsky.
      
      6) Fix broken BPF selftest build of xdp_hw_metadata test on architectures
         where char is not signed, from Björn Töpel.
      
      7) Fix test_maps selftest to properly use LIBBPF_OPTS() macro to initialize
         the bpf_map_create_opts, from Andrii Nakryiko.
      
      8) Fix bpffs selftest to avoid unmounting /sys/kernel/debug as it may have
         been mounted and used by other applications already, from Manu Bretelle.
      
      9) Fix a build issue without CONFIG_CGROUPS wrt css_task open-coded
         iterators, from Matthieu Baerts.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
        bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
        selftests/bpf: Fix broken build where char is unsigned
        selftests/bpf: precision tracking test for BPF_NEG and BPF_END
        bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
        selftests/bpf: Add test for using css_task iter in sleepable progs
        selftests/bpf: Add tests for css_task iter combining with cgroup iter
        bpf: Relax allowlist for css_task iter
        selftests/bpf: fix test_maps' use of bpf_map_create_opts
        bpf: Check map->usercnt after timer->timer is assigned
        bpf: Add __bpf_hook_{start,end} macros
        bpf: Add __bpf_kfunc_{start,end}_defs macros
        selftests/bpf: fix test_bpffs
        selftests/bpf: Add test for immediate spilled to stack
        bpf: Fix check_stack_write_fixed_off() to correctly spill imm
        bpf: fix compilation error without CGROUPS
      ====================
      
      Link: https://lore.kernel.org/r/20231108132448.1970-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      942b8b38
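Item 3 of the pull request above mentions immediates losing sign information when cast to u32 before being spilled. The effect can be shown with plain C integer conversions. This is only an illustration of the arithmetic, not the verifier's code; both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Problematic pattern: going through an unsigned 32-bit type first
 * zero-extends the value, so a negative immediate turns into a large
 * positive 64-bit number.
 */
static int64_t demo_spill_via_u32(int32_t imm)
{
	return (int64_t)(uint32_t)imm;
}

/* Sign-preserving pattern: widen the signed immediate directly. */
static int64_t demo_spill_sign_extended(int32_t imm)
{
	return (int64_t)imm;
}
```

For an immediate of -1 the first function yields 4294967295 while the second yields -1; for non-negative immediates both agree, which is why the difference is easy to miss.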
    • net/sched: act_ct: Always fill offloading tuple iifidx · 9bc64bd0
      Vlad Buslov authored
      Referenced commit doesn't always set iifidx when offloading the flow to
      hardware. Fix the following cases:
      
      - nf_conn_act_ct_ext_fill() is called before the extension is created with
      nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
      unspecified iifidx when a connection is offloaded after only a single
      original-direction packet has been processed by the tc data path. Always fill
      the new nf_conn_act_ct_ext instance after creating it in
      nf_conn_act_ct_ext_add().
      
      - Offloading of unidirectional UDP NEW connections is now supported, but the ct
      flow iifidx field is not updated when a connection is promoted to
      bidirectional, which can result in the reply-direction iifidx being zero when
      refreshing the connection. Fill in the extension and update flow iifidx
      before calling flow_offload_refresh().
      
      Fixes: 9795ded7 ("net/sched: act_ct: Fill offloading tuple iifidx")
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9bc64bd0
  2. Nov 08, 2023
  3. Nov 07, 2023
  4. Nov 06, 2023
    • Merge branch 'smc-fixes' · c1ed833e
      David S. Miller authored
      
      
      D. Wythe says:
      
      ====================
      bugfixes for smc
      
      This patch set includes the following bug fixes:
      
      1. hung state
      2. sock leak
      3. potential panic
      
      We have been testing these patches for some time, but
      if you have any questions, please let us know.
      
      --
      v1:
      Fix spelling errors and incorrect function names in descriptions
      
      v2->v1:
      Add fix tags for bugfix patch
      ====================
      
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1ed833e
    • net/smc: put sk reference if close work was canceled · aa96fbd6
      D. Wythe authored
      Note that we always hold a reference to sock when attempting
      to submit close_work. Therefore, if we have successfully
      canceled close_work from pending, we MUST release that reference
      to avoid potential leaks.
      
      Fixes: 42bfba9e ("net/smc: immediate termination for SMCD link groups")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa96fbd6
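The reference-counting rule stated in this commit (a reference is held while close_work is queued, so a successful cancel must drop it) can be modeled in a few lines. This is a single-threaded toy model, not SMC code: `struct demo_sock` and every `demo_*` function are hypothetical, and the plain integer counter stands in for the kernel's real reference counting.

```c
#include <stdbool.h>

/* Toy socket: a reference count and a flag for "work is queued". */
struct demo_sock {
	int refcnt;
	bool work_pending;
};

static void demo_hold(struct demo_sock *sk) { sk->refcnt++; }
static void demo_put(struct demo_sock *sk)  { sk->refcnt--; }

/* Queuing the work takes a reference that the work handler would drop. */
static void demo_submit_close_work(struct demo_sock *sk)
{
	demo_hold(sk);
	sk->work_pending = true;
}

/* Returns true only if the work was still pending and is now canceled. */
static bool demo_cancel_close_work(struct demo_sock *sk)
{
	bool was_pending = sk->work_pending;

	sk->work_pending = false;
	return was_pending;
}

/*
 * The handler will never run for a canceled work item, so the reference
 * taken at submit time has to be released here instead.
 */
static void demo_cancel_and_release(struct demo_sock *sk)
{
	if (demo_cancel_close_work(sk))
		demo_put(sk);
}
```

Calling `demo_cancel_and_release()` a second time is harmless in this model, because the cancel reports that nothing was pending and no extra put happens.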
    • net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc · c5bf605b
      D. Wythe authored
      This patch re-fixes the issues mentioned by commit 22a825c5
      ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
      
      Blocking message sending does solve the issues, but it also
      prevents the peer from receiving the final message. Besides, logically,
      whether sndbuf_desc is NULL or not has no impact on the processing
      of cdc message sending.
      
      Hence, this patch allows the cdc message to be sent but checks
      sndbuf_desc with care in smc_cdc_tx_handler().
      
      Fixes: 22a825c5 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5bf605b
    • net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT · 5211c972
      D. Wythe authored
      Consider the following scenario:
      
      				smc_cdc_rx_handler
      __smc_release
      				sock_set_flag
      smc_close_active()
      sock_set_flag
      
      __set_bit(DEAD)			__set_bit(DONE)
      
      Because __set_bit is not atomic, the DEAD or DONE flag might be lost.
      If the DEAD flag is lost, the state SMC_CLOSED will never be reached
      in smc_close_passive_work:
      
      if (sock_flag(sk, SOCK_DEAD) &&
      	smc_close_sent_any_close(conn)) {
      	sk->sk_state = SMC_CLOSED;
      } else {
      	/* just shutdown, but not yet closed locally */
      	sk->sk_state = SMC_APPFINCLOSEWAIT;
      }
      
      Replacing sock_set_flag or __set_bit with set_bit fixes this problem,
      since set_bit is atomic.
      
      Fixes: b38d7324 ("smc: socket closing and linkgroup cleanup")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5211c972
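The lost-flag scenario in this commit can be replayed deterministically in userspace. The sketch below is an assumption-laden illustration, not the SMC code: the bit positions `DEMO_DEAD` and `DEMO_DONE` are made up, the racy interleaving is written out by hand rather than produced by real threads, and C11 `atomic_fetch_or()` stands in for the kernel's atomic set_bit().

```c
#include <stdatomic.h>

#define DEMO_DEAD (1UL << 0) /* hypothetical bit, not the kernel value */
#define DEMO_DONE (1UL << 1) /* hypothetical bit, not the kernel value */

/*
 * Hand-written replay of two non-atomic read-modify-write updates:
 * both sides load the flags word before either one stores, so the
 * second store discards the bit set by the first.
 */
static unsigned long demo_racy_interleaving(unsigned long flags)
{
	unsigned long seen_by_a = flags; /* CPU A loads */
	unsigned long seen_by_b = flags; /* CPU B loads */

	flags = seen_by_a | DEMO_DEAD;   /* A stores DEAD */
	flags = seen_by_b | DEMO_DONE;   /* B stores DONE, DEAD is lost */
	return flags;
}

/* Each atomic OR is indivisible, so neither bit can be overwritten. */
static unsigned long demo_atomic_updates(unsigned long initial)
{
	atomic_ulong flags;

	atomic_init(&flags, initial);
	atomic_fetch_or(&flags, DEMO_DEAD);
	atomic_fetch_or(&flags, DEMO_DONE);
	return atomic_load(&flags);
}
```

Starting from zero, the replayed interleaving ends with only `DEMO_DONE` set, while the atomic version ends with both bits set regardless of the order in which the two updates run.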
    • nfsd: regenerate user space parsers after ynl-gen changes · d93f9528
      Jakub Kicinski authored
      Commit 8cea95b0 ("tools: ynl-gen: handle do ops with no input attrs")
      added support for some of the previously-skipped ops in nfsd.
      Regenerate the user space parsers to fill them in.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d93f9528
    • tcp: Fix SYN option room calculation for TCP-AO. · 0a8e987d
      Kuniyuki Iwashima authored
      When building SYN packet in tcp_syn_options(), MSS, TS, WS, and
      SACKPERM are used without checking the remaining bytes in the
      options area.
      
      To keep that logic as is, we limit the TCP-AO MAC length in
      tcp_ao_parse_crypto().  Currently, the limit is calculated as below.
      
        MAX_TCP_OPTION_SPACE - TCPOLEN_TSTAMP_ALIGNED
                             - TCPOLEN_WSCALE_ALIGNED
                             - TCPOLEN_SACKPERM_ALIGNED
      
      This looks confusing as (1) we pack SACKPERM into the leading
      2-bytes of the aligned 12-bytes of TS and (2) TCPOLEN_MSS_ALIGNED
      is not used.  Fortunately, the calculated limit is not wrong as
      TCPOLEN_SACKPERM_ALIGNED and TCPOLEN_MSS_ALIGNED are the same value.
      
      However, we should use the proper constant in the formula.
      
        MAX_TCP_OPTION_SPACE - TCPOLEN_MSS_ALIGNED
                             - TCPOLEN_TSTAMP_ALIGNED
                             - TCPOLEN_WSCALE_ALIGNED
      
      Fixes: 4954f17d ("net/tcp: Introduce TCP_AO setsockopt()s")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a8e987d
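The two formulas in this commit can be compared numerically. The constants below are assumed to match include/net/tcp.h (40 bytes of option space, 4 for MSS, 12 for timestamps, 4 for window scale, 4 for SACK-permitted) and should be checked against the tree in use; the `DEMO_` prefix and both function names are hypothetical.

```c
/* Assumed values; verify against include/net/tcp.h before relying on them. */
enum {
	DEMO_MAX_TCP_OPTION_SPACE     = 40,
	DEMO_TCPOLEN_MSS_ALIGNED      = 4,
	DEMO_TCPOLEN_TSTAMP_ALIGNED   = 12,
	DEMO_TCPOLEN_WSCALE_ALIGNED   = 4,
	DEMO_TCPOLEN_SACKPERM_ALIGNED = 4,
};

/* Formula before the patch: subtracts SACKPERM and omits MSS. */
static int demo_old_limit(void)
{
	return DEMO_MAX_TCP_OPTION_SPACE - DEMO_TCPOLEN_TSTAMP_ALIGNED
					 - DEMO_TCPOLEN_WSCALE_ALIGNED
					 - DEMO_TCPOLEN_SACKPERM_ALIGNED;
}

/* Formula after the patch: subtracts MSS, TS, and WS. */
static int demo_new_limit(void)
{
	return DEMO_MAX_TCP_OPTION_SPACE - DEMO_TCPOLEN_MSS_ALIGNED
					 - DEMO_TCPOLEN_TSTAMP_ALIGNED
					 - DEMO_TCPOLEN_WSCALE_ALIGNED;
}
```

Under these assumed values both functions return 20, which matches the commit's remark that the old limit was not wrong, only expressed with the wrong constant.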
    • octeontx2-pf: Free pending and dropped SQEs · 3423ca23
      Geetha sowjanya authored
      On interface down, the pending SQEs in the NIX get dropped
      or drained out during SMQ flush. But the skbs pointed to by these
      SQEs never get freed or updated to the stack, as the respective CQEs
      never get added.
      This patch fixes the issue by freeing all valid skbs in the SQ SG list.
      
      Fixes: b1bc8457 ("octeontx2-pf: Cleanup all receive buffers in SG descriptor")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3423ca23
    • net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config · 40cb2fdf
      Jamal Hadi Salim authored
      Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
      reproducer [2]. Problem seems to be that classifiers clear 'struct
      tcf_result::drop_reason', thereby triggering the warning in
      __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).
      
      Fixed by disambiguating a legit error from a verdict with a bogus drop_reason.
      
      [1]
      WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
      Modules linked in:
      CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      RIP: 0010:kfree_skb_reason+0x38/0x130
      [...]
      Call Trace:
       <IRQ>
       __netif_receive_skb_core.constprop.0+0x837/0xdb0
       __netif_receive_skb_one_core+0x3c/0x70
       process_backlog+0x95/0x130
       __napi_poll+0x25/0x1b0
       net_rx_action+0x29b/0x310
       __do_softirq+0xc0/0x29b
       do_softirq+0x43/0x60
       </IRQ>
      
      [2]
      
      ip link add name veth0 type veth peer name veth1
      ip link set dev veth0 up
      ip link set dev veth1 up
      tc qdisc add dev veth1 clsact
      tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
      mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
      
      Ido reported:
      
        [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
        reproducer [2]. Problem seems to be that classifiers clear 'struct
        tcf_result::drop_reason', thereby triggering the warning in
        __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
      
        [1]
        WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
        Modules linked in:
        CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
        RIP: 0010:kfree_skb_reason+0x38/0x130
        [...]
        Call Trace:
         <IRQ>
         __netif_receive_skb_core.constprop.0+0x837/0xdb0
         __netif_receive_skb_one_core+0x3c/0x70
         process_backlog+0x95/0x130
         __napi_poll+0x25/0x1b0
         net_rx_action+0x29b/0x310
         __do_softirq+0xc0/0x29b
         do_softirq+0x43/0x60
         </IRQ>
      
        [2]
        #!/bin/bash
      
        ip link add name veth0 type veth peer name veth1
        ip link set dev veth0 up
        ip link set dev veth1 up
        tc qdisc add dev veth1 clsact
        tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
        mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
      
      What happens is that inside most classifiers the tcf_result is copied over
      from a filter template e.g. *res = f->res which then implicitly overrides
      the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
      set via sch_handle_{ingress,egress}() for kfree_skb_reason().
      
      Commit text above copied verbatim from Daniel. The general idea of the patch
      is not very different from what Ido originally posted, but it is instead done
      in the cls_api code path.
      
      Fixes: 54a59aed ("net, sched: Make tc-related drop reason more flexible")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40cb2fdf
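The overwrite described in this commit (`*res = f->res` replacing a previously set default drop reason with zero) can be reproduced with a plain struct copy. The sketch below uses hypothetical types and names throughout and does not reproduce the actual patch, which the commit says works in the cls_api code path; the save-and-restore variant is just one simple way to show that the default can survive the copy.

```c
/* Hypothetical reasons; zero plays the role of "not dropped yet". */
enum demo_drop_reason {
	DEMO_NOT_DROPPED_YET = 0,
	DEMO_TC_INGRESS = 1,
};

struct demo_result {
	unsigned long class_id;
	enum demo_drop_reason drop_reason;
};

/* A filter carries a result template, typically zero-initialized. */
struct demo_filter {
	struct demo_result res;
};

/* Whole-struct copy: the caller's default drop reason is overwritten. */
static void demo_classify_overwrites(const struct demo_filter *f,
				     struct demo_result *res)
{
	*res = f->res;
}

/* Illustrative alternative: keep the default if the template has none. */
static void demo_classify_preserves(const struct demo_filter *f,
				    struct demo_result *res)
{
	enum demo_drop_reason saved = res->drop_reason;

	*res = f->res;
	if (res->drop_reason == DEMO_NOT_DROPPED_YET)
		res->drop_reason = saved;
}
```

With a template whose drop reason is zero, the first function leaves the caller holding `DEMO_NOT_DROPPED_YET`, which is the condition that triggered the warning, while the second leaves the caller's default in place.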
  5. Nov 03, 2023