  1. Jan 02, 2022
  2. Jan 01, 2022
    • net ticp:fix a kernel-infoleak in __tipc_sendmsg() · d6d86830
      Haimin Zhang authored
      
      
      struct tipc_socket_addr.ref has a 4-byte hole, and __tipc_getname() currently
      copies it to user space, causing a kernel-infoleak.
      
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33
      BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33
       instrument_copy_to_user include/linux/instrumented.h:121 [inline]
       instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33
       _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33
       copy_to_user include/linux/uaccess.h:209 [inline]
       copy_to_user include/linux/uaccess.h:209 [inline] net/socket.c:287
       move_addr_to_user+0x3f6/0x600 net/socket.c:287 net/socket.c:287
       __sys_getpeername+0x470/0x6b0 net/socket.c:1987 net/socket.c:1987
       __do_sys_getpeername net/socket.c:1997 [inline]
       __se_sys_getpeername net/socket.c:1994 [inline]
       __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994
       __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994
       __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was stored to memory at:
       tipc_getname+0x575/0x5e0 net/tipc/socket.c:757 net/tipc/socket.c:757
       __sys_getpeername+0x3b3/0x6b0 net/socket.c:1984 net/socket.c:1984
       __do_sys_getpeername net/socket.c:1997 [inline]
       __se_sys_getpeername net/socket.c:1994 [inline]
       __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994
       __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994
       __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was stored to memory at:
       msg_set_word net/tipc/msg.h:212 [inline]
       msg_set_destport net/tipc/msg.h:619 [inline]
       msg_set_word net/tipc/msg.h:212 [inline] net/tipc/socket.c:1486
       msg_set_destport net/tipc/msg.h:619 [inline] net/tipc/socket.c:1486
       __tipc_sendmsg+0x44fa/0x5890 net/tipc/socket.c:1486 net/tipc/socket.c:1486
       tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       sock_sendmsg_nosec net/socket.c:704 [inline] net/socket.c:2409
       sock_sendmsg net/socket.c:724 [inline] net/socket.c:2409
       ____sys_sendmsg+0xe11/0x12c0 net/socket.c:2409 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       ___sys_sendmsg net/socket.c:2463 [inline] net/socket.c:2492
       __sys_sendmsg+0x704/0x840 net/socket.c:2492 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __do_sys_sendmsg net/socket.c:2501 [inline] net/socket.c:2499
       __se_sys_sendmsg net/socket.c:2499 [inline] net/socket.c:2499
       __x64_sys_sendmsg+0xe2/0x120 net/socket.c:2499 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Local variable skaddr created at:
       __tipc_sendmsg+0x2d0/0x5890 net/tipc/socket.c:1419 net/tipc/socket.c:1419
       tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402
      
      Bytes 4-7 of 16 are uninitialized
      Memory access of size 16 starts at ffff888113753e00
      Data copied to user address 0000000020000280
      
      Reported-by: <syzbot+cdbd40e0c3ca02cae3b7@syzkaller.appspotmail.com>
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Link: https://lore.kernel.org/r/1640918123-14547-1-git-send-email-tcs.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
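The class of bug described in the commit above (uninitialized padding bytes copied out of a stack structure) can be illustrated with a small user-space sketch. The structure `demo_addr` and the helper `demo_fill_addr()` are hypothetical and are not the TIPC definitions; the layout is chosen only so that a hole appears on common 64-bit ABIs. Zeroing the whole object before filling its fields is one common way to keep such a hole from carrying stale data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, NOT the real TIPC structure. On common 64-bit
 * ABIs the compiler inserts 4 padding bytes after `ref` so that `node`
 * is 8-byte aligned. */
struct demo_addr {
    uint32_t ref;
    uint64_t node;
};

/* Number of padding bytes between `ref` and `node` on this platform. */
size_t demo_hole_size(void)
{
    return offsetof(struct demo_addr, node) - sizeof(uint32_t);
}

/* Zero the whole object first, so any padding is cleared before the
 * named fields are assigned. */
void demo_fill_addr(struct demo_addr *a, uint32_t ref, uint64_t node)
{
    memset(a, 0, sizeof(*a));
    a->ref = ref;
    a->node = node;
}
```

Assigning only the named fields would leave whatever was previously on the stack in the padding, which is the kind of data KMSAN flags when the object is copied to user space.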
    • selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature · 5e75d0b2
      Jianguo Wu authored
      As Paolo pointed out, the result of pinging an IPv6 address depends on
      the running distro. So explicitly check for the available ping feature,
      as e.g. the bareudp.sh self-tests do.
      
      Fixes: 8b3170e0 ("selftests: net: using ping6 for IPv6 in udpgro_fwd.sh")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Link: https://lore.kernel.org/r/825ee22b-4245-dbf7-d2f7-a230770d6e21@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0f1fe7b8
      Jakub Kicinski authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-12-31
      
      We've added 2 non-merge commits during the last 14 day(s) which contain
      a total of 2 files changed, 3 insertions(+), 3 deletions(-).
      
      The main changes are:
      
      1) Revert of an earlier attempt to fix xsk's poll() behavior where it
         turned out that the fix for a rare problem made it much worse in
         general, from Magnus Karlsson. (Fyi, Magnus mentioned that a proper
         fix is coming early next year, so the revert is mainly to avoid
         slipping the behavior into 5.16.)
      
      2) Minor misc spell fix in BPF selftests, from Colin Ian King.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, selftests: Fix spelling mistake "tained" -> "tainted"
        Revert "xsk: Do not sleep in poll() when need_wakeup set"
      ====================
      
      Link: https://lore.kernel.org/r/20211231160050.16105-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. Dec 31, 2021
    • Merge branch 'mpr-len-checks' · 4760abaa
      David S. Miller authored
      
      David Ahern says:
      
      ====================
      net: Length checks for attributes within multipath routes
      
      Add length checks for attributes within a multipath route (attributes
      within RTA_MULTIPATH). Motivated by the syzbot report in patch 1 and
      then expanded to other attributes as noted by Ido.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwtunnel: Validate RTA_ENCAP_TYPE attribute length · 8bda81a4
      David Ahern authored
      lwtunnel_valid_encap_type_attr is used to validate encap attributes
      within a multipath route. Add length validation checking to the type.
      
      lwtunnel_valid_encap_type_attr is called when converting attributes to
      a fib{6,}_config struct, which means it is used before fib_get_nhs,
      ip6_route_multipath_add, and ip6_route_multipath_del, the other
      locations that use rtnh_ok and then nla_get_u16 on the RTA_ENCAP_TYPE
      attribute.
      
      Fixes: 9ed59592 ("lwtunnel: fix autoload of lwt modules")
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route · 1ff15a71
      David Ahern authored
      Make sure RTA_GATEWAY for IPv6 multipath route has enough bytes to hold
      an IPv6 address.
      
      Fixes: 6b9ea5a6 ("ipv6: fix multipath route replace error recovery")
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check attribute length for RTA_GATEWAY in multipath route · 4619bcf9
      David Ahern authored
      Commit referenced in the Fixes tag used nla_memcpy for RTA_GATEWAY as
      does the current nla_get_in6_addr. nla_memcpy protects against accessing
      memory greater than what is in the attribute, but there is no check
      requiring the attribute to have an IPv6 address. Add it.
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
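The distinction drawn in the commit above (a bounded copy avoids overreading, but does not prove the attribute actually holds a full IPv6 address) can be sketched in user space as follows. `struct demo_attr` and `demo_get_gateway6()` are hypothetical stand-ins, not the kernel's netlink types or helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IN6_ADDR_LEN 16  /* bytes in an IPv6 address */

/* Hypothetical stand-in for a netlink attribute: a payload plus the
 * number of valid bytes in it. */
struct demo_attr {
    uint16_t payload_len;
    const uint8_t *payload;
};

/* Returns 0 and fills gw on success, -1 if the attribute is too short
 * to contain an IPv6 address. The explicit length check is the point:
 * without it, a short attribute would yield a partially filled gw. */
int demo_get_gateway6(const struct demo_attr *attr,
                      uint8_t gw[DEMO_IN6_ADDR_LEN])
{
    if (attr->payload_len < DEMO_IN6_ADDR_LEN)
        return -1;
    memcpy(gw, attr->payload, DEMO_IN6_ADDR_LEN);
    return 0;
}
```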
    • ipv4: Check attribute length for RTA_FLOW in multipath route · 664b9c4b
      David Ahern authored
      Make sure RTA_FLOW is at least 4B before using.
      
      Fixes: 4e902c57 ("[IPv4]: FIB configuration using struct fib_config")
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Check attribute length for RTA_GATEWAY in multipath route · 7a3429ba
      David Ahern authored
      syzbot reported uninit-value:
      ============================================================
        BUG: KMSAN: uninit-value in fib_get_nhs+0xac4/0x1f80
        net/ipv4/fib_semantics.c:708
         fib_get_nhs+0xac4/0x1f80 net/ipv4/fib_semantics.c:708
         fib_create_info+0x2411/0x4870 net/ipv4/fib_semantics.c:1453
         fib_table_insert+0x45c/0x3a10 net/ipv4/fib_trie.c:1224
         inet_rtm_newroute+0x289/0x420 net/ipv4/fib_frontend.c:886
      
      Add helper to validate RTA_GATEWAY length before using the attribute.
      
      Fixes: 4e902c57 ("[IPv4]: FIB configuration using struct fib_config")
      Reported-by: <syzbot+d4b9a2851cc3ce998741@syzkaller.appspotmail.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 74c78b42
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from.. Santa?
      
        No regressions on our radar at this point. The igc problem fixed here
        was the last one I was tracking but it was broken in previous
        releases, anyway. Mostly driver fixes and a couple of largish SMC
        fixes.
      
        Current release - regressions:
      
         - xsk: initialise xskb free_list_node, fixup for a -rc7 fix
      
        Current release - new code bugs:
      
         - mlx5: handful of minor fixes:

            - use first online CPU instead of hard coded CPU

            - fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

            - fix skb memory leak when TC classifier action offloads are disabled

            - fix memory leak with rules with internal OvS port
      
        Previous releases - regressions:
      
         - igc: do not enable crosstimestamping for i225-V models
      
        Previous releases - always broken:
      
         - udp: use datalen to cap ipv6 udp max gso segments
      
         - fix use-after-free in tw_timer_handler due to early free of stats
      
         - smc: fix kernel panic caused by race of smc_sock
      
         - smc: don't send CDC/LLC message if link not ready, avoid timeouts
      
         - sctp: use call_rcu to free endpoint, avoid UAF in sock diag
      
         - bridge: mcast: add and enforce query interval minimum
      
         - usb: pegasus: do not drop long Ethernet frames
      
         - mlx5e: fix ICOSQ recovery flow for XSK
      
         - nfc: uapi: use kernel size_t to fix user-space builds"
      
      * tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
        fsl/fman: Fix missing put_device() call in fman_port_probe
        selftests: net: using ping6 for IPv6 in udpgro_fwd.sh
        Documentation: fix outdated interpretation of ip_no_pmtu_disc
        net/ncsi: check for error return from call to nla_put_u32
        net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
        net: fix use-after-free in tw_timer_handler
        selftests: net: Fix a typo in udpgro_fwd.sh
        selftests/net: udpgso_bench_tx: fix dst ip argument
        net: bridge: mcast: add and enforce startup query interval minimum
        net: bridge: mcast: add and enforce query interval minimum
        ipv6: raw: check passed optlen before reading
        xsk: Initialise xskb free_list_node
        net/mlx5e: Fix wrong features assignment in case of error
        net/mlx5e: TC, Fix memory leak with rules with internal port
        ionic: Initialize the 'lif->dbid_inuse' bitmap
        igc: Fix TX timestamp support for non-MSI-X platforms
        igc: Do not enable crosstimestamping for i225-V models
        net/smc: fix kernel panic caused by race of smc_sock
        net/smc: don't send CDC/LLC message if link not ready
        NFC: st21nfca: Fix memory leak in device probe and remove
        ...
    • Merge tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9bad743e
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Here are two misc driver fixes for 5.16-final:
      
         - binder accounting fix to resolve reported problem
      
         - nitro_enclaves fix for mmap assert warning output
      
        Both of these have been for over a week with no reported issues"
      
      * tag 'char-misc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert
        binder: fix async_free_space accounting for empty parcels
    • Merge tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 2d40060b
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB driver fixes for 5.16 to resolve some reported
        problems:
      
         - mtu3 driver fixes
      
         - typec ucsi driver fix
      
         - xhci driver quirk added
      
         - usb gadget f_fs fix for reported crash
      
        All of these have been in linux-next for a while with no reported
        problems"
      
      * tag 'usb-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Only check the contract if there is a connection
        xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
        usb: mtu3: set interval of FS intr and isoc endpoint
        usb: mtu3: fix list_head check warning
        usb: mtu3: add memory barrier before set GPD's HWO
        usb: mtu3: fix interval value for intr and isoc
        usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
  4. Dec 30, 2021
    • fsl/fman: Fix missing put_device() call in fman_port_probe · bf2b09fe
      Miaoqian Lin authored
      The reference taken by 'of_find_device_by_node()' must be released when
      it is no longer needed.
      Add the corresponding 'put_device()' in the error handling paths.
      
      Fixes: 18a6c85f ("fsl/fman: Add FMan Port Support")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: using ping6 for IPv6 in udpgro_fwd.sh · 8b3170e0
      Jianguo Wu authored
      udpgro_fwd.sh outputs the following message:
        ping: 2001:db8:1::100: Address family for hostname not supported

      Use ping6 when pinging IPv6 addresses.
      
      Fixes: a062260a ("selftests: net: add UDP GRO forwarding self-tests")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: fix outdated interpretation of ip_no_pmtu_disc · be1c5b53
      xu xin authored
      The way the pmtu is updated has changed, but the documentation still
      describes the old behavior. This patch updates the interpretation of
      ip_no_pmtu_disc and min_pmtu.

      See commit 28d35bcd ("net: ipv4: don't let PMTU updates increase
      route MTU").
      
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ccc0c9be
      Jakub Kicinski authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-12-28
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: Fix wrong features assignment in case of error
        net/mlx5e: TC, Fix memory leak with rules with internal port
      ====================
      
      Link: https://lore.kernel.org/r/20211229065352.30178-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/ncsi: check for error return from call to nla_put_u32 · 92a34ab1
      蒋家盛 authored
      As the comment on nla_put() notes, it can return
      -EMSGSIZE if the tailroom of the skb is insufficient.
      Therefore, it is better to check the return value of
      nla_put_u32() and return the error code if an error occurs.
      Many other functions have the same problem; if this
      patch is correct, I will send a new version to fix them all.
      
      Fixes: 955dc68c ("net/ncsi: Add generic netlink family")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Link: https://lore.kernel.org/r/20211229032118.1706294-1-jiasheng@iscas.ac.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
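The pattern the commit above asks for (propagating the error from a "put" helper instead of ignoring it) might look like the following user-space sketch. `struct demo_msg`, `demo_put_u32()` and `demo_fill_msg()` are hypothetical; they only imitate a fixed-size message buffer that reports -EMSGSIZE when it runs out of room, and are not the kernel's netlink API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size message buffer with room for two u32 values. */
struct demo_msg {
    uint8_t data[8];
    size_t used;
};

/* Append one u32; fails with -EMSGSIZE when the tailroom is too small. */
int demo_put_u32(struct demo_msg *m, uint32_t v)
{
    if (sizeof(m->data) - m->used < sizeof(v))
        return -EMSGSIZE;
    memcpy(m->data + m->used, &v, sizeof(v));
    m->used += sizeof(v);
    return 0;
}

/* Append n values, stopping at and returning the first error rather
 * than silently building a truncated message. */
int demo_fill_msg(struct demo_msg *m, const uint32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rc = demo_put_u32(m, vals[i]);

        if (rc)
            return rc;
    }
    return 0;
}
```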
    • net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper · 168fed98
      Nikolay Aleksandrov authored
      We need to first check if the context is a vlan one, then we need to
      check the global bridge multicast vlan snooping flag, and finally the
      vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast
      processing (e.g. querier timers).
      
      Fixes: 7b54aaaf ("net: bridge: multicast: add vlan state initialization and control")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fix use-after-free in tw_timer_handler · e22e45fc
      Muchun Song authored
      A real world panic issue was found as follows in Linux 5.4.
      
          BUG: unable to handle page fault for address: ffffde49a863de28
          PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0
          RIP: 0010:tw_timer_handler+0x20/0x40
          Call Trace:
           <IRQ>
           call_timer_fn+0x2b/0x120
           run_timer_softirq+0x1ef/0x450
           __do_softirq+0x10d/0x2b8
           irq_exit+0xc7/0xd0
           smp_apic_timer_interrupt+0x68/0x120
           apic_timer_interrupt+0xf/0x20
      
      This issue has also been reported since 2017 in the thread [1].
      Unfortunately, the issue can still be reproduced after fixing
      DCCP.

      The ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net
      namespace is destroyed since tcp_sk_ops is registered before
      ipv4_mib_ops, which means tcp_sk_ops is in front of ipv4_mib_ops
      in the list of pernet_list. There will be a use-after-free on
      net->mib.net_statistics in tw_timer_handler after ipv4_mib_exit_net
      if there are some inflight time-wait timers.

      This bug is not introduced by commit f2bf415c ("mib: add net to
      NET_ADD_STATS_BH") since net_statistics was then a global variable
      rather than dynamically allocated and freed. Actually, commit
      61a7e260 ("mib: put net statistics on struct net") introduces
      the bug since it puts net statistics on struct net and frees them when
      the net namespace is destroyed.

      Move init_ipv4_mibs() to the front of tcp_init() to fix this bug
      and replace pr_crit() with panic() since continuing is meaningless
      when init_ipv4_mibs() fails.
      
      [1] https://groups.google.com/g/syzkaller/c/p1tn-_Kc6l4/m/smuL_FMAAgAJ?pli=1
      
      Fixes: 61a7e260 ("mib: put net statistics on struct net")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20211228104145.9426-1-songmuchun@bytedance.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: Fix a typo in udpgro_fwd.sh · add25d6d
      Jianguo Wu authored
      $rvs -> $rcv
      
      Fixes: a062260a ("selftests: net: add UDP GRO forwarding self-tests")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Link: https://lore.kernel.org/r/d247d7c8-a03a-0abf-3c71-4006a051d133@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests/net: udpgso_bench_tx: fix dst ip argument · 9c1952ae
      wujianguo authored
      udpgso_bench_tx calls setup_sockaddr() for the destination address before
      parsing all arguments. If "-p ${dst_port}" is specified after "-D ${dst_ip}",
      ${dst_port} is ignored and the default cfg_port 8000 is used.

      This causes the test case "multiple GRO socks" in udpgro.sh to fail.

      Set up the sockaddr after parsing all arguments.
      
      Fixes: 3a687bef ("selftests: udp gso benchmark")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/ff620d9f-5b52-06ab-5286-44b945453002@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum' · f7397cd2
      Jakub Kicinski authored
      
      
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: mcast: add and enforce query interval minimum
      
      This set adds and enforces 1 second minimum value for bridge multicast
      query and startup query intervals in order to avoid rearming the timers
      too often which could lock and crash the host. I doubt anyone is using
      such low values or anything lower than 1 second, so it seems like a good
      minimum. In order to be compatible if the value is lower then it is
      overwritten and a log message is emitted, since we can't return an error
      at this point.
      
      Eric, I looked for the syzbot reports in its dashboard but couldn't find
      them so I've added you as the reporter.
      
      I've prepared a global bridge igmp rate limiting patch but wasn't
      sure if it's ok for -net. It adds a static limit of 32k packets per
      second, I plan to send it for net-next with added drop counters for
      each bridge so it can be easily debugged.
      
      Original report can be seen at:
      https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/
      ====================
      
      Link: https://lore.kernel.org/r/20211227172116.320768-1-nikolay@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: mcast: add and enforce startup query interval minimum · f83a112b
      Nikolay Aleksandrov authored
      As reported[1] if startup query interval is set too low in combination with
      large number of startup queries and we have multiple bridges or even a
      single bridge with multiple querier vlans configured we can crash the
      machine. Add a 1 second minimum which must be enforced by overwriting the
      value if set lower (i.e. without returning an error) to avoid breaking
      user-space. If that happens a log message is emitted to let the admin know
      that the startup interval has been set to the minimum. It doesn't make
      sense to make the startup interval lower than the normal query interval
      so use the same value of 1 second. The issue has been present since these
      intervals could be user-controlled.
      
      [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/
      
      Fixes: d902eee4 ("bridge: Add multicast count/interval sysfs entries")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: mcast: add and enforce query interval minimum · 99b40610
      Nikolay Aleksandrov authored
      As reported[1] if query interval is set too low and we have multiple
      bridges or even a single bridge with multiple querier vlans configured
      we can crash the machine. Add a 1 second minimum which must be enforced
      by overwriting the value if set lower (i.e. without returning an error) to
      avoid breaking user-space. If that happens a log message is emitted to let
      the administrator know that the interval has been set to the minimum.
      The issue has been present since these intervals could be user-controlled.
      
      [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/
      
      Fixes: d902eee4 ("bridge: Add multicast count/interval sysfs entries")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
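The policy described in the two bridge commits above (overwrite a too-low interval with the 1 second minimum and log it, instead of returning an error) can be sketched as below. `demo_set_query_interval()` and `DEMO_QUERY_INTVL_MIN_MS` are hypothetical names, and milliseconds are used here only for illustration; the bridge code itself is not reproduced.

```c
#include <stdio.h>

/* 1 second minimum, expressed in milliseconds for this sketch. */
#define DEMO_QUERY_INTVL_MIN_MS 1000UL

/* Returns the interval that would actually be applied. Values below
 * the minimum are overwritten (not rejected) and a message is logged,
 * so existing user-space configuration keeps working. */
unsigned long demo_set_query_interval(unsigned long requested_ms)
{
    if (requested_ms < DEMO_QUERY_INTVL_MIN_MS) {
        fprintf(stderr,
                "query interval %lu ms is too low, using minimum %lu ms\n",
                requested_ms, DEMO_QUERY_INTVL_MIN_MS);
        return DEMO_QUERY_INTVL_MIN_MS;
    }
    return requested_ms;
}
```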
    • ipv6: raw: check passed optlen before reading · fb7bc920
      Tamir Duberstein authored
      
      
      Add a check that the user-provided option is at least as long as the
      number of bytes we intend to read. Before this patch we would blindly
      read sizeof(int) bytes even in cases where the user passed
      optlen<sizeof(int), which would potentially read garbage or fault.
      
      Discovered by new tests in https://github.com/google/gvisor/pull/6957 .
      
      The original get_user call predates history in the git repo.
      
      Signed-off-by: Tamir Duberstein <tamird@gmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211229200947.2862255-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
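The check described in the commit above (verify the caller-supplied length before reading sizeof(int) bytes) can be shown with a minimal user-space sketch. `demo_read_int_option()` is a hypothetical helper, and returning -EINVAL for a short buffer is an assumption made for this example rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Read an int-sized option value from a caller-supplied buffer.
 * The length is validated first; without this check the memcpy would
 * read past the end of a buffer shorter than sizeof(int). */
int demo_read_int_option(const void *optval, size_t optlen, int *out)
{
    if (optlen < sizeof(int))
        return -EINVAL;     /* assumed error code for this sketch */
    memcpy(out, optval, sizeof(int));
    return 0;
}
```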
    • Merge tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · eec4df26
      Linus Torvalds authored
      Pull s390 fix from Heiko Carstens:
      
       - fix s390 mcount regex typo in recordmcount.pl
      
      * tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        recordmcount.pl: fix typo in s390 mcount regex
    • xsk: Initialise xskb free_list_node · 5bec7ca2
      Ciara Loftus authored
      This commit initialises the xskb's free_list_node when the xskb is
      allocated. This prevents a potential false negative returned from a call
      to list_empty for that node, such as the one introduced in commit
      199d983b ("xsk: Fix crash on double free in buffer pool")
      
      In my environment this issue caused packets to not be received by
      the xdpsock application if the traffic was running prior to application
      launch. This happened when the first batch of packets failed the xskmap
      lookup and XDP_PASS was returned from the bpf program. This action is
      handled in the i40e zc driver (and others) by allocating an skbuff,
      freeing the xdp_buff and adding the associated xskb to the
      xsk_buff_pool's free_list if it hadn't been added already. Without this
      fix, the xskb is not added to the free_list because the check to determine
      if it was added already returns an invalid positive result. Later, this
      caused allocation errors in the driver and the failure to receive packets.
      
      Fixes: 199d983b ("xsk: Fix crash on double free in buffer pool")
      Fixes: 2b43470a ("xsk: Introduce AF_XDP buffer allocation API")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/r/20211220155250.2746-1-ciara.loftus@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
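Why an uninitialised list node defeats an "is it already on the free list?" check, as described in the commit above, can be seen in a small user-space imitation of a circular doubly linked list. All `demo_*` names are hypothetical; the emptiness test mirrors the usual "node points to itself" convention and is not the xsk buffer pool code.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_list_head {
    struct demo_list_head *next, *prev;
};

/* An initialised, unlinked node points to itself. */
void demo_init_list_head(struct demo_list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* "Empty" means the node points to itself. */
bool demo_list_empty(const struct demo_list_head *h)
{
    return h->next == h;
}

/* Insert node right after head. */
void demo_list_add(struct demo_list_head *node, struct demo_list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Hypothetical buffer with an embedded free-list node. */
struct demo_xskb {
    struct demo_list_head free_list_node;
};

/* Add the buffer to the free list unless it appears to be on a list
 * already. Returns true if it was added. A node that was never
 * initialised does not point to itself, so it looks "already added"
 * and is wrongly skipped. */
bool demo_free(struct demo_xskb *x, struct demo_list_head *free_list)
{
    if (!demo_list_empty(&x->free_list_node))
        return false;
    demo_list_add(&x->free_list_node, free_list);
    return true;
}
```

Calling demo_init_list_head() on the node at allocation time makes the first demo_free() succeed while still guarding against adding the same buffer twice.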
  5. Dec 29, 2021
  6. Dec 28, 2021
    • Merge branch 'smc-fixes' · 16fa29ae
      David S. Miller authored
      
      
      Dust Li says:
      
      ====================
      net/smc: fix kernel panic caused by race of smc_sock
      
      This patchset fixes the race between smc_release triggered by
      close(2) and cdc_handle triggered by the underlying RDMA device.

      The race occurs because the smc_connection may have been released
      before the pending tx CDC messages get their CQEs. In order to fix
      this, I add a counter to track how many pending WRs we have posted
      through the smc_connection, and only release the smc_connection
      after there are no pending WRs on the connection.

      The first patch prevents posting a WR on a QP that is not in RTS
      state. This patch is needed because if we post a WR on a QP that
      is not in RTS state, ib_post_send() may succeed but no CQE will
      be returned, and that will confuse the counter tracking the pending
      WRs.

      The second patch adds a counter to track how many WRs were posted
      through the smc_connection, and doesn't reset the QP on link destroying
      to prevent a leak of the counter.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16fa29ae
    • Dust Li's avatar
      net/smc: fix kernel panic caused by race of smc_sock · 349d4312
      Dust Li authored
      A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
      but smc_release() has already freed it.
      
      [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
      [ 4570.696048] #PF: supervisor write access in kernel mode
      [ 4570.696728] #PF: error_code(0x0002) - not-present page
      [ 4570.697401] PGD 0 P4D 0
      [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
      [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
      [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
      <...>
      [ 4570.711446] Call Trace:
      [ 4570.711746]  <IRQ>
      [ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
      [ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
      [ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
      [ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
      [ 4570.714083]  __do_softirq+0x123/0x2f4
      [ 4570.714521]  irq_exit_rcu+0xc4/0xf0
      [ 4570.714934]  common_interrupt+0xba/0xe0
      
      Though smc_cdc_tx_handler() checked the existence of smc connection,
      smc_release() may have already dismissed and released the smc socket
      before smc_cdc_tx_handler() further visits it.
      
      smc_cdc_tx_handler()           |smc_release()
      if (!conn)                     |
                                     |
                                     |smc_cdc_tx_dismiss_slots()
                                     |      smc_cdc_tx_dismisser()
                                     |
                                     |sock_put(&smc->sk) <- last sock_put,
                                     |                      smc_sock freed
      bh_lock_sock(&smc->sk) (panic) |
      
      To make sure we won't receive any CDC messages after we free the
      smc_sock, add a refcount on the smc_connection for inflight CDC
      messages (posted to the QP but with no related CQE received yet), and
      don't release the smc_connection until all the inflight CDC messages
      have completed, whether they succeeded or failed.

      Using a refcount on CDC messages brings another problem: when the link
      is going to be destroyed, smcr_link_clear() will reset the QP, which
      then removes all the pending CQEs related to the QP from the CQ. To make
      sure all the CQEs always come back so the refcount on the
      smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
      by smc_ib_modify_qp_error().
      Also remove the timeout in smc_wr_tx_wait_no_pending_sends(), since we
      need to wait for all pending WQEs to complete, or we may hit a
      use-after-free when handling CQEs.

      For the IB device removal routine, we need to wait for all the QPs on
      that device to be destroyed before we can destroy the CQs on the device,
      or the refcount on smc_connection won't reach 0 and smc_sock cannot be
      released.
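
The inflight-refcount idea described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct conn`, `conn_post()`, `conn_complete()` and `conn_release()` are hypothetical names, not the kernel's, the `freed` flag stands in for the final sock_put(), and the sketch is single-threaded, so it leaves out the atomics and waiting a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for smc_connection; not the kernel structure. */
struct conn {
    int  inflight;  /* CDC messages posted to the QP with no CQE seen yet */
    bool closing;   /* release has been requested */
    bool freed;     /* stands in for the last sock_put() */
};

/* Free the connection only once release was requested and nothing is inflight. */
static void conn_try_free(struct conn *c)
{
    if (c->closing && c->inflight == 0)
        c->freed = true;
}

/* Take a reference for every CDC message posted to the QP. */
static void conn_post(struct conn *c)
{
    c->inflight++;
}

/* Completion handler: runs for successful and failed completions alike. */
static void conn_complete(struct conn *c)
{
    assert(!c->freed);       /* the handler must never see a freed connection */
    assert(c->inflight > 0);
    c->inflight--;
    conn_try_free(c);
}

/* Release path: defers the actual free while messages are still inflight. */
static void conn_release(struct conn *c)
{
    c->closing = true;
    conn_try_free(c);
}
```

With two messages posted, calling `conn_release()` leaves the connection alive, and it is freed only by the second `conn_complete()`. This is why every posted message must eventually produce a completion: a completion that never arrives would keep the count above zero forever.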
      
      Fixes: 5f08318f ("smc: connection data control (CDC)")
      Reported-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      349d4312
    • Dust Li's avatar
      net/smc: don't send CDC/LLC message if link not ready · 90cee52f
      Dust Li authored
      We found that smc_llc_send_link_delete_all() sometimes waits
      for the 2s timeout when testing with RDMA link up/down.
      This can happen when a smc_link is in ACTIVATING state while
      the underlying QP is still in RESET or RTR state, in which
      it cannot send any messages out.

      smc_llc_send_link_delete_all() uses smc_link_usable() to
      check whether the link is usable. If the QP is still in
      RESET or RTR state but the smc_link is in ACTIVATING, this
      LLC message will always fail without any CQE entering the
      CQ, and we will always wait 2s before timing out.

      Since we cannot send any messages through the QP before
      the QP enters RTS, I add a wrapper smc_link_sendable()
      which checks the state of the QP along with the link state,
      and replace smc_link_usable() with smc_link_sendable()
      in all LLC & CDC message sending routines.
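
The difference between the two checks can be sketched in plain C. This is a minimal sketch, not the kernel code: the enums, `struct link`, `link_usable()` and `link_sendable()` are hypothetical stand-ins, and the usable check is assumed to accept only the ACTIVATING and ACTIVE link states.

```c
#include <stdbool.h>

/* Hypothetical, simplified link and QP states (not the kernel definitions). */
enum link_state { LNK_UNUSED, LNK_INACTIVE, LNK_ACTIVATING, LNK_ACTIVE };
enum qp_state   { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

struct link {
    enum link_state state;
    enum qp_state   qp;
};

/* Link-level check only: an ACTIVATING link already counts as usable. */
static bool link_usable(const struct link *l)
{
    return l->state == LNK_ACTIVATING || l->state == LNK_ACTIVE;
}

/* Sendable additionally requires the QP to have reached RTS. */
static bool link_sendable(const struct link *l)
{
    return link_usable(l) && l->qp == QP_RTS;
}
```

A link in ACTIVATING state whose QP is still in RTR passes `link_usable()` but fails `link_sendable()`, which is the case where the send would otherwise be posted and then wait for a completion that never arrives.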
      
      Fixes: 5f08318f ("smc: connection data control (CDC)")
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90cee52f