Skip to content
  1. Jun 09, 2023
    • Linus Torvalds's avatar
      Merge tag 'cgroup-for-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 9cd6357f
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
      
       - Fix css_set reference leaks on fork failures
      
       - Fix CPU hotplug locking in cgroup_transfer_tasks() which is used by
         cgroup1 cpuset
      
       - Doc update
      
      * tag 'cgroup-for-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Documentation: Clarify usage of memory limits
        cgroup: always put cset in cgroup_css_set_put_fork
        cgroup: fix missing cpus_read_{lock,unlock}() in cgroup_transfer_tasks()
      9cd6357f
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 8d15d5e1
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Two tiny arm64 fixes for -rc6.
      
        One fixes a build breakage when MAX_ORDER can be nonsensical if
        CONFIG_EXPERT=y and the other fixes the address masking for perf's
        page fault software events so that it is consistent amongst them:
      
         - Fix build breakage due to bogus MAX_ORDER definitions on !4k pages
      
         - Avoid masking fault address for perf software events"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
        arm64: Remove the ARCH_FORCE_MAX_ORDER config input prompt
      8d15d5e1
    • Linus Torvalds's avatar
      Merge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 25041a4c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from can, wifi, netfilter, bluetooth and ebpf.
      
        Current release - regressions:
      
         - bpf: sockmap: avoid potential NULL dereference in
           sk_psock_verdict_data_ready()
      
         - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
      
         - phylink: actually fix ksettings_set() ethtool call
      
         - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3
      
        Current release - new code bugs:
      
         - wifi: mt76: fix possible NULL pointer dereference in
           mt7996_mac_write_txwi()
      
        Previous releases - regressions:
      
         - netfilter: fix NULL pointer dereference in nf_confirm_cthelper
      
         - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
      
         - openvswitch: fix upcall counter access before allocation
      
         - bluetooth:
            - fix use-after-free in hci_remove_ltk/hci_remove_irk
            - fix l2cap_disconnect_req deadlock
      
         - nic: bnxt_en: prevent kernel panic when receiving unexpected
           PHC_UPDATE event
      
        Previous releases - always broken:
      
         - core: annotate rfs lockless accesses
      
         - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values
      
         - netfilter: add null check for nla_nest_start_noflag() in
           nft_dump_basechain_hook()
      
         - bpf: fix UAF in task local storage
      
         - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294
      
         - ipv6: rpl: fix route of death.
      
         - tcp: gso: really support BIG TCP
      
         - mptcp: fixes for user-space PM address advertisement
      
         - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT
      
         - can: avoid possible use-after-free when j1939_can_rx_register fails
      
         - batman-adv: fix UaF while rescheduling delayed work
      
         - eth: qede: fix scheduling while atomic
      
         - eth: ice: make writes to /dev/gnssX synchronous"
      
      * tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
        bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
        bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
        bnxt_en: Skip firmware fatal error recovery if chip is not accessible
        bnxt_en: Query default VLAN before VNIC setup on a VF
        bnxt_en: Don't issue AP reset during ethtool's reset operation
        bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
        net: bcmgenet: Fix EEE implementation
        eth: ixgbe: fix the wake condition
        eth: bnxt: fix the wake condition
        lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
        bpf: Add extra path pointer check to d_path helper
        net: sched: fix possible refcount leak in tc_chain_tmplt_add()
        net: sched: act_police: fix sparse errors in tcf_police_dump()
        net: openvswitch: fix upcall counter access before allocation
        net: sched: move rtm_tca_policy declaration to include file
        ice: make writes to /dev/gnssX synchronous
        net: sched: add rcu annotations around qdisc->qdisc_sleeping
        rfs: annotate lockless accesses to RFS sock flow table
        rfs: annotate lockless accesses to sk->sk_rxhash
        virtio_net: use control_buf for coalesce params
        ...
      25041a4c
  2. Jun 08, 2023
  3. Jun 07, 2023
    • Jiri Olsa's avatar
      bpf: Add extra path pointer check to d_path helper · f46fab0e
      Jiri Olsa authored
      Anastasios reported crash on stable 5.15 kernel with following
      BPF attached to lsm hook:
      
        SEC("lsm.s/bprm_creds_for_exec")
        int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
        {
                struct path *path = &bprm->executable->f_path;
                char p[128] = { 0 };
      
                bpf_d_path(path, p, 128);
                return 0;
        }
      
      But bprm->executable can be NULL, so bpf_d_path call will crash:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000018
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
        ...
        RIP: 0010:d_path+0x22/0x280
        ...
        Call Trace:
         <TASK>
         bpf_d_path+0x21/0x60
         bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
         bpf_trampoline_6442506293_0+0x55/0x1000
         bpf_lsm_bprm_creds_for_exec+0x5/0x10
         security_bprm_creds_for_exec+0x29/0x40
         bprm_execve+0x1c1/0x900
         do_execveat_common.isra.0+0x1af/0x260
         __x64_sys_execve+0x32/0x40
      
      It's problem for all stable trees with bpf_d_path helper, which was
      added in 5.9.
      
      This issue is fixed in current bpf code, where we identify and mark
      trusted pointers, so the above code would fail even to load.
      
      For the sake of the stable trees and to workaround potentially broken
      verifier in the future, adding the code that reads the path object from
      the passed pointer and verifies it's valid in kernel space.
      
      Fixes: 6e22ab9d
      
       ("bpf: Add d_path helper")
      Reported-by: default avatarAnastasios Papagiannis <tasos.papagiannnis@gmail.com>
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarStanislav Fomichev <sdf@google.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
      f46fab0e
    • Hangyu Hua's avatar
      net: sched: fix possible refcount leak in tc_chain_tmplt_add() · 44f8baaf
      Hangyu Hua authored
      try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
      to be called to drop the refcount if ops don't implement the required
      function.
      
      Fixes: 9f407f17
      
       ("net: sched: introduce chain templates")
      Signed-off-by: default avatarHangyu Hua <hbh25y@gmail.com>
      Reviewed-by: default avatarLarysa Zaremba <larysa.zaremba@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44f8baaf
    • Eric Dumazet's avatar
      net: sched: act_police: fix sparse errors in tcf_police_dump() · 682881ee
      Eric Dumazet authored
      Fixes following sparse errors:
      
      net/sched/act_police.c:360:28: warning: dereference of noderef expression
      net/sched/act_police.c:362:45: warning: dereference of noderef expression
      net/sched/act_police.c:362:45: warning: dereference of noderef expression
      net/sched/act_police.c:368:28: warning: dereference of noderef expression
      net/sched/act_police.c:370:45: warning: dereference of noderef expression
      net/sched/act_police.c:370:45: warning: dereference of noderef expression
      net/sched/act_police.c:376:45: warning: dereference of noderef expression
      net/sched/act_police.c:376:45: warning: dereference of noderef expression
      
      Fixes: d1967e49
      
       ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      682881ee
    • Eelco Chaudron's avatar
      net: openvswitch: fix upcall counter access before allocation · de9df6c6
      Eelco Chaudron authored
      Currently, the per cpu upcall counters are allocated after the vport is
      created and inserted into the system. This could lead to the datapath
      accessing the counters before they are allocated resulting in a kernel
      Oops.
      
      Here is an example:
      
        PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
         #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
         #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
         #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
         #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
         #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
         #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
         #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
         #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
         ...
      
        PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
         #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
         #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
         #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
         #3 [ffff80000a5d3120] die at ffffb70f0628234c
         #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
         #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
         #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
         #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
         #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
         #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
        #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
        #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
        #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
        #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
        #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
        #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
        #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
        #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90
      
      We moved the per cpu upcall counter allocation to the existing vport
      alloc and free functions to solve this.
      
      Fixes: 95637d91 ("net: openvswitch: release vport resources on failure")
      Fixes: 1933ea36
      
       ("net: openvswitch: Add support to count upcall packets")
      Signed-off-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Acked-by: default avatarAaron Conole <aconole@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de9df6c6
    • Eric Dumazet's avatar
      net: sched: move rtm_tca_policy declaration to include file · 886bc7d6
      Eric Dumazet authored
      rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
      thus should be declared in an include file.
      
      This fixes the following sparse warning:
      net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?
      
      Fixes: e331473f
      
       ("net/sched: cls_api: add missing validation of netlink attributes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      886bc7d6
    • Michal Schmidt's avatar
      ice: make writes to /dev/gnssX synchronous · bf15bb38
      Michal Schmidt authored
      The current ice driver's GNSS write implementation buffers writes and
      works through them asynchronously in a kthread. That's bad because:
       - The GNSS write_raw operation is supposed to be synchronous[1][2].
       - There is no upper bound on the number of pending writes.
         Userspace can submit writes much faster than the driver can process,
         consuming unlimited amounts of kernel memory.
      
      A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS
      buffers instead of first one") would add one more problem:
       - The possibility of waiting for a very long time to flush the write
         work when doing rmmod, softlockups.
      
      To fix these issues, simplify the implementation: Drop the buffering,
      the write_work, and make the writes synchronous.
      
      I tested this with gpsd and ubxtool.
      
      [1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf
          "User interface" slide.
      [2] A comment in drivers/gnss/core.c:gnss_write():
              /* Ignoring O_NONBLOCK, write_raw() is synchronous. */
      [3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/
      
      Fixes: d6b98c8d
      
       ("ice: add write functionality for GNSS TTY")
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf15bb38
    • Eric Dumazet's avatar
      net: sched: add rcu annotations around qdisc->qdisc_sleeping · d636fc5d
      Eric Dumazet authored
      syzbot reported a race around qdisc->qdisc_sleeping [1]
      
      It is time we add proper annotations to reads and writes to/from
      qdisc->qdisc_sleeping.
      
      [1]
      BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu
      
      read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
      qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
      __tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
      tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
      rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
      netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
      rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
      sock_sendmsg_nosec net/socket.c:724 [inline]
      sock_sendmsg net/socket.c:747 [inline]
      ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
      ___sys_sendmsg net/socket.c:2557 [inline]
      __sys_sendmsg+0x1e3/0x270 net/socket.c:2586
      __do_sys_sendmsg net/socket.c:2595 [inline]
      __se_sys_sendmsg net/socket.c:2593 [inline]
      __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
      dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
      qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
      tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
      rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
      netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
      rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
      sock_sendmsg_nosec net/socket.c:724 [inline]
      sock_sendmsg net/socket.c:747 [inline]
      ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
      ___sys_sendmsg net/socket.c:2557 [inline]
      __sys_sendmsg+0x1e3/0x270 net/socket.c:2586
      __do_sys_sendmsg net/socket.c:2595 [inline]
      __se_sys_sendmsg net/socket.c:2593 [inline]
      __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023
      
      Fixes: 3a7d0d07
      
       ("net: sched: extend Qdisc with rcu")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: default avatarJamal Hadi <Salim&lt;jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d636fc5d
    • David S. Miller's avatar
      Merge branch 'rfs-lockless-annotate' · e3144ff5
      David S. Miller authored
      
      
      Eric Dumazet says:
      
      ====================
      rfs: annotate lockless accesses
      
      rfs runs without locks held, so we should annotate
      read and writes to shared variables.
      
      It should prevent compilers forcing writes
      in the following situation:
      
        if (var != val)
           var = val;
      
      A compiler could indeed simply avoid the conditional:
      
          var = val;
      
      This matters if var is shared between many cpus.
      
      v2: aligns one closing bracket (Simon)
          adds Fixes: tags (Jakub)
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e3144ff5
    • Eric Dumazet's avatar
      rfs: annotate lockless accesses to RFS sock flow table · 5c3b74a9
      Eric Dumazet authored
      Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.
      
      This also prevents a (smart ?) compiler to remove the condition in:
      
      if (table->ents[index] != newval)
              table->ents[index] = newval;
      
      We need the condition to avoid dirtying a shared cache line.
      
      Fixes: fec5e652
      
       ("rfs: Receive Flow Steering")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c3b74a9
    • Eric Dumazet's avatar
      rfs: annotate lockless accesses to sk->sk_rxhash · 1e5c647c
      Eric Dumazet authored
      Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash.
      
      This also prevents a (smart ?) compiler to remove the condition in:
      
      if (sk->sk_rxhash != newval)
      	sk->sk_rxhash = newval;
      
      We need the condition to avoid dirtying a shared cache line.
      
      Fixes: fec5e652
      
       ("rfs: Receive Flow Steering")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e5c647c
    • Jakub Kicinski's avatar
      Merge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · ab39b113
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fixes to debugfs registration
       - Fix use-after-free in hci_remove_ltk/hci_remove_irk
       - Fixes to ISO channel support
       - Fix missing checks for invalid L2CAP DCID
       - Fix l2cap_disconnect_req deadlock
       - Add lock to protect HCI_UNREGISTER
      
      * tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: L2CAP: Add missing checks for invalid DCID
        Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
        Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
        Bluetooth: Fix l2cap_disconnect_req deadlock
        Bluetooth: hci_qca: fix debugfs registration
        Bluetooth: fix debugfs registration
        Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
        Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
        Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
        Bluetooth: ISO: consider right CIS when removing CIG at cleanup
      ====================
      
      Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ab39b113
    • Jakub Kicinski's avatar
      Merge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 20c47646
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Missing nul-check in basechain hook netlink dump path, from Gavrilov Ilia.
      
      2) Fix bitwise register tracking, from Jeremy Sowden.
      
      3) Null pointer dereference when accessing conntrack helper,
         from Tijs Van Buggenhout.
      
      4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.
      
      5) Incorrect boundary check when building chain blob.
      
      * tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: out-of-bound check in chain blob
        netfilter: ipset: Add schedule point in call_ad().
        netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
        netfilter: nft_bitwise: fix register tracking
        netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
      ====================
      
      Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      20c47646
    • Jakub Kicinski's avatar
      Merge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · e684ab76
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.4
      
      Both rtw88 and rtw89 have a 802.11 powersave fix for a regression
      introduced in v6.0. mt76 fixes a race and a null pointer dereference.
      iwlwifi fixes an issue where not enough memory was allocated for a
      firmware event. And finally the stack has several smaller fixes all
      over.
      
      * tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: cfg80211: fix locking in regulatory disconnect
        wifi: cfg80211: fix locking in sched scan stop work
        wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
        wifi: mac80211: fix switch count in EMA beacons
        wifi: mac80211: don't translate beacon/presp addrs
        wifi: mac80211: mlme: fix non-inheritence element
        wifi: cfg80211: reject bad AP MLD address
        wifi: mac80211: use correct iftype HE cap
        wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
        wifi: rtw89: remove redundant check of entering LPS
        wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
        wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
        wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
      ====================
      
      Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e684ab76
    • Brett Creeley's avatar
      virtio_net: use control_buf for coalesce params · accc1bf2
      Brett Creeley authored
      Commit 699b045a ("net: virtio_net: notifications coalescing
      support") added coalescing command support for virtio_net. However,
      the coalesce commands are using buffers on the stack, which is causing
      the device to see DMA errors. There should also be a complaint from
      check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using
      coalesce params from the control_buf struct, which aligns with other
      commands.
      
      Cc: stable@vger.kernel.org
      Fixes: 699b045a
      
       ("net: virtio_net: notifications coalescing support")
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarAllen Hubbe <allen.hubbe@amd.com>
      Signed-off-by: default avatarBrett Creeley <brett.creeley@amd.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Reviewed-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      accc1bf2
    • Brett Creeley's avatar
      pds_core: Fix FW recovery detection · 4f48c303
      Brett Creeley authored
      Commit 523847df ("pds_core: add devcmd device interfaces") included
      initial support for FW recovery detection. Unfortunately, the ordering
      in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be
      undetected by the driver. Fix this by making sure to update the cached
      fw_status by calling pdsc_is_fw_running() before setting the local FW
      gen.
      
      Fixes: 523847df
      
       ("pds_core: add devcmd device interfaces")
      Signed-off-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarBrett Creeley <brett.creeley@amd.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4f48c303
    • Eric Dumazet's avatar
      tcp: gso: really support BIG TCP · 82a01ab3
      Eric Dumazet authored
      We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 :
      
      oldlen = (u16)~skb->len;
      
      This part came with commit 0718bcc0 ("[NET]: Fix CHECKSUM_HW GSO problems.")
      
      This leads to wrong TCP checksum.
      
      Adapt the code to accept arbitrary packet length.
      
      v2:
        - use two csum_add() instead of csum_fold() (Alexander Duyck)
        - Change delta type to __wsum to reduce casts (Alexander Duyck)
      
      Fixes: 09f3d1a3
      
       ("ipv6/gso: remove temporary HBH/jumbo header")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      82a01ab3
    • Kuniyuki Iwashima's avatar
      ipv6: rpl: Fix Route of Death. · a2f4c143
      Kuniyuki Iwashima authored
      A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156.
      
      The Source Routing Header (SRH) has the following format:
      
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        | CmprI | CmprE |  Pad  |               Reserved                |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                                                               |
        .                                                               .
        .                        Addresses[1..n]                        .
        .                                                               .
        |                                                               |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
      The originator of an SRH places the first hop's IPv6 address in the IPv6
      header's IPv6 Destination Address and the second hop's IPv6 address as
      the first address in Addresses[1..n].
      
      The CmprI and CmprE fields indicate the number of prefix octets that are
      shared with the IPv6 Destination Address.  When CmprI or CmprE is not 0,
      Addresses[1..n] are compressed as follows:
      
        1..n-1 : (16 - CmprI) bytes
             n : (16 - CmprE) bytes
      
      Segments Left indicates the number of route segments remaining.  When the
      value is not zero, the SRH is forwarded to the next hop.  Its address
      is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6
      Destination Address.
      
      When Segment Left is greater than or equal to 2, the size of SRH is not
      changed because Addresses[1..n-1] are decompressed and recompressed with
      CmprI.
      
      OTOH, when Segment Left changes from 1 to 0, the new SRH could have a
      different size because Addresses[1..n-1] are decompressed with CmprI and
      recompressed with CmprE.
      
      Let's say CmprI is 15 and CmprE is 0.  When we receive SRH with Segment
      Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has
      16 bytes.  When Segment Left is 1, Addresses[1..n-1] is decompressed to
      16 bytes and not recompressed.  Finally, the new SRH will need more room
      in the header, and the size is (16 - 1) * (n - 1) bytes.
      
      Here the max value of n is 255 as Segment Left is u8, so in the worst case,
      we have to allocate 3825 bytes in the skb headroom.  However, now we only
      allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7
      bytes).  If the decompressed size overflows the room, skb_push() hits BUG()
      below [0].
      
      Instead of allocating the fixed buffer for every packet, let's allocate
      enough headroom only when we receive SRH with Segment Left 1.
      
      [0]:
      skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
      kernel BUG at net/core/skbuff.c:200!
      invalid opcode: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:skb_panic (net/core/skbuff.c:200)
      Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
      RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
      RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
      RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
      R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
      R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
      FS:  00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <IRQ>
       skb_push (net/core/skbuff.c:210)
       ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
       ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
       ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
       __netif_receive_skb_one_core (net/core/dev.c:5494)
       process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
       __napi_poll (net/core/dev.c:6496)
       net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
       do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
       </IRQ>
       <TASK>
       __local_bh_enable_ip (kernel/softirq.c:396)
       __dev_queue_xmit (net/core/dev.c:4272)
       ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
       rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
       sock_sendmsg (net/socket.c:724 net/socket.c:747)
       __sys_sendto (net/socket.c:2144)
       __x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7f453a138aea
      Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
      RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
      RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
       </TASK>
      Modules linked in:
      
      Fixes: 8610c7c6 ("net: ipv6: add support for rpl sr exthdr")
      Reported-by: Max VA
      Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
      
      
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a2f4c143
    • Jakub Kicinski's avatar
      netlink: specs: ethtool: fix random typos · f6ca5baf
      Jakub Kicinski authored
      Working on the code gen for C reveals typos in the ethtool spec
      as the compiler tries to find the names in the existing uAPI
      header. Fix the mistakes.
      
      Fixes: a353318e
      
       ("tools: ynl: populate most of the ethtool spec")
      Acked-by: default avatarStanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20230605233257.843977-1-kuba@kernel.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f6ca5baf
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: out-of-bound check in chain blob · 08e42a0d
      Pablo Neira Ayuso authored
      Add current size of rule expressions to the boundary check.
      
      Fixes: 2c865a8a
      
       ("netfilter: nf_tables: add rule blob layout")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      08e42a0d
    • Kuniyuki Iwashima's avatar
      netfilter: ipset: Add schedule point in call_ad(). · 24e22789
      Kuniyuki Iwashima authored
      syzkaller found a repro that causes Hung Task [0] with ipset.  The repro
      first creates an ipset and then tries to delete a large number of IPs
      from the ipset concurrently:
      
        IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
        IPSET_ATTR_CIDR        : 2
      
      The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
      held, and other threads wait for it to be released.
      
      Previously, the same issue existed in set->variant->uadt() that could run
      so long under ip_set_lock(set).  Commit 5e29dc36
      
       ("netfilter: ipset:
      Rework long task execution when adding/deleting entries") tried to fix it,
      but the issue still exists in the caller with another mutex.
      
      While adding/deleting many IPs, we should release the CPU periodically to
      prevent someone from abusing ipset to hang the system.
      
      Note we need to increment the ipset's refcnt to prevent the ipset from
      being destroyed while rescheduling.
      
      [0]:
      INFO: task syz-executor174:268 blocked for more than 143 seconds.
            Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor174 state:D stack:0     pid:268   ppid:260    flags:0x0000000d
      Call trace:
       __switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
       context_switch kernel/sched/core.c:5343 [inline]
       __schedule+0xd84/0x1648 kernel/sched/core.c:6669
       schedule+0xf0/0x214 kernel/sched/core.c:6745
       schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
       __mutex_lock_common kernel/locking/mutex.c:679 [inline]
       __mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
       __mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
       mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
       nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
       nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
       netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
       nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
       netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
       netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
       netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg net/socket.c:747 [inline]
       ____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
       ___sys_sendmsg net/socket.c:2557 [inline]
       __sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
       __do_sys_sendmsg net/socket.c:2595 [inline]
       __se_sys_sendmsg net/socket.c:2593 [inline]
       __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
       el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
       do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
       el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
       el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      
      Reported-by: default avatarsyzkaller <syzkaller@googlegroups.com>
      Fixes: a7b4f989
      
       ("netfilter: ipset: IP set core support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      24e22789