  1. Aug 30, 2023
    • ipv4: fix data-races around inet->inet_id · 417e7ec0
      Eric Dumazet authored
      [ Upstream commit f866fbc8 ]
      
      UDP sendmsg() is lockless, so ip_select_ident_segs()
      can very well be run from multiple cpus [1]
      
      Convert inet->inet_id to an atomic_t, but implement
      a dedicated path for TCP, avoiding cost of a locked
      instruction (atomic_add_return())
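
The split described above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel patch: `demo_sock`, `demo_select_ident` and the `owner_locked` flag are hypothetical names, and the serialized branch stands in for the dedicated TCP path.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket IP ID counter. */
struct demo_sock {
    atomic_uint id;
};

/*
 * Return the IP ID for this packet and advance the counter by `segs`.
 * `owner_locked` models a caller that already serializes access
 * (the TCP case described above); every other caller takes the
 * atomic read-modify-write path.
 */
static uint16_t demo_select_ident(struct demo_sock *sk, unsigned int segs,
                                  bool owner_locked)
{
    unsigned int old;

    if (owner_locked) {
        /* Serialized by the caller: a plain load/store pair is enough. */
        old = atomic_load_explicit(&sk->id, memory_order_relaxed);
        atomic_store_explicit(&sk->id, old + segs, memory_order_relaxed);
    } else {
        /* Lockless callers: a single atomic fetch-and-add. */
        old = atomic_fetch_add_explicit(&sk->id, segs, memory_order_relaxed);
    }
    return (uint16_t)old;
}
```

The serialized branch avoids the locked instruction, at the price of being correct only when the caller really does exclude concurrent senders.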
      
      Note that this patch will cause a trivial merge conflict
      because we added inet->flags in net-next tree.
      
      v2: added missing change in
      drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
      (David Ahern)
      
      [1]
      
      BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
      
      read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
      ip_select_ident_segs include/net/ip.h:542 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
      ip_select_ident_segs include/net/ip.h:541 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x184d -> 0x184e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      ==================================================================
      
      Fixes: 23f57406 ("ipv4: avoid using shared IP generator for connected sockets")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: validate veth and vxcan peer ifindexes · 4af1fe64
      Jakub Kicinski authored
      [ Upstream commit f534f658 ]
      
      veth and vxcan need to make sure the ifindexes of the peer
      are not negative; the core does not validate this.
      
      Using iproute2 with user-space-level checking removed:
      
      Before:
      
        # ./ip link add index 10 type veth peer index -1
        # ip link show
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
        10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
        -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
      
      Now:
      
        $ ./ip link add index 10 type veth peer index -1
        Error: ifindex can't be negative.
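
A minimal sketch of the kind of check described here, written as plain userspace C. The function name and the `errmsg` out-parameter are hypothetical; the kernel drivers report the error through their own netlink helpers, which are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Reject a negative peer ifindex before any device is created.
 * Returns 0 when the index is acceptable, -EINVAL otherwise.
 */
static int demo_validate_peer_ifindex(int ifindex, const char **errmsg)
{
    if (ifindex < 0) {
        if (errmsg)
            *errmsg = "ifindex can't be negative";
        return -EINVAL;
    }
    return 0;
}
```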
      
      This problem surfaced in net-next because an explicit WARN()
      was added; the root cause is older.
      
      Fixes: e6f8f1a7 ("veth: Allow to create peer link with given ifindex")
      Fixes: a8f820a3 ("can: add Virtual CAN Tunnel driver (vxcan)")
      Reported-by: <syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: bcmgenet: Fix return value check for fixed_phy_register() · afc9d3d2
      Ruan Jinjie authored
      [ Upstream commit 32bbe64a ]
      
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
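
Why a NULL test misses the failure can be shown with a userspace imitation of the error-pointer convention. Everything prefixed `demo_` is hypothetical and only mimics the idea of encoding a negative errno in a pointer; it is not the kernel's own implementation, and `demo_register` merely stands in for a function that fails this way.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True when the pointer falls in the top DEMO_MAX_ERRNO addresses. */
static bool demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the errno from an error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Hypothetical registration call that fails with an error pointer. */
static void *demo_register(bool fail)
{
    static int device;

    return fail ? demo_err_ptr(-ENODEV) : (void *)&device;
}

static int demo_probe(bool fail)
{
    void *phy = demo_register(fail);

    /* A `!phy` test would never fire: an error pointer is not NULL. */
    if (demo_is_err(phy))
        return (int)demo_ptr_err(phy);
    return 0;
}
```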
      
      Fixes: b0ba512e ("net: bcmgenet: enable driver to work without a device tree")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: bgmac: Fix return value check for fixed_phy_register() · 029e491b
      Ruan Jinjie authored
      [ Upstream commit 23a14488 ]
      
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
      
      Fixes: c25b23b8 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: mt7530: fix handling of 802.1X PAE frames · ac259251
      Arınç ÜNAL authored
      [ Upstream commit e94b590a ]
      
      802.1X PAE frames are link-local frames, therefore they must be trapped to
      the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as
      regular multicast frames, therefore flooding them to user ports. To fix
      this, set 802.1X PAE frames to be trapped to the CPU port(s).
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: mlxsw: Fix test failure on Spectrum-4 · c6636072
      Ido Schimmel authored
      [ Upstream commit f520489e ]
      
      Remove assumptions about shared buffer cell size and instead query the
      cell size from devlink. Adjust the test to send small packets that fit
      inside a single cell.
      
      Tested on Spectrum-{1,2,3,4}.
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: Fix the size of 'VIRT_ROUTER_MSB' · 1288f990
      Amit Cohen authored
      [ Upstream commit 348c976b ]
      
      The field 'virtual router' was extended to 12 bits in Spectrum-4.
      Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for
      Spectrum < 4 and 4 bits for Spectrum >= 4.
      
      The elements are stored in an internal storage scratchpad. Currently, the
      MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs
      can be used for multicast routing, as the highest bit is not really used by
      the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust
      the definitions of 'virtual router' field in the blocks accordingly - use
      '_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask
      in parse function to use 4 bits.
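
The effect of the mask width can be checked with a little arithmetic. The sketch below assumes the 12-bit virtual router ID is split into an 8-bit low part and the MSB element discussed above (12 minus 4 bits); the function names are hypothetical and do not come from the driver.

```c
#include <stdint.h>

#define DEMO_VR_LSB_BITS 8u /* assumed width of the low part */

/* Extract the MSB element of a virtual router ID using `msb_bits` bits. */
static uint32_t demo_vr_msb(uint32_t vr_id, unsigned int msb_bits)
{
    return (vr_id >> DEMO_VR_LSB_BITS) & ((1u << msb_bits) - 1u);
}

/* Number of distinct virtual routers addressable with that MSB width. */
static uint32_t demo_vr_count(unsigned int msb_bits)
{
    return 1u << (DEMO_VR_LSB_BITS + msb_bits);
}
```

With a 3-bit mask, `demo_vr_count(3)` is 2048, and virtual router 0x800 yields the same MSB as virtual router 0, which matches the 2K limit mentioned above. A 4-bit mask gives 4096 distinct IDs.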
      
      Fixes: 6d5d8ebb ("mlxsw: Rename virtual router flex key element")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: reg: Fix SSPR register layout · 7134565a
      Ido Schimmel authored
      [ Upstream commit 0dc63b9c ]
      
      The two most significant bits of the "local_port" field in the SSPR
      register are always cleared since they are overwritten by the deprecated
      and overlapping "sub_port" field.
      
      On systems with more than 255 local ports (e.g., Spectrum-4), this
      results in the firmware maintaining invalid mappings between system port
      and local port. Specifically, two different system ports (0x1 and
      0x101) point to the same local port (0x1), which eventually leads to
      firmware errors.
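
The aliasing can be reproduced with a simplified model. The sketch assumes a 10-bit "local_port" value whose bits 8-9 are the two most significant bits that the overlapping field zeroes; the real register layout is not reproduced here, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the local port value that ends up in the register.
 * When `write_sub_port` is true, a zeroed overlapping field is packed
 * on top of bits 8-9, clearing them.
 */
static uint16_t demo_sspr_local_port(uint16_t local_port, bool write_sub_port)
{
    uint16_t field = local_port & 0x3ffu; /* assumed 10-bit value */

    if (write_sub_port)
        field &= 0xffu; /* bits 8-9 overwritten with zeros */
    return field;
}
```

In this model, ports 0x1 and 0x101 both collapse to 0x1 while the overlapping field is written, and stay distinct once it is removed.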
      
      Fix by removing the deprecated "sub_port" field.
      
      Fixes: fd24b29a ("mlxsw: reg: Align existing registers to use extended local_port field")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC · 22f9b546
      Danielle Ratson authored
      [ Upstream commit bc2de151 ]
      
      Currently, in Spectrum-2 and above, time stamps are extracted from the CQE
      into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE
      time stamp type is UTC. The time stamps are read directly from the CQE and
      software can get the time stamp in UTC format using CQEv2.
      
      From Spectrum-4 onward, the time stamps that are read from the CQE may
      also be of type MIRROR_UTC.

      As a result, the driver emits a warning [1] that the time stamp fields
      were not set when an LLDP control packet is sent.
      
      Allow the time stamp type to be MIRROR_UTC and set the time stamp in this
      case as well.
      
      [1]
       WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum]
      [...]
       Call Trace:
        <IRQ>
        mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum]
        mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core]
        mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci]
        tasklet_action_common.constprop.0+0x9f/0x110
        __do_softirq+0xbb/0x296
        irq_exit_rcu+0x79/0xa0
        common_interrupt+0x86/0xa0
        </IRQ>
        <TASK>
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipvlan: Fix a reference count leak warning in ipvlan_ns_exit() · 4496f6cc
      Lu Wei authored
      [ Upstream commit 043d5f68 ]
      
      There are two network devices (veth1 and veth3) in ns1, and ipvlan1 in
      L3S mode and ipvlan2 in L2 mode are created on top of them as shown in
      figure (1). In this case, ipvlan_register_nf_hook() is called to
      register the nf hook needed by ipvlans in L3S mode in ns1, and the value
      of ipvl_nf_hook_refcnt is set to 1.
      
      (1)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
         veth3--ipvlan2 (L2)
      
      (2)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
               ipvlan2 (L2)                  veth3
           |                                  |
           |------->-------->--------->--------
                          migrate
      
      When veth3 migrates from ns1 to ns2 as in figure (2), veth3 registers in
      ns2 and call_netdevice_notifiers() is called with the NETDEV_REGISTER event:
      
      dev_change_net_namespace
          call_netdevice_notifiers
              ipvlan_device_event
                  ipvlan_migrate_l3s_hook
                      ipvlan_register_nf_hook(newnet)      (I)
                      ipvlan_unregister_nf_hook(oldnet)    (II)
      
      In ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0
      because veth1 with ipvlan1 is still in ns1, so (I) and (II) are called to
      register the nf_hook in ns2 and unregister the nf_hook in ns1. As a result,
      ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and the one in ns2
      is increased incorrectly. When the second net namespace is removed, a
      reference count leak warning in ipvlan_ns_exit() is triggered.

      This patch adds a check before ipvlan_migrate_l3s_hook() is called. The
      warning can be triggered as follows:
      
      $ ip netns add ns1
      $ ip netns add ns2
      $ ip netns exec ns1 ip link add veth1 type veth peer name veth2
      $ ip netns exec ns1 ip link add veth3 type veth peer name veth4
      $ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
      $ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
      $ ip netns exec ns1 ip link set veth3 netns ns2
      $ ip net del ns2
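
The refcount drift caused by steps (I) and (II) can be modelled in a few lines. This is a sketch only: the `demo_` names are hypothetical, and it assumes the added check looks at whether the migrating port is in L3S mode, which the message above does not spell out.

```c
#include <stdbool.h>

struct demo_netns {
    int l3s_hook_refcnt; /* models ipvl_nf_hook_refcnt */
};

/* Mirrors steps (I) and (II): take a reference in newnet, drop one in oldnet. */
static void demo_migrate_l3s_hook(struct demo_netns *oldnet,
                                  struct demo_netns *newnet)
{
    if (oldnet->l3s_hook_refcnt == 0)
        return;
    newnet->l3s_hook_refcnt++; /* (I)  */
    oldnet->l3s_hook_refcnt--; /* (II) */
}

/* Called when a lower device moves between namespaces. */
static void demo_port_moved(struct demo_netns *oldnet, struct demo_netns *newnet,
                            bool port_is_l3s, bool check_mode)
{
    /* Assumed form of the check: only L3S ports own a hook reference. */
    if (check_mode && !port_is_l3s)
        return;
    demo_migrate_l3s_hook(oldnet, newnet);
}
```

Without the check, moving the L2 port leaves ns1 at 0 and ns2 at 1, so ns2 exits with a non-zero count. With it, both counters are untouched.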
      
      Fixes: 3133822f ("ipvlan: use pernet operations and restrict l3s hooks to master netns")
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dccp: annotate data-races in dccp_poll() · 265ed382
      Eric Dumazet authored
      [ Upstream commit cba3f178 ]
      
      We changed tcp_poll() over time, but never updated dccp.
      
      Note that we also could remove dccp instead of maintaining it.
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sock: annotate data-races around prot->memory_pressure · b516a24f
      Eric Dumazet authored
      [ Upstream commit 76f33296 ]
      
      *prot->memory_pressure is read/written locklessly; we need
      to add proper annotations.

      A recent commit added a new race; it is time to audit all accesses.
      
      Fixes: 2d0c88e8 ("sock: Fix misuse of sk_under_memory_pressure()")
      Fixes: 4d93df0a ("[SCTP]: Rewrite of sctp buffer management code")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Abel Wu <wuyun.abel@bytedance.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates · cfee1799
      Vladimir Oltean authored
      [ Upstream commit d44036ca ]
      
      The blamed commit resolved a bug where frames would still get stuck at
      egress, even though they're smaller than the maxSDU[tc], because the
      driver did not take into account the extra 33 ns that the queue system
      needs for scheduling the frame.
      
      It now takes that into account, but the arithmetic that we perform in
      vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on
      64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS
      may become a very large integer if gate_len_ns < 33 ns.
      
      In practice, this means that we've introduced a regression where all
      traffic class gates which are permanently closed will not get detected
      by the driver, and we won't enable oversize frame dropping for them.
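
The wrap-around is easy to reproduce outside the driver. The sketch below uses hypothetical `demo_` names and the 33 ns constant from the text; the guarded variant shows one way to avoid the underflow and is not claimed to be the exact upstream change.

```c
#include <stdint.h>

#define DEMO_MIN_GATE_LEN_NS 33u

/* Unsigned subtraction wraps when gate_len_ns < 33, yielding a huge value. */
static uint64_t demo_remaining_gate_len_ps_buggy(uint64_t gate_len_ns)
{
    return (gate_len_ns - DEMO_MIN_GATE_LEN_NS) * 1000u; /* ns to ps */
}

/* Guarded variant: a gate shorter than the minimum has nothing remaining. */
static uint64_t demo_remaining_gate_len_ps(uint64_t gate_len_ns)
{
    if (gate_len_ns < DEMO_MIN_GATE_LEN_NS)
        return 0;
    return (gate_len_ns - DEMO_MIN_GATE_LEN_NS) * 1000u; /* ns to ps */
}
```

For a permanently closed gate (length 0), the unguarded version returns a value close to 2^64, so the gate looks long enough for any frame. The guarded version returns 0, which lets a caller recognise the gate as closed.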
      
      Before:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      After:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      Fixes: 11afdc65 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • devlink: add missing unregister linecard notification · b701b8d1
      Jiri Pirko authored
      [ Upstream commit 2ebbc975 ]
      
      The cited Fixes commit introduced linecard notifications for register,
      but it did not add them for unregister. Fix that by adding them.
      
      Fixes: c246f9b5 ("devlink: add support to create line card and expose to user")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • devlink: move code to a dedicated directory · 1375d206
      Jakub Kicinski authored
      [ Upstream commit f05bd8eb ]
      
      The devlink code is hard to navigate with 13kLoC in one file.
      I really like the way Michal split the ethtool into per-command
      files and core. It'd probably be too much to split it all up,
      but we can at least separate the core parts out of the per-cmd
      implementations and put it in a directory so that new commands
      can be separate files.
      
      Move the code, subsequent commit will do a partial split.
      
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Stable-dep-of: 2ebbc975 ("devlink: add missing unregister linecard notification")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • octeontx2-af: SDP: fix receive link config · eaeef5c8
      Hariprasad Kelam authored
      [ Upstream commit 05f3d5bc ]
      
      On SDP interfaces, frame oversize and undersize errors are
      observed because the driver does not consider the packet sizes of
      all subscribers of the link before updating the link config.

      This patch fixes that.
      
      Fixes: 9b7dd87a ("octeontx2-af: Support to modify min/max allowed packet lengths")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230817063006.10366-1-hkelam@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing: Fix memleak due to race between current_tracer and trace · 2cb0c037
      Zheng Yejian authored
      [ Upstream commit eecb91b9 ]
      
      Kmemleak reports a leak in graph_trace_open():
      
        unreferenced object 0xffff0040b95f4a00 (size 128):
          comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
          hex dump (first 32 bytes):
            e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
            f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
          backtrace:
            [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
            [<000000007df90faa>] graph_trace_open+0xb0/0x344
            [<00000000737524cd>] __tracing_open+0x450/0xb10
            [<0000000098043327>] tracing_open+0x1a0/0x2a0
            [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
            [<000000004015bcd6>] vfs_open+0x98/0xd0
            [<000000002b5f60c9>] do_open+0x520/0x8d0
            [<00000000376c7820>] path_openat+0x1c0/0x3e0
            [<00000000336a54b5>] do_filp_open+0x14c/0x324
            [<000000002802df13>] do_sys_openat2+0x2c4/0x530
            [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
            [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
            [<00000000313647bf>] do_el0_svc+0xac/0xec
            [<000000002ef1c651>] el0_svc+0x20/0x30
            [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
            [<000000000c309c35>] el0_sync+0x160/0x180
      
      The root cause is described as follows:

        __tracing_open() {  // 1. File 'trace' is being opened;
          ...
          *iter->trace = *tr->current_trace;  // 2. Tracer 'function_graph' is
                                              //    currently set;
          ...
          iter->trace->open(iter);  // 3. Call graph_trace_open() here,
                                    //    and memory is allocated in it;
          ...
        }

        s_start() {  // 4. The opened file is being read;
          ...
          *iter->trace = *tr->current_trace;  // 5. If tracer is switched to
                                              //    'nop' or others, then memory
                                              //    in step 3 is leaked!!!
          ...
        }

      To fix it, in s_start(), close the tracer before switching and reopen the
      new tracer after switching. Some tracers, such as 'wakeup', may not update
      'iter->private' in some cases when reopened, so it should be cleared
      to avoid being mistakenly closed again.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
      
      Fixes: d7350c3f ("tracing/core: make the read callbacks reentrants")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing: Fix cpu buffers unavailable due to 'record_disabled' missed · 7d0c2b0d
      Zheng Yejian authored
      [ Upstream commit b71645d6 ]
      
      The trace ring buffer can no longer record anything after executing
      the following commands at the shell prompt:
      
        # cd /sys/kernel/tracing
        # cat tracing_cpumask
        fff
        # echo 0 > tracing_cpumask
        # echo 1 > snapshot
        # echo fff > tracing_cpumask
        # echo 1 > tracing_on
        # echo "hello world" > trace_marker
        -bash: echo: write error: Bad file descriptor
      
      The root cause is as follows:
        1. After `echo 0 > tracing_cpumask`, 'record_disabled' of the cpu buffers
           in 'tr->array_buffer.buffer' becomes 1 (see tracing_set_cpumask());
        2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
           with 'tr->max_buffer.buffer', so 'record_disabled' becomes 0
           (see update_max_tr());
        3. After `echo fff > tracing_cpumask`, 'record_disabled' becomes -1.
      Both array_buffer and max_buffer are then unavailable because the value
      of 'record_disabled' is not 0.
      
      To fix it, enable or disable both array_buffer and max_buffer at the same
      time in tracing_set_cpumask().
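
The three steps and the fix can be replayed with a small counter model. The names below are hypothetical, a single counter stands in for each buffer's per-cpu 'record_disabled', and `touch_both` selects between the old behaviour (live buffer only) and the fixed one (both buffers).

```c
#include <stdbool.h>

struct demo_trace_array {
    int array_disabled; /* record_disabled of the live buffer */
    int max_disabled;   /* record_disabled of the snapshot buffer */
};

/* Model of a cpumask change: disabling adds 1, enabling subtracts 1. */
static void demo_set_cpumask(struct demo_trace_array *tr, bool enable,
                             bool touch_both)
{
    int delta = enable ? -1 : 1;

    tr->array_disabled += delta;
    if (touch_both)
        tr->max_disabled += delta;
}

/* Model of a snapshot: the two buffers trade places. */
static void demo_snapshot(struct demo_trace_array *tr)
{
    int tmp = tr->array_disabled;

    tr->array_disabled = tr->max_disabled;
    tr->max_disabled = tmp;
}

/* Recording is possible only when the live buffer's counter is zero. */
static bool demo_can_record(const struct demo_trace_array *tr)
{
    return tr->array_disabled == 0;
}
```

Running disable, snapshot, enable with `touch_both` false ends at -1 and 1, so neither buffer can record. With `touch_both` true, both counters return to 0.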
      
      Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <vnagarnaik@google.com>
      Cc: <shuah@kernel.org>
      Fixes: 71babb27 ("tracing: change CPU ring buffer state from tracing_cpumask")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915/gt: Support aux invalidation on all engines · 7e862cce
      Andi Shyti authored
      [ Upstream commit 6a35f22d ]
      
      Perform some refactoring with the purpose of keeping in one
      single place all the operations around the aux table
      invalidation.
      
      With this refactoring add more engines where the invalidation
      should be performed.
      
      Fixes: 972282c4 ("drm/i915/gen12: Add aux table invalidate for all engines")
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-8-andi.shyti@linux.intel.com
      (cherry picked from commit 76ff7789)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915/gt: Poll aux invalidation register bit on invalidation · 8e3f138b
      Jonathan Cavitt authored
      [ Upstream commit 0fde2f23 ]
      
      For platforms that use Aux CCS, wait for aux invalidation to
      complete by checking that the aux invalidation register bit is
      cleared.
      
      Fixes: 972282c4 ("drm/i915/gen12: Add aux table invalidate for all engines")
      Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-7-andi.shyti@linux.intel.com
      (cherry picked from commit d459c86f)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Jonathan Cavitt's avatar
      drm/i915/gt: Ensure memory quiesced before invalidation · 017d4404
      Jonathan Cavitt authored
      [ Upstream commit 78a6ccd6 ]
      
      All memory traffic must be quiesced before requesting
      an aux invalidation on platforms that use Aux CCS.
      
      Fixes: 972282c4 ("drm/i915/gen12: Add aux table invalidate for all engines")
      Requires: a2a4aa0eef3b ("drm/i915: Add the gen12_needs_ccs_aux_inv helper")
      Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-4-andi.shyti@linux.intel.com
      (cherry picked from commit ad8ebf12)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      017d4404
    • Andi Shyti's avatar
      drm/i915: Add the gen12_needs_ccs_aux_inv helper · c23126f2
      Andi Shyti authored
      [ Upstream commit b2f59e90 ]
      
      We always assumed that a device might either have AUX or FLAT
      CCS, but this is an approximation that is not always true, e.g.
      PVC represents an exception.
      
      Set the basis for future finer selection by implementing a
      boolean gen12_needs_ccs_aux_inv() function that tells whether aux
      invalidation is needed or not.
      
      Currently PVC is the only exception to the above mentioned rule.
      
      Requires: 059ae7ae2a1c ("drm/i915/gt: Cleanup aux invalidation registers")
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-3-andi.shyti@linux.intel.com
      (cherry picked from commit c827655b)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c23126f2
    • Harald Freudenberger's avatar
      s390/zcrypt: fix reply buffer calculations for CCA replies · d4f5dcf6
      Harald Freudenberger authored
      [ Upstream commit 4cfca532 ]
      
      The length information for available buffer space for CCA
      replies is covered by two fields in the T6 header prepended
      to each CCA reply: fromcardlen1 and fromcardlen2. The sum of
      these two values must not exceed the AP bus limit for this
      card (24KB for CEX8, 12KB for CEX7 and older) minus the always
      present headers.
      
      The current code adjusted the fromcardlen2 value when the
      AP bus limit was exceeded and a non-zero value was given
      from userspace. Some tests have now shown that this
      was the wrong assumption. Instead, the userspace value given for
      this field should always be trusted, and if the sum of the
      two fields exceeds the AP bus limit for this card, the first
      field, fromcardlen1, should be adjusted instead.
      
      So the calculation is now done with this new insight in mind.
      Some additional checks for overflow have also been introduced,
      along with comments that document this complicated calculation
      code for future maintainers.
      
      Furthermore, the 128 bytes of fixed overhead used
      in the current code is not correct. Investigations showed
      that a reply always has the same two header structs
      prepended before a possible payload. So this is also fixed
      with this patch.
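
The adjustment rule described above (trust the second length, shrink the first one if the sum does not fit) might look roughly like the following sketch. It is not the zcrypt code: the function name, the error convention and the header size passed in by the caller are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Clamp the first reply length so that len1 + len2 fits into the space
 * left after the fixed headers. The userspace-supplied len2 is trusted
 * and left untouched. Returns 0 on success, -1 if the request cannot fit.
 *
 * All names and the error convention are illustrative assumptions.
 */
static int fit_reply_lengths(size_t bus_limit, size_t hdr_size,
			     uint32_t *len1, uint32_t len2)
{
	size_t avail;

	if (hdr_size > bus_limit)
		return -1;	/* headers alone exceed the limit */
	avail = bus_limit - hdr_size;

	if (len2 > avail)
		return -1;	/* trusted second length leaves no room */

	/* Overflow-safe form of: if (*len1 + len2 > avail) */
	if (*len1 > avail - len2)
		*len1 = (uint32_t)(avail - len2);

	return 0;
}
```

Comparing `*len1` against `avail - len2` instead of adding the two lengths avoids the integer overflow that the commit message mentions guarding against.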
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d4f5dcf6
    • Yu Zhe's avatar
      s390/zcrypt: remove unnecessary (void *) conversions · 246d763b
      Yu Zhe authored
      [ Upstream commit 72c2112c ]
      
      Pointer variables of void * type do not require type cast.
      
      Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
      Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Link: https://lore.kernel.org/r/20230303052155.21072-1-yuzhe@nfschina.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Stable-dep-of: 4cfca532 ("s390/zcrypt: fix reply buffer calculations for CCA replies")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      246d763b
    • Eric Dumazet's avatar
      can: raw: fix lockdep issue in raw_release() · 40dafcab
      Eric Dumazet authored
      [ Upstream commit 11c9027c ]
      
      syzbot complained about a lockdep issue [1]
      
      Since raw_bind() and raw_setsockopt() first get RTNL
      before locking the socket, we must adopt the same order in raw_release()
      
      [1]
      WARNING: possible circular locking dependency detected
      6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0 Not tainted
      ------------------------------------------------------
      syz-executor.0/14110 is trying to acquire lock:
      ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1708 [inline]
      ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: raw_bind+0xb1/0xab0 net/can/raw.c:435
      
      but task is already holding lock:
      ffffffff8e3df368 (rtnl_mutex){+.+.}-{3:3}, at: raw_bind+0xa7/0xab0 net/can/raw.c:434
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
      __mutex_lock_common kernel/locking/mutex.c:603 [inline]
      __mutex_lock+0x181/0x1340 kernel/locking/mutex.c:747
      raw_release+0x1c6/0x9b0 net/can/raw.c:391
      __sock_release+0xcd/0x290 net/socket.c:654
      sock_close+0x1c/0x20 net/socket.c:1386
      __fput+0x3fd/0xac0 fs/file_table.c:384
      task_work_run+0x14d/0x240 kernel/task_work.c:179
      resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
      exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
      exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
      __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
      syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
      do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      -> #0 (sk_lock-AF_CAN){+.+.}-{0:0}:
      check_prev_add kernel/locking/lockdep.c:3142 [inline]
      check_prevs_add kernel/locking/lockdep.c:3261 [inline]
      validate_chain kernel/locking/lockdep.c:3876 [inline]
      __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
      lock_acquire kernel/locking/lockdep.c:5761 [inline]
      lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
      lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
      lock_sock include/net/sock.h:1708 [inline]
      raw_bind+0xb1/0xab0 net/can/raw.c:435
      __sys_bind+0x1ec/0x220 net/socket.c:1792
      __do_sys_bind net/socket.c:1803 [inline]
      __se_sys_bind net/socket.c:1801 [inline]
      __x64_sys_bind+0x72/0xb0 net/socket.c:1801
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      other info that might help us debug this:
      
      Possible unsafe locking scenario:
      
      CPU0                    CPU1
      ----                    ----
      lock(rtnl_mutex);
                              lock(sk_lock-AF_CAN);
                              lock(rtnl_mutex);
      lock(sk_lock-AF_CAN);
      
      *** DEADLOCK ***
      
      1 lock held by syz-executor.0/14110:
      
      stack backtrace:
      CPU: 0 PID: 14110 Comm: syz-executor.0 Not tainted 6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
      check_noncircular+0x311/0x3f0 kernel/locking/lockdep.c:2195
      check_prev_add kernel/locking/lockdep.c:3142 [inline]
      check_prevs_add kernel/locking/lockdep.c:3261 [inline]
      validate_chain kernel/locking/lockdep.c:3876 [inline]
      __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
      lock_acquire kernel/locking/lockdep.c:5761 [inline]
      lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
      lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
      lock_sock include/net/sock.h:1708 [inline]
      raw_bind+0xb1/0xab0 net/can/raw.c:435
      __sys_bind+0x1ec/0x220 net/socket.c:1792
      __do_sys_bind net/socket.c:1803 [inline]
      __se_sys_bind net/socket.c:1801 [inline]
      __x64_sys_bind+0x72/0xb0 net/socket.c:1801
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fd89007cb29
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd890d2a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
      RAX: ffffffffffffffda RBX: 00007fd89019bf80 RCX: 00007fd89007cb29
      RDX: 0000000000000010 RSI: 0000000020000040 RDI: 0000000000000003
      RBP: 00007fd8900c847a R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd89019bf80 R15: 00007ffebf8124f8
      </TASK>
      
      Fixes: ee8b94c8 ("can: raw: fix receiver memory leak")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: stable@vger.kernel.org
      Cc: Marc Kleine-Budde <mkl@pengutronix.de>
      Link: https://lore.kernel.org/all/20230720114438.172434-1-edumazet@google.com
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      40dafcab
    • Ziyang Xuan's avatar
      can: raw: fix receiver memory leak · 335987e2
      Ziyang Xuan authored
      [ Upstream commit ee8b94c8 ]
      
      Got kmemleak errors with the following ltp can_filter testcase:
      
      for ((i=1; i<=100; i++))
      do
              ./can_filter &
              sleep 0.1
      done
      
      ==============================================================
      [<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
      [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
      [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
      [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
      [<00000000fd468496>] do_syscall_64+0x33/0x40
      [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      It's a bug in the concurrent scenario of unregister_netdevice_many()
      and raw_release(), as follows:
      
                   cpu0                                        cpu1
      unregister_netdevice_many(can_dev)
        unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
        net_set_todo(can_dev)
      						raw_release(can_socket)
      						  dev = dev_get_by_index(, ro->ifindex); // dev == NULL
      						  if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
      						    raw_disable_allfilters(, dev, );
      						    dev_put(dev);
      						  }
      						  ...
      						  ro->bound = 0;
      						  ...
      
      call_netdevice_notifiers(NETDEV_UNREGISTER, )
        raw_notify(, NETDEV_UNREGISTER, )
          if (ro->bound) // invalid because ro->bound has been set 0
            raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
      
      Add a net_device pointer member in struct raw_sock to record bound
      can_dev, and use rtnl_lock to serialize raw_socket members between
      raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
      ro->dev to decide whether to free receivers in dev_rcv_lists.
      
      Fixes: 8d0caedb ("can: bcm/raw/isotp: use per module netdevice notifier")
      Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      335987e2
    • Zhang Yi's avatar
      jbd2: fix a race when checking checkpoint buffer busy · e5c768d8
      Zhang Yi authored
      [ Upstream commit 46f881b5 ]
      
      Before removing a checkpoint buffer from the t_checkpoint_list, we have to
      check both the BH_Dirty and BH_Lock bits together to distinguish buffers
      that have not been written back from those being written back. But
      __cp_buffer_busy() checks them separately: it first checks the lock state
      and then the dirty bit, and the window between these two checks can be
      raced by the writeback procedure, which locks the buffer and clears
      the dirty bit before I/O completes. So it cannot guarantee that
      checkpointed buffers have been written back to disk if some error happens
      later. Eventually, it may clean checkpoint transactions and lead to an
      inconsistent filesystem.
      
      jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
      same problem (journal_unmap_buffer() escapes this issue since it
      runs under the buffer lock), so fix them by introducing a new
      helper that tries to hold the buffer lock and removes a really clean buffer.
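
The idea of "try to take the buffer lock, then look at the dirty bit while holding it" can be modelled in a few lines. This is a single-threaded toy, not jbd2 code: struct buf, buf_trylock() and try_remove_clean() are hypothetical stand-ins for the kernel's buffer head, its lock bit and the new helper the message refers to.

```c
#include <stdbool.h>

/* Simplified stand-in for a buffer head; not the kernel's struct buffer_head. */
struct buf {
	bool locked;	/* models BH_Lock */
	bool dirty;	/* models BH_Dirty */
	bool on_list;	/* still on the checkpoint list */
};

/* Single-threaded model of trylock: succeeds only if the lock is free. */
static bool buf_trylock(struct buf *b)
{
	if (b->locked)
		return false;
	b->locked = true;
	return true;
}

static void buf_unlock(struct buf *b)
{
	b->locked = false;
}

/*
 * Remove the buffer from the checkpoint list only if it is really clean.
 * The dirty bit is examined while the lock is held, so writeback cannot
 * lock the buffer and clear the bit between the two checks.
 * Returns true if the buffer was removed.
 */
static bool try_remove_clean(struct buf *b)
{
	bool removed = false;

	if (!buf_trylock(b))
		return false;	/* being written back: treat as busy */
	if (!b->dirty) {
		b->on_list = false;
		removed = true;
	}
	buf_unlock(b);
	return removed;
}
```

A buffer that is locked by someone else is simply reported as busy, which matches the description that a buffer under writeback must not be dropped from the list before I/O completes.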
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
      Cc: stable@vger.kernel.org
      Suggested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e5c768d8
    • Zhang Yi's avatar
      jbd2: remove journal_clean_one_cp_list() · 5fda50e2
      Zhang Yi authored
      [ Upstream commit b98dba27 ]
      
      journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
      the same, so merge them into journal_shrink_one_cp_list(), remove the
      nr_to_scan parameter, always scan and try to free the whole checkpoint
      list.
      
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230606135928.434610-4-yi.zhang@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: 46f881b5 ("jbd2: fix a race when checking checkpoint buffer busy")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5fda50e2
    • Zhang Yi's avatar
      jbd2: remove t_checkpoint_io_list · 8168c96c
      Zhang Yi authored
      [ Upstream commit be222553 ]
      
      Since t_checkpoint_io_list is no longer used in jbd2_log_do_checkpoint(),
      it's time to remove the whole t_checkpoint_io_list logic.
      
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: 46f881b5 ("jbd2: fix a race when checking checkpoint buffer busy")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8168c96c
    • Jiaxun Yang's avatar
      MIPS: cpu-features: Use boot_cpu_type for CPU type based features · 1fa68a78
      Jiaxun Yang authored
      [ Upstream commit 5487a7b6 ]
      
      Some CPU feature macros were using current_cpu_type to mark feature
      availability.
      
      However, current_cpu_type uses smp_processor_id, which is prohibited
      in preemptible context.
      
      Since those features are uniform across all CPUs in an SMP system, use
      boot_cpu_type instead of current_cpu_type to fix preemptible kernels.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1fa68a78
    • Jiaxun Yang's avatar
      MIPS: cpu-features: Enable octeon_cache by cpu_type · 92c568c8
      Jiaxun Yang authored
      [ Upstream commit f6415194 ]
      
      cpu_has_octeon_cache was tied to 0 for generic cpu-features;
      with this, a generic kernel built for an Octeon CPU won't boot.
      
      Just enable this flag by cpu_type. It won't hurt other platforms
      because the compiler will eliminate the code path on other processors.
      
      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Stable-dep-of: 5487a7b6 ("MIPS: cpu-features: Use boot_cpu_type for CPU type based features")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      92c568c8
    • Igor Mammedov's avatar
      PCI: acpiphp: Reassign resources on bridge if necessary · 3e4d038d
      Igor Mammedov authored
      [ Upstream commit 40613da5 ]
      
      When using ACPI PCI hotplug, hotplugging a device with large BARs may fail
      if bridge windows programmed by firmware are not large enough.
      
      Reproducer:
        $ qemu-kvm -monitor stdio -M q35  -m 4G \
            -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \
            -device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \
            disk_image
      
       wait till linux guest boots, then hotplug device:
         (qemu) device_add qxl,bus=rp1
      
       hotplug on guest side fails with:
         pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000
         pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff]
         pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff]
         pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff]
         pci 0000:01:00.0: reg 0x1c: [io  0x0000-0x001f]
         pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000]
         pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000]
         pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000]
         pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000]
         pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff]
         pci 0000:01:00.0: BAR 3: assigned [io  0x1000-0x101f]
         qxl 0000:01:00.0: enabling device (0000 -> 0003)
         Unable to create vram_mapping
         qxl: probe of 0000:01:00.0 failed with error -12
      
      However when using native PCIe hotplug
        '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'
      it works fine, since kernel attempts to reassign unused resources.
      
      Use the same machinery as native PCIe hotplug to (re)assign resources.
      
      Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3e4d038d
    • Daniel Vetter's avatar
      video/aperture: Move vga handling to pci function · 28916927
      Daniel Vetter authored
      [ Upstream commit f1d599d3 ]
      
      A few reasons for this:
      
      - It's really the only one where this matters. I tried looking around,
        and I didn't find any non-pci vga-compatible controllers for x86
        (since that's the only platform where we had this until a few
        patches ago), where a driver participating in the aperture claim
        dance would interfere.
      
      - I also don't expect that any future bus anytime soon will
        not just look like pci towards the OS, that's been the case for like
        25+ years by now for practically everything (even non-x86).
      
      - Also it's a bit funny if we have one part of the vga removal in the
        pci function, and the other in the generic one.
      
      v2: Rebase.
      
      v4:
      - fix Daniel's S-o-b address
      
      v5:
      - add back an S-o-b tag with Daniel's Intel address
      
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-fbdev@vger.kernel.org
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-6-tzimmermann@suse.de
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      28916927
    • Daniel Vetter's avatar
      video/aperture: Only kick vgacon when the pdev is decoding vga · 4aad3b82
      Daniel Vetter authored
      [ Upstream commit 7450cd23 ]
      
      Otherwise it's a bit silly, and we might throw out the driver for the
      screen the user is actually looking at. I haven't found a bug report
      for this case yet, but we did get bug reports for the analog case
      where we're throwing out the efifb driver.
      
      v2: Flip the check around to make it clear it's a special case for
      kicking out the vgacon driver only (Thomas)
      
      v4:
      - fixes to commit message
      - fix Daniel's S-o-b address
      
      v5:
      - add back an S-o-b tag with Daniel's Intel address
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-fbdev@vger.kernel.org
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-5-tzimmermann@suse.de
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4aad3b82
    • Daniel Vetter's avatar
      drm/aperture: Remove primary argument · 437e99f2
      Daniel Vetter authored
      [ Upstream commit 62aeaeaa ]
      
      Only really pci devices have a business setting this - it's for
      figuring out whether the legacy vga stuff should be nuked too. And
      with the preceding two patches those are all using the pci version of
      this.
      
      Which means for all other callers primary == false and we can remove
      it now.
      
      v2:
      - Reorder to avoid compile fail (Thomas)
      - Include gma500, which retained its call to the non-pci version.
      
      v4:
      - fix Daniel's S-o-b address
      
      v5:
      - add back an S-o-b tag with Daniel's Intel address
      
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Deepak Rawat <drawat.floss@gmail.com>
      Cc: Neil Armstrong <neil.armstrong@linaro.org>
      Cc: Kevin Hilman <khilman@baylibre.com>
      Cc: Jerome Brunet <jbrunet@baylibre.com>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Jonathan Hunter <jonathanh@nvidia.com>
      Cc: Emma Anholt <emma@anholt.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: David Airlie <airlied@gmail.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: linux-hyperv@vger.kernel.org
      Cc: linux-amlogic@lists.infradead.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-tegra@vger.kernel.org
      Cc: linux-fbdev@vger.kernel.org
      Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Acked-by: Thierry Reding <treding@nvidia.com>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-4-tzimmermann@suse.de
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      437e99f2
    • Daniel Vetter's avatar
      drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers · cccfcbb9
      Daniel Vetter authored
      [ Upstream commit 80e99398 ]
      
      This one nukes all framebuffers, which is a bit much. In reality
      gma500 is igpu and never shipped with anything discrete, so there should
      not be any difference.
      
      v2: Unfortunately the framebuffer sits outside of the pci bars for
      gma500, and so only using the pci helpers won't be enough. Otoh if we
      only use non-pci helper, then we don't get the vga handling, and
      subsequent refactoring to untangle these special cases won't work.
      
      It's not pretty, but the simplest fix (since gma500 really is the only
      quirky pci driver like this we have) is to just have both calls.
      
      v4:
      - fix Daniel's S-o-b address
      
      v5:
      - add back an S-o-b tag with Daniel's Intel address
      
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-2-tzimmermann@suse.de
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cccfcbb9
    • Daniel Vetter's avatar
      fbdev/radeon: use pci aperture helpers · 6db53af1
      Daniel Vetter authored
      [ Upstream commit 9b539c4d ]
      
      It's not exactly the same since the open coded version doesn't set
      primary correctly. But that's a bugfix, so shouldn't hurt really.
      
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linux-fbdev@vger.kernel.org
      Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230111154112.90575-7-daniel.vetter@ffwll.ch
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6db53af1
    • Daniel Vetter's avatar
      drm/ast: Use drm_aperture_remove_conflicting_pci_framebuffers · cd1f889c
      Daniel Vetter authored
      [ Upstream commit c1ebead3 ]
      
      It's just open coded and matches.
      
      Note that Thomas said that his version apparently failed for some
      reason, but hey maybe we should try again.
      
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Javier Martinez Canillas <javierm@redhat.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-fbdev@vger.kernel.org
      Tested-by: Thomas Zimmmermann <tzimmermann@suse.de>
      Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230111154112.90575-1-daniel.vetter@ffwll.ch
      Stable-dep-of: 5ae3716c ("video/aperture: Only remove sysfb on the default vga pci device")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cd1f889c
    • Chuck Lever's avatar
      xprtrdma: Remap Receive buffers after a reconnect · 26ea8668
      Chuck Lever authored
      [ Upstream commit 895cedc1 ]
      
      On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
      unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
      to remap them after a new connection had been established. The
      result was immediate failure of the new connection with the Receives
      flushing with LOCAL_PROT_ERR.
      
      Fixes: 671c450b ("xprtrdma: Fix oops in Receive handler after device removal")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      26ea8668
    • Fedor Pchelkin's avatar
      NFSv4: fix out path in __nfs4_get_acl_uncached · d9aac9cd
      Fedor Pchelkin authored
      [ Upstream commit f4e89f1a ]
      
      Another highly rare error case when a page allocating loop (inside
      __nfs4_get_acl_uncached, this time) is not properly unwound on error.
      Since the pages array is allocated uninitialized, only the lower
      array indices need to be freed. NULL checks were useful before commit 62a1573f
      ("NFSv4 fix acl retrieval over krb5i/krb5p mounts"), when the array had
      been initialized to zero on the stack.
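
The unwinding rule described above (free only the indices that were actually filled) can be sketched in userspace C as follows. This is not the NFS code: alloc_pages_array(), the alloc_fn hook and the fail_on_third() demo allocator are hypothetical names introduced only to make the failure path easy to exercise.

```c
#include <stdlib.h>

/*
 * Allocate npages buffers into an uninitialized array. On failure, free
 * only the entries that were actually allocated (indices below i); the
 * remaining slots hold garbage and must not be touched.
 * Returns 0 on success, -1 on allocation failure.
 *
 * alloc_fn is a hypothetical hook so a failure can be simulated.
 */
static int alloc_pages_array(void **pages, size_t npages,
			     void *(*alloc_fn)(size_t), size_t page_size)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		pages[i] = alloc_fn(page_size);
		if (!pages[i])
			goto out_free;
	}
	return 0;

out_free:
	while (i > 0)
		free(pages[--i]);
	return -1;
}

/* Demo allocator that fails on its third call. */
static unsigned int demo_calls;

static void *fail_on_third(size_t size)
{
	if (++demo_calls == 3)
		return NULL;
	return malloc(size);
}
```

Walking the index back down from the failing slot means no NULL check is needed per entry, and slots above the failure point are never read.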
      
      Found by Linux Verification Center (linuxtesting.org).
      
      Fixes: 62a1573f ("NFSv4 fix acl retrieval over krb5i/krb5p mounts")
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d9aac9cd