  1. Jan 23, 2023
    • Merge branch 'ethtool-mac-merge' · 0ad999c1
      David S. Miller authored
      
      
      Vladimir Oltean says:
      
      ====================
      ethtool support for IEEE 802.3 MAC Merge layer
      
      Change log
      ----------
      
      v3->v4:
      - add missing opening bracket in ocelot_port_mm_irq()
      - moved cfg.verify_time range checking so that it actually takes place
        for the updated rather than old value
      v3 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464-1-vladimir.oltean@nxp.com/
      
      v2->v3:
      - made get_mm return int instead of void
      - deleted ETHTOOL_A_MM_SUPPORTED
      - renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE
      - introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE
      - cleaned up documentation
      - rebased on top of PLCA changes
      - renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_*
      v2 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242-1-vladimir.oltean@nxp.com/
      
      v1->v2:
      I've decided to focus just on the MAC Merge layer for now, which is why
      I am able to submit this patch set as non-RFC.
      v1 (RFC) at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936-1-vladimir.oltean@nxp.com/
      
      What is being introduced
      ------------------------
      
      TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99
      (interspersing of express traffic). This is controlled through ethtool
      netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool
      commands are posted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687-1-vladimir.oltean@nxp.com/
      
      The MAC Merge layer has its own statistics counters
      (ethtool --include-statistics --show-mm swp0) as well as two member
      MACs, the statistics of which can be queried individually, through a new
      ethtool netlink attribute, corresponding to:
      
      $ ethtool -I --show-pause eno2 --src aggregate
      $ ethtool -S eno2 --groups eth-mac eth-phy eth-ctrl rmon -- --src pmac
      
      The core properties of the MAC Merge layer are described in great detail
      in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format.
      
      Devices for which the API is supported
      --------------------------------------
      
      I decided to start with the Ethernet switch on NXP LS1028A (Felix)
      because of the smaller patch set. I also have support for the ENETC
      controller pending.
      
      I would like to get confirmation that the UAPI being proposed here will
      not restrict any use cases known by other hardware vendors.
      
      Why is support for preemptible traffic classes not here?
      --------------------------------------------------------
      
      There is legitimate concern whether the 802.1Q portion of the standard
      (which traffic classes go to the eMAC and which to the pMAC) should be
      modeled in Linux using tc or using another UAPI. I think that is
      stalling the entire series, but should be discussed separately instead.
      Removing FP adminStatus support makes me confident enough to submit this
      patch set without an RFC tag (meaning: I wouldn't mind if it was merged
      as is).
      
      What is submitted here is sufficient for an LLDP daemon to do its job.
      I've patched openlldp to advertise and configure frame preemption:
      https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3
      
      In case someone wants to try it out, here are some commands I've used.
      
       # Configure the interfaces to receive and transmit LLDP Data Units
       lldptool -L -i eno0 adminStatus=rxtx
       lldptool -L -i swp0 adminStatus=rxtx
       # Enable the transmission of certain TLVs on switch's interface
       lldptool -T -i eno0 -V addEthCap enableTx=yes
       lldptool -T -i swp0 -V addEthCap enableTx=yes
       # Query LLDP statistics on switch's interface
       lldptool -S -i swp0
       # Query the received neighbor TLVs
       lldptool -i swp0 -t -n -V addEthCap
       Additional Ethernet Capabilities TLV
               Preemption capability supported
               Preemption capability enabled
               Preemption capability active
               Additional fragment size: 60 octets
      
      So using this patch set, lldpad will be able to advertise and configure
      frame preemption, but still, no data packet will be sent as preemptible
      over the link, because there is no UAPI to control which traffic classes
      are sent as preemptible and which as express.
      
      Preemptable or preemptible?
      ---------------------------
      
      IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible"
      throughout. Because the definition of "preemptible" falls under 802.1Q's
      jurisdiction and 802.3 just references it, I went with the 802.1Q naming
      even where supporting an 802.3 feature. Also, checkpatch agrees with this.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ad999c1
    • net: ethtool: netlink: introduce ethnl_update_bool() · 7c494a77
      Vladimir Oltean authored
      
      
      Due to the fact that the kernel-side data structures have been carried
      over from the ioctl-based ethtool, we are now in the situation where we
      have an ethnl_update_bool32() function, but the plain function that
      operates on a boolean value kept in an actual u8 netlink attribute
      doesn't exist.
      
      With new ethtool features that are exposed solely over netlink, the
      kernel data structures will use the "bool" type, so we will need this
      kind of helper. Introduce it now; it's needed for things like
      verify-disabled for the MAC merge configuration.
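
      The update semantics described above can be sketched in userspace as
      follows. This is only an illustration under stated assumptions:
      struct mock_attr and update_bool() are hypothetical stand-ins for a u8
      netlink attribute and for the new helper, not the kernel code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink attribute carrying a u8 payload. */
struct mock_attr {
	uint8_t value;
};

/*
 * Update *dst from attr, if present, and set *mod when the value changed.
 * A missing attribute (NULL) means "leave the setting alone".
 * *mod is only ever set, never cleared, so it can accumulate over
 * several update calls for different settings.
 */
void update_bool(bool *dst, const struct mock_attr *attr, bool *mod)
{
	bool val;

	if (!attr)
		return;

	val = attr->value != 0;	/* any non-zero u8 counts as true */
	if (*dst == val)
		return;

	*dst = val;
	*mod = true;
}
```

      A caller would typically run one such update per attribute and only
      push the configuration to the driver when mod ended up true.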
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c494a77
    • net: fec: Use page_pool_put_full_page when freeing rx buffers · e38553bd
      Wei Fang authored
      page_pool_release_page() was used when freeing rx buffers, but this
      function only unmaps the page (if mapped) and does not recycle it.
      As a result, after a few hundred down/up cycles of eth0, the system
      runs out of memory. For more details, see the reproduction steps and
      bug logs below. Per the page pool documentation,
      page_pool_put_full_page() should be used instead of
      page_pool_release_page(), because it tries to recycle the page if the
      page refcnt equals 1. With this change, the issue no longer reproduces
      after 20000 iterations (previously it occurred after about 391
      iterations on i.MX8MN-EVK).
      
      Reproduction steps:
      Create the following test script and run it:
      LOOPS=20000
      i=1
      while [ $i -le $LOOPS ]
      do
          echo "TINFO:ENET $curface up and down test $i times"
          org_macaddr=$(cat /sys/class/net/eth0/address)
          ifconfig eth0 down
          ifconfig eth0  hw ether $org_macaddr up
          i=$(expr $i + 1)
      done
      sleep 5
      if cat /sys/class/net/eth0/operstate | grep 'up';then
          echo "TEST PASS"
      else
          echo "TEST FAIL"
      fi
      
      Bug detail logs:
      TINFO:ENET  up and down test 391 times
      [  850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [  853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [  853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [  870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
      [  931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
      TINFO:ENET  up and down test 392 times
      [  991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
      [ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
      [ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
      [ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
      [ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
      [ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
      [ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
      TINFO:ENET  up and down test 393 times
      [ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
      [ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
      [ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
      [ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
      [ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
      [ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
      [ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
      [ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
      [ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
      [ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
      [ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
      [ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
      [ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
      TINFO:ENET  up and down test 394 times
      [ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
      [ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
      [ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
      [ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
      [ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G         C         6.1.1+g56321e101aca #1
      [ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
      [ 1537.115561] Call trace:
      [ 1537.118005]  dump_backtrace.part.0+0xe0/0xf0
      [ 1537.122289]  show_stack+0x18/0x40
      [ 1537.125608]  dump_stack_lvl+0x64/0x80
      [ 1537.129276]  dump_stack+0x18/0x34
      [ 1537.132592]  dump_header+0x44/0x208
      [ 1537.136083]  oom_kill_process+0x2b4/0x2c0
      [ 1537.140097]  out_of_memory+0xe4/0x594
      [ 1537.143766]  __alloc_pages+0xb68/0xd00
      [ 1537.147521]  alloc_pages+0xac/0x160
      [ 1537.151013]  __get_free_pages+0x14/0x40
      [ 1537.154851]  pgd_alloc+0x1c/0x30
      [ 1537.158082]  mm_init+0xf8/0x1d0
      [ 1537.161228]  mm_alloc+0x48/0x60
      [ 1537.164368]  alloc_bprm+0x7c/0x240
      [ 1537.167777]  do_execveat_common.isra.0+0x70/0x240
      [ 1537.172486]  __arm64_sys_execve+0x40/0x54
      [ 1537.176502]  invoke_syscall+0x48/0x114
      [ 1537.180255]  el0_svc_common.constprop.0+0xcc/0xec
      [ 1537.184964]  do_el0_svc+0x2c/0xd0
      [ 1537.188280]  el0_svc+0x2c/0x84
      [ 1537.191340]  el0t_64_sync_handler+0xf4/0x120
      [ 1537.195613]  el0t_64_sync+0x18c/0x190
      [ 1537.199334] Mem-Info:
      [ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
      [ 1537.201620]  active_file:54 inactive_file:112 isolated_file:0
      [ 1537.201620]  unevictable:0 dirty:0 writeback:0
      [ 1537.201620]  slab_reclaimable:2620 slab_unreclaimable:7076
      [ 1537.201620]  mapped:1489 shmem:2473 pagetables:466
      [ 1537.201620]  sec_pagetables:0 bounce:0
      [ 1537.201620]  kernel_misc_reclaimable:0
      [ 1537.201620]  free:136672 free_pcp:96 free_cma:129241
      [ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
      [ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
      [ 1537.300219] lowmem_reserve[]: 0 0 0 0
      [ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
      [ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
      [ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
      [ 1537.358087] 2939 total pagecache pages
      [ 1537.361876] 0 pages in swap cache
      [ 1537.365229] Free swap  = 0kB
      [ 1537.368147] Total swap = 0kB
      [ 1537.371065] 516096 pages RAM
      [ 1537.373959] 0 pages HighMem/MovableOnly
      [ 1537.377834] 17302 pages reserved
      [ 1537.381103] 163840 pages cma reserved
      [ 1537.384809] 0 pages hwpoisoned
      [ 1537.387902] Tasks state (memory values in pages):
      [ 1537.392652] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
      [ 1537.401356] [    201]   993   201     1130       72    45056        0             0 rpcbind
      [ 1537.409772] [    202]     0   202     4529     1640    77824        0          -250 systemd-journal
      [ 1537.418861] [    222]     0   222     4691      801    69632        0         -1000 systemd-udevd
      [ 1537.427787] [    248]   994   248    20914      130    65536        0             0 systemd-timesyn
      [ 1537.436884] [    497]     0   497      620       31    49152        0             0 atd
      [ 1537.444938] [    500]     0   500      854       77    53248        0             0 crond
      [ 1537.453165] [    503]   997   503     1470      160    49152        0          -900 dbus-daemon
      [ 1537.461908] [    505]     0   505      633       24    40960        0             0 firmwared
      [ 1537.470491] [    513]     0   513     2507      180    61440        0             0 ofonod
      [ 1537.478800] [    514]   990   514    69640      137    81920        0             0 parsec
      [ 1537.487120] [    533]     0   533      599       39    40960        0             0 syslogd
      [ 1537.495518] [    534]     0   534     4546      148    65536        0             0 systemd-logind
      [ 1537.504560] [    535]     0   535      690       24    45056        0             0 tee-supplicant
      [ 1537.513564] [    540]   996   540     2769      168    61440        0             0 systemd-network
      [ 1537.522680] [    566]     0   566     3878      228    77824        0             0 connmand
      [ 1537.531168] [    645]   998   645     1538      133    57344        0             0 avahi-daemon
      [ 1537.540004] [    646]   998   646     1461       64    57344        0             0 avahi-daemon
      [ 1537.548846] [    648]   992   648      781       41    45056        0             0 rpc.statd
      [ 1537.557415] [    650] 64371   650      590       23    45056        0             0 ninfod
      [ 1537.565754] [    653] 61563   653      555       24    45056        0             0 rdisc
      [ 1537.573971] [    655]     0   655   374569     2999   290816        0          -999 containerd
      [ 1537.582621] [    658]     0   658     1311       20    49152        0             0 agetty
      [ 1537.590922] [    663]     0   663     1529       97    49152        0             0 login
      [ 1537.599138] [    666]     0   666     3430      202    69632        0             0 wpa_supplicant
      [ 1537.608147] [    667]     0   667     2344       96    61440        0             0 systemd-userdbd
      [ 1537.617240] [    677]     0   677     2964      314    65536        0           100 systemd
      [ 1537.625651] [    679]     0   679     3720      646    73728        0           100 (sd-pam)
      [ 1537.634138] [    687]     0   687     1289      403    45056        0             0 sh
      [ 1537.642108] [    789]     0   789      970       93    45056        0             0 eth_test2.sh
      [ 1537.650955] [   2355]     0  2355     2346       94    61440        0             0 systemd-userwor
      [ 1537.660046] [   2356]     0  2356     2346       94    61440        0             0 systemd-userwor
      [ 1537.669137] [   2358]     0  2358     2346       95    57344        0             0 systemd-userwor
      [ 1537.678258] [   2379]     0  2379      970       93    45056        0             0 eth_test2.sh
      [ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service,tas0
      [ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
      [ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec
      
      Fixes: 95698ff6 ("net: fec: using page pool to manage RX buffers")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: shenwei wang <Shenwei.wang@nxp.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e38553bd
  2. Jan 21, 2023
    • net: fix UaF in netns ops registration error path · 71ab9c3e
      Paolo Abeni authored
      If net_assign_generic() fails, the current error path in ops_init() tries
      to clear the gen pointer slot. However, in that error path the gen pointer
      itself has not been modified yet, and the existing array being accessed is
      smaller than the accessed index, causing an out-of-bounds error:
      
       BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320
       Write of size 8 at addr ffff888109124978 by task modprobe/1018
      
       CPU: 2 PID: 1018 Comm: modprobe Not tainted 6.2.0-rc2.mptcp_ae5ac65fbed5+ #1641
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
       Call Trace:
        <TASK>
        dump_stack_lvl+0x6a/0x9f
        print_address_description.constprop.0+0x86/0x2b5
        print_report+0x11b/0x1fb
        kasan_report+0x87/0xc0
        ops_init+0x2de/0x320
        register_pernet_operations+0x2e4/0x750
        register_pernet_subsys+0x24/0x40
        tcf_register_action+0x9f/0x560
        do_one_initcall+0xf9/0x570
        do_init_module+0x190/0x650
        load_module+0x1fa5/0x23c0
        __do_sys_finit_module+0x10d/0x1b0
        do_syscall_64+0x58/0x80
        entry_SYSCALL_64_after_hwframe+0x72/0xdc
       RIP: 0033:0x7f42518f778d
       Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
             89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
             ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48
       RSP: 002b:00007fff96869688 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
       RAX: ffffffffffffffda RBX: 00005568ef7f7c90 RCX: 00007f42518f778d
       RDX: 0000000000000000 RSI: 00005568ef41d796 RDI: 0000000000000003
       RBP: 00005568ef41d796 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
       R13: 00005568ef7f7d30 R14: 0000000000040000 R15: 0000000000000000
        </TASK>
      
      This change addresses the issue by skipping the gen pointer
      de-reference in the mentioned error-path.
      
      Found by code inspection and verified with explicit error injection
      on a kasan-enabled kernel.
      
      Fixes: d266935a ("net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/cec4e0f3bb2c77ac03a6154a8508d3930beb5f0f.1674154348.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      71ab9c3e
    • net: ethernet: renesas: rswitch: Fix ethernet-ports handling · fd941bd6
      Yoshihiro Shimoda authored
      If one of the ports in ethernet-ports was disabled, the driver
      failed to probe all ports. Fix it.
      
      Fixes: 3590918b ("net: ethernet: renesas: Add support for "Ethernet Switch"")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230120001959.1059850-1-yoshihiro.shimoda.uh@renesas.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd941bd6
    • net: mana: Fix IRQ name - add PCI and queue number · 20e3028c
      Haiyang Zhang authored
      The PCI and queue number info is missing in IRQ names.
      
      Add PCI and queue number to IRQ names, to allow CPU affinity
      tuning scripts to work.
      
      Cc: stable@vger.kernel.org
      Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Link: https://lore.kernel.org/r/1674161950-19708-1-git-send-email-haiyangz@microsoft.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      20e3028c
    • netlink: prevent potential spectre v1 gadgets · f0950402
      Eric Dumazet authored
      Most netlink attributes are parsed and validated from
      __nla_validate_parse() or validate_nla():
      
          u16 type = nla_type(nla);
      
          if (type == 0 || type > maxtype) {
              /* error or continue */
          }
      
      @type is then used as an array index and can be used
      as a Spectre v1 gadget.
      
      array_index_nospec() can be used to prevent leaking
      content of kernel memory to malicious users.
      
      This should take care of the vast majority of netlink uses,
      but an audit is needed to take care of others where
      validation is not yet centralized in core netlink functions.
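
      To illustrate the index clamping idea, here is a userspace sketch
      modeled on the kernel's generic array_index_mask_nospec(). The names
      index_mask_nospec() and lookup_policy() are hypothetical, and the
      policy table is a plain int array for the sake of the example. It
      assumes that right-shifting a negative signed long is an arithmetic
      shift (true for GCC and Clang). Note that an optimizing compiler may
      fold the mask away in ordinary userspace code; the kernel takes extra
      steps to prevent that, which this sketch does not reproduce.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * Assumes index and size are both at most LONG_MAX.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	long v = (long)(index | (size - 1UL - index));

	return (unsigned long)(~v >> (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical per-type table lookup, mirroring the check quoted above. */
int lookup_policy(const int *policy, unsigned int maxtype, unsigned int type)
{
	if (type == 0 || type > maxtype)
		return -1;	/* error or continue */

	/*
	 * Architecturally a no-op here, since type is already in range.
	 * Under misspeculation of the branch above, an out-of-range type
	 * is forced to index 0 instead of reading past the table.
	 */
	type &= (unsigned int)index_mask_nospec(type, (unsigned long)maxtype + 1);

	return policy[type];
}
```

      The table passed to lookup_policy() needs maxtype + 1 entries, since
      valid types run from 1 to maxtype and slot 0 is the clamp target.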
      
      Fixes: bfa83a9e ("[NETLINK]: Type-safe netlink messages/attributes interface")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f0950402
    • Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5deaa985
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, bluetooth, bpf and netfilter.
      
        Current release - regressions:
      
         - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
           addrconf", fix nsna_ping mode of team
      
         - wifi: mt76: fix bugs in Rx queue handling and DMA mapping
      
         - eth: mlx5:
            - add missing mutex_unlock in error reporter
            - protect global IPsec ASO with a lock
      
        Current release - new code bugs:
      
         - rxrpc: fix wrong error return in rxrpc_connect_call()
      
        Previous releases - regressions:
      
         - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
      
         - wifi:
            - mac80211: fix crashes on Rx due to incorrect initialization of
              rx->link and rx->link_sta
            - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
              aggregation handling, crashes
            - brcmfmac: fix regression for Broadcom PCIe wifi devices
            - rndis_wlan: prevent buffer overflow in rndis_query_oid
      
         - netfilter: conntrack: handle tcp challenge acks during connection
           reuse
      
         - sched: avoid grafting on htb_destroy_class_offload when destroying
      
         - virtio-net: correctly enable callback during start_xmit, fix stalls
      
         - tcp: avoid the lookup process failing to get sk in ehash table
      
         - ipa: disable ipa interrupt during suspend
      
         - eth: stmmac: enable all safety features by default
      
        Previous releases - always broken:
      
         - bpf:
            - fix pointer-leak due to insufficient speculative store bypass
              mitigation (Spectre v4)
            - skip task with pid=1 in send_signal_common() to avoid a splat
            - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
              PERF_BPF_EVENT_PROG_UNLOAD events
            - fix potential deadlock in htab_lock_bucket from same bucket
              index but different map_locked index
      
         - bluetooth:
            - fix a buffer overflow in mgmt_mesh_add()
            - hci_qca: fix driver shutdown on closed serdev
            - ISO: fix possible circular locking dependency
            - CIS: hci_event: fix invalid wait context
      
         - wifi: brcmfmac: fixes for survey dump handling
      
         - mptcp: explicitly specify sock family at subflow creation time
      
         - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
           header bits
      
         - tcp: fix rate_app_limited to default to 1
      
         - l2tp: close all race conditions in l2tp_tunnel_register()
      
         - eth: mlx5: fixes for QoS config and eswitch configuration
      
         - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
      
         - eth: stmmac: fix invalid call to mdiobus_get_phy()
      
        Misc:
      
         - ethtool: add netlink attr in rss get reply only if the value is not
           empty"
      
      * tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
        Revert "Merge branch 'octeontx2-af-CPT'"
        tcp: fix rate_app_limited to default to 1
        bnxt: Do not read past the end of test names
        net: stmmac: enable all safety features by default
        octeontx2-af: add mbox to return CPT_AF_FLT_INT info
        octeontx2-af: update cpt lf alloc mailbox
        octeontx2-af: restore rxc conf after teardown sequence
        octeontx2-af: optimize cpt pf identification
        octeontx2-af: modify FLR sequence for CPT
        octeontx2-af: add mbox for CPT LF reset
        octeontx2-af: recover CPT engine when it gets fault
        net: dsa: microchip: ksz9477: port map correction in ALU table entry register
        selftests/net: toeplitz: fix race on tpacket_v3 block close
        net/ulp: use consistent error code when blocking ULP
        octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
        tcp: avoid the lookup process failing to get sk in ehash table
        Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
        MAINTAINERS: add networking entries for Willem
        net: sched: gred: prevent races when adding offloads to stats
        l2tp: prevent lockdep issue in l2tp_tunnel_register()
        ...
      5deaa985
    • Revert "Merge branch 'octeontx2-af-CPT'" · 45a919bb
      Jakub Kicinski authored
      This reverts commit b4fbf0b2, reversing
      changes made to 6c977c5c.
      
      This seems like net-next material.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      45a919bb
  3. Jan 20, 2023
    • tcp: fix rate_app_limited to default to 1 · 300b655d
      David Morley authored
      The initial default value of 0 for tp->rate_app_limited was incorrect,
      since a flow is indeed application-limited until it first sends
      data. Fixing the default to be 1 is generally correct but also
      specifically will help user-space applications avoid using the initial
      tcpi_delivery_rate value of 0 that persists until the connection has
      some non-zero bandwidth sample.
      
      Fixes: eb8329e0 ("tcp: export data delivery rate")
      Suggested-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David Morley <morleyd@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: David Morley <morleyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      300b655d
    • bnxt: Do not read past the end of test names · d3e599c0
      Kees Cook authored
      
      
      Test names were being concatenated based on an offset beyond the end of
      the first name, which tripped the buffer overflow detection logic:
      
       detected buffer overflow in strnlen
       [...]
       Call Trace:
       bnxt_ethtool_init.cold+0x18/0x18
      
      Refactor struct hwrm_selftest_qlist_output to use an actual array,
      and adjust the concatenation to use snprintf() rather than a series of
      strncat() calls.
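
      As a rough illustration of why a single snprintf() is easier to bound
      than chained strncat() calls, consider this sketch. NAME_LEN, the
      output format and format_test_name() are assumptions made up for the
      example; they are not taken from the driver.

```c
#include <stddef.h>
#include <stdio.h>

#define NAME_LEN 32	/* hypothetical fixed width of each firmware-provided name */

/*
 * Format "<name> test (<suffix>)" into dst.
 * The "%.*s" precision caps the read at NAME_LEN bytes, so name does not
 * have to be NUL-terminated, and snprintf() never writes more than
 * dst_len bytes, terminator included.
 * Returns the length the full string would have had, which lets the
 * caller detect truncation (return value >= dst_len).
 */
int format_test_name(char *dst, size_t dst_len, const char name[NAME_LEN],
		     const char *suffix)
{
	return snprintf(dst, dst_len, "%.*s test (%s)", NAME_LEN, name, suffix);
}
```

      Both the source read and the destination write are bounded in one
      call, so there is no running offset to get wrong.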
      
      Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
      Link: https://lore.kernel.org/lkml/Y8F%2F1w1AZTvLglFX@x1-carbon/
      Tested-by: Niklas Cassel <Niklas.Cassel@wdc.com>
      Fixes: eb513658 ("bnxt_en: Add basic ethtool -t selftest support.")
      Cc: Michael Chan <michael.chan@broadcom.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3e599c0
    • net: stmmac: enable all safety features by default · fdfc76a1
      Andrew Halaney authored
      In the original implementation of dwmac5
      commit 8bf993a5 ("net: stmmac: Add support for DWMAC5 and implement Safety Features")
      all safety features were enabled by default.
      
      Later it seems some implementations didn't have support for all the
      features, so in
      commit 5ac712dc ("net: stmmac: enable platform specific safety features")
      the safety_feat_cfg structure was added to the callback and defined for
      some platforms to selectively enable these safety features.
      
      The problem is that only certain platforms were given that software
      support. If the automotive safety package bit is set in the hardware
      features register the safety feature callback is called for the platform,
      and for platforms that didn't get a safety_feat_cfg defined this results
      in the following NULL pointer dereference:
      
      [    7.933303] Call trace:
      [    7.935812]  dwmac5_safety_feat_config+0x20/0x170 [stmmac]
      [    7.941455]  __stmmac_open+0x16c/0x474 [stmmac]
      [    7.946117]  stmmac_open+0x38/0x70 [stmmac]
      [    7.950414]  __dev_open+0x100/0x1dc
      [    7.954006]  __dev_change_flags+0x18c/0x204
      [    7.958297]  dev_change_flags+0x24/0x6c
      [    7.962237]  do_setlink+0x2b8/0xfa4
      [    7.965827]  __rtnl_newlink+0x4ec/0x840
      [    7.969766]  rtnl_newlink+0x50/0x80
      [    7.973353]  rtnetlink_rcv_msg+0x12c/0x374
      [    7.977557]  netlink_rcv_skb+0x5c/0x130
      [    7.981500]  rtnetlink_rcv+0x18/0x2c
      [    7.985172]  netlink_unicast+0x2e8/0x340
      [    7.989197]  netlink_sendmsg+0x1a8/0x420
      [    7.993222]  ____sys_sendmsg+0x218/0x280
      [    7.997249]  ___sys_sendmsg+0xac/0x100
      [    8.001103]  __sys_sendmsg+0x84/0xe0
      [    8.004776]  __arm64_sys_sendmsg+0x24/0x30
      [    8.008983]  invoke_syscall+0x48/0x114
      [    8.012840]  el0_svc_common.constprop.0+0xcc/0xec
      [    8.017665]  do_el0_svc+0x38/0xb0
      [    8.021071]  el0_svc+0x2c/0x84
      [    8.024212]  el0t_64_sync_handler+0xf4/0x120
      [    8.028598]  el0t_64_sync+0x190/0x194
      
      Go back to the original behavior: if the automotive safety package
      is found to be supported in hardware, enable all the features unless
      safety_feat_cfg is passed in saying this particular platform only
      supports a subset of the features.
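
The fallback described above can be sketched as follows. This is an illustration only, not the stmmac code: `struct safety_cfg`, its three fields, and `effective_safety_cfg()` are hypothetical names standing in for the driver's real safety_feat_cfg handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-platform safety feature selection. */
struct safety_cfg {
    bool ecc;
    bool parity;
    bool timeout;
};

/*
 * Return the configuration to apply: the platform's subset if one was
 * supplied, otherwise everything enabled. The NULL check is the point
 * of the sketch: a missing config must never be dereferenced.
 */
static struct safety_cfg effective_safety_cfg(const struct safety_cfg *platform_cfg)
{
    const struct safety_cfg all_on = { true, true, true };

    return platform_cfg ? *platform_cfg : all_on;
}
```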
      
      Fixes: 5ac712dc ("net: stmmac: enable platform specific safety features")
      Reported-by: Ning Cai <ncai@quicinc.com>
      Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdfc76a1
    • David S. Miller's avatar
      Merge branch 'octeontx2-af-CPT' · b4fbf0b2
      David S. Miller authored
      
      
      Srujana Challa says:
      
      ====================
      octeontx2-af: Miscellaneous changes for CPT
      
      This patchset consists of miscellaneous changes for CPT.
      - Adds a new mailbox to reset the requested CPT LF.
      - Modifies the FLR sequence as suggested by the HW team.
      - Adds support to recover CPT engines when they fault.
      - Updates CPT inbound inline IPsec configuration mailbox,
        as per new generation of the OcteonTX2 chips.
      - Adds a new mailbox to return CPT FLT Interrupt info.
      
      ---
      v2:
      - Addressed a review comment.
      v1:
      - Dropped patch "octeontx2-af: Fix interrupt name strings completely"
        to submit to net.
      ---
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4fbf0b2
    • Srujana Challa's avatar
      octeontx2-af: add mbox to return CPT_AF_FLT_INT info · 8299ffe3
      Srujana Challa authored
      
      
      CPT HW triggers the CPT AF FLT interrupt when CPT engines
      hit uncorrectable errors. AF is the one which receives
      the interrupt and recovers the engines.
      This patch adds a mailbox for CPT VFs to request info on
      faulted and recovered CPT engines.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8299ffe3
    • Srujana Challa's avatar
      octeontx2-af: update cpt lf alloc mailbox · c0688ec0
      Srujana Challa authored
      
      
      The CN10K CPT coprocessor contains a context processor
      to accelerate updates to the IPsec security association
      contexts. The context processor contains a context cache.
      This patch updates the CPT LF ALLOC mailbox to configure the ctx_ilen
      requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
      the initial context fetch.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0688ec0
    • Nithin Dabilpuram's avatar
      octeontx2-af: restore rxc conf after teardown sequence · d5b2e0a2
      Nithin Dabilpuram authored
      
      
      CN10K CPT coprocessor includes a component named RXC which
      is responsible for reassembly of inner IP packets. RXC has
      the feature to evict oldest entries based on age/threshold.
      The age/threshold is being set to minimum values to evict
      all entries at the time of teardown.
      This patch adds code to restore timeout and threshold config
      after teardown sequence is complete as it is global config.
      
      Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5b2e0a2
    • Srujana Challa's avatar
      octeontx2-af: optimize cpt pf identification · 9adb04ff
      Srujana Challa authored
      
      
      Optimize CPT PF identification in mbox handling for faster
      mbox response by doing it at AF driver probe instead of
      every mbox message.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9adb04ff
    • Srujana Challa's avatar
      octeontx2-af: modify FLR sequence for CPT · 1286c50a
      Srujana Challa authored
      
      
      On OcteonTX2 platform CPT instruction enqueue is only
      possible via LMTST operations.
      The existing FLR sequence mentioned in HRM requires
      a dummy LMTST to CPT but LMTST can't be submitted from
      AF driver. So, HW team provided a new sequence to avoid
      dummy LMTST. This patch adds code for the same.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1286c50a
    • Srujana Challa's avatar
      octeontx2-af: add mbox for CPT LF reset · f58cf765
      Srujana Challa authored
      
      
      On OcteonTX2 SoC, the admin function (AF) is the only one with all
      privileges to configure HW and alloc resources. PFs and their VFs
      have to request AF via mailbox for all their needs.
      This patch adds a new mailbox for CPT VFs to request a CPT LF
      reset.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f58cf765
    • Srujana Challa's avatar
      octeontx2-af: recover CPT engine when it gets fault · 07ea567d
      Srujana Challa authored
      
      
      When a CPT engine has uncorrectable errors, it is halted and
      must be disabled and re-enabled. This patch adds code for the same.
      
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07ea567d
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v6.2-3-2023-01-19' of... · 4a0c7a68
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.2-3-2023-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Prevent reading into undefined memory in the expression lexer,
         accounting for a trailing backslash followed by the null byte.
      
       - Fix file mode when copying files to the build id cache, the problem
         happens when the cache directory is in a different file system than
         the file being cached, otherwise the mode was preserved as only a
         hard link would be done to save space.
      
       - Fix a related build-id 'perf test' entry that checked that permission
         when caching PE (Portable Executable) files, used when profiling
         Windows executables under wine.
      
       - Sync the tools/ copies of kvm headers, build_bug.h, socket.h and
         arm64's cputype.h with the kernel sources.
      
      * tag 'perf-tools-fixes-for-v6.2-3-2023-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf test build-id: Fix test check for PE file
        perf buildid-cache: Fix the file mode with copyfile() while adding file to build-id cache
        perf expr: Prevent normalize() from reading into undefined memory in the expression lexer
        tools headers: Syncronize linux/build_bug.h with the kernel sources
        perf beauty: Update copy of linux/socket.h with the kernel sources
        tools headers arm64: Sync arm64's cputype.h with the kernel sources
        tools kvm headers arm64: Update KVM header from the kernel sources
        tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
      4a0c7a68
    • Linus Torvalds's avatar
      Merge tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · d368967c
      Linus Torvalds authored
      Pull printk fixes from Petr Mladek:
      
       - Prevent a potential deadlock when configuring kgdb console
      
       - Fix a kernel doc warning
      
      * tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        kernel/printk/printk.c: Fix W=1 kernel-doc warning
        tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
      d368967c
    • Linus Torvalds's avatar
      Merge tag 's390-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · a03df4ec
      Linus Torvalds authored
      Pull s390 build fix from Heiko Carstens:
      
       - Workaround invalid gcc-11 out of bounds read warning caused by s390's
         S390_lowcore definition. This happens only with gcc 11.1.0 and
         11.2.0.
      
         The code which causes this warning will be gone with the next merge
         window. Therefore just replace the memcpy() with a for loop to get
         rid of the warning.
      
      * tag 's390-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: workaround invalid gcc-11 out of bounds read warning
      a03df4ec
    • Linus Torvalds's avatar
      Merge tag 'slab-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 46f0cba3
      Linus Torvalds authored
      Pull slab fix from Vlastimil Babka:
       "Just a single fix, since the lkp report originally for a slub-tiny
        commit ended up being a gcov/compiler bug:
      
         - periodically resched in SLAB's drain_freelist(), by David Rientjes"
      
      * tag 'slab-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm, slab: periodically resched in drain_freelist()
      46f0cba3
    • Linus Torvalds's avatar
      Merge tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 081edded
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
      
       - A single patch to fix sync write operations to detect and handle
         errors due to external zone corruptions resulting in writes at
         invalid location, from me.
      
      * tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Detect append writes at invalid locations
      081edded
    • Rakesh Sankaranarayanan's avatar
      net: dsa: microchip: ksz9477: port map correction in ALU table entry register · 6c977c5c
      Rakesh Sankaranarayanan authored
      The ALU table entry 2 register in KSZ9477 has bit positions reserved for
      the forwarding port map. This field is referenced in ksz9477_fdb_del()
      when clearing the forwarding port map and the ALU table.
      
      But the current fdb_del reads the ALU table entry 3 register to access
      the forwarding port map. Update ksz9477_fdb_del() to get the forwarding
      port map from the correct ALU table entry register.
      
      With this bug, a problem can be observed when deleting static MAC entries.
      Deleting a specific MAC entry with the "bridge fdb del" command should
      clear all the specified MAC entries, but entries with
      self static alone are retained.
      
      Tested on LAN9370 EVB since ksz9477_fdb_del() is common to the
      LAN937x and KSZ series.
      
      Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477")
      Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20230118174735.702377-1-rakesh.sankaranarayanan@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6c977c5c
    • Willem de Bruijn's avatar
      selftests/net: toeplitz: fix race on tpacket_v3 block close · 90384824
      Willem de Bruijn authored
      Avoid race between process wakeup and tpacket_v3 block timeout.
      
      The test waits for cfg_timeout_msec for packets to arrive. Packets
      arrive in tpacket_v3 rings, which pass packets ("frames") to the
      process in batches ("blocks"). The sk waits for req3.tp_retire_blk_tov
      msec to release a block.
      
      Set the block timeout lower than the process waiting time, else
      the process may find that no block has been released by the time it
      scans the socket list. Convert to a ring of more than one, smaller,
      blocks with shorter timeouts. Blocks must be page aligned, so >= 64KB.
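
The constraints in this message can be written down as a small check. This is a sketch under assumptions: `struct ring_cfg` and `ring_cfg_ok()` are hypothetical and only mirror the roles of the block size, block count and req3.tp_retire_blk_tov settings; the numbers used with it are illustrative, not the values chosen by the test.

```c
#include <stdbool.h>

/* Hypothetical mirror of the ring parameters discussed above. */
struct ring_cfg {
    unsigned int block_size;    /* bytes per block */
    unsigned int block_nr;      /* number of blocks in the ring */
    unsigned int retire_tov_ms; /* block timeout, in msec */
};

/*
 * A configuration is acceptable when the ring has more than one block,
 * each block is a multiple of the page size, and a block is released
 * before the process stops waiting for packets.
 */
static bool ring_cfg_ok(const struct ring_cfg *cfg,
                        unsigned int wait_ms, unsigned int page_size)
{
    return cfg->block_nr > 1 &&
           cfg->block_size % page_size == 0 &&
           cfg->retire_tov_ms < wait_ms;
}
```

A 64KB block passes the alignment check for any power-of-two page size up to 64KB, which is presumably why the message settles on that lower bound.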
      
      Fixes: 5ebfb4cc ("selftests/net: toeplitz test")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20230118151847.4124260-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      90384824
    • Paolo Abeni's avatar
      net/ulp: use consistent error code when blocking ULP · 8ccc9936
      Paolo Abeni authored
      
      
      The referenced commit changed the error code returned by the kernel
      when preventing a non-established socket from attaching the ktls
      ULP. Before that commit, user space got ENOTCONN instead
      of EINVAL.
      
      The existing self-tests depend on such error code, and the change
      caused a failure:
      
        RUN           global.non_established ...
       tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
       non_established: Test failed at step #3
                FAIL  global.non_established
      
      In the unlikely event existing applications do the same, address
      the issue by restoring the prior error code in the above scenario.
      
      Note that the only other ULP performing similar checks at init
      time - smc_ulp_ops - also fails with ENOTCONN when trying to attach
      the ULP to a non-established socket.
      
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Fixes: 2c02d41d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8ccc9936
  4. Jan 19, 2023
    • Paolo Abeni's avatar
      Merge tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 5c312574
      Paolo Abeni authored
      
      
      Saeed Mahameed says:
      
      ====================
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net: mlx5: eliminate anonymous module_init & module_exit
        net/mlx5: E-switch, Fix switchdev mode after devlink reload
        net/mlx5e: Protect global IPsec ASO
        net/mlx5e: Remove optimization which prevented update of ESN state
        net/mlx5e: Set decap action based on attr for sample
        net/mlx5e: QoS, Fix wrongfully setting parent_element_id on MODIFY_SCHEDULING_ELEMENT
        net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
        net/mlx5e: Remove redundant xsk pointer check in mlx5e_mpwrq_validate_xsk
        net/mlx5e: Avoid false lock dependency warning on tc_ht even more
        net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
      ====================
      
      Link: https://lore.kernel.org/r/
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5c312574
    • Petr Mladek's avatar
      21493c6e
    • Kevin Hao's avatar
      octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt · 55ba18dc
      Kevin Hao authored
      Commit 4af1b64f ("octeontx2-pf: Fix lmtst ID used in aura
      free") uses get/put_cpu() to protect the use of the percpu pointer
      in the ->aura_freeptr() callback, but it also unnecessarily disables
      preemption around the blockable memory allocation. Commit 87b93b67
      ("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
      fix these sleep-inside-atomic warnings, but it only fixed the one for
      the non-rt kernel. On the rt kernel, we still get similar warnings,
      like the one below.
        BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
        in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
        preempt_count: 1, expected: 0
        RCU nest depth: 0, expected: 0
        3 locks held by swapper/0/1:
         #0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
         #1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
         #2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
        Preemption disabled at:
        [<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
        CPU: 20 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc3-rt1-yocto-preempt-rt #1
        Hardware name: Marvell OcteonTX CN96XX board (DT)
        Call trace:
         dump_backtrace.part.0+0xe8/0xf4
         show_stack+0x20/0x30
         dump_stack_lvl+0x9c/0xd8
         dump_stack+0x18/0x34
         __might_resched+0x188/0x224
         rt_spin_lock+0x64/0x110
         alloc_iova_fast+0x1ac/0x2ac
         iommu_dma_alloc_iova+0xd4/0x110
         __iommu_dma_map+0x80/0x144
         iommu_dma_map_page+0xe8/0x260
         dma_map_page_attrs+0xb4/0xc0
         __otx2_alloc_rbuf+0x90/0x150
         otx2_rq_aura_pool_init+0x1c8/0x284
         otx2_init_hw_resources+0xe4/0x3a4
         otx2_open+0xf0/0x610
         __dev_open+0x104/0x224
         __dev_change_flags+0x1e4/0x274
         dev_change_flags+0x2c/0x7c
         ic_open_devs+0x124/0x2f8
         ip_auto_config+0x180/0x42c
         do_one_initcall+0x90/0x4dc
         do_basic_setup+0x10c/0x14c
         kernel_init_freeable+0x10c/0x13c
         kernel_init+0x2c/0x140
         ret_from_fork+0x10/0x20
      
      Of course, we could shuffle the get/put_cpu() calls to wrap only the
      invocation of ->aura_freeptr(), as commit 87b93b67 does. But there are
      only two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
      cn10k_aura_freeptr(). otx2_aura_freeptr() does not use any percpu
      variable at all, so get/put_cpu() is redundant for it.
      We can move get/put_cpu() into the callback that
      really uses the percpu variable and avoid sprinkling
      get/put_cpu() in several places.
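
The design choice can be illustrated with a toy userspace model. Nothing here is kernel code: `get_cpu_model()`, `put_cpu_model()`, `alloc_may_sleep()` and the two `pool_init_*` variants are hypothetical stand-ins, and a counter replaces real preemption control.

```c
static int preempt_depth; /* > 0 models "preemption disabled" */
static int atomic_sleeps; /* blockable calls made while atomic */

static void get_cpu_model(void) { preempt_depth++; }
static void put_cpu_model(void) { preempt_depth--; }

/* Stands in for a blockable allocation such as a DMA mapping. */
static void alloc_may_sleep(void)
{
    if (preempt_depth > 0)
        atomic_sleeps++;
}

/* Callback that touches per-CPU state, so it pins the CPU itself. */
static void freeptr_percpu(void)
{
    get_cpu_model();
    /* ... per-CPU access would go here ... */
    put_cpu_model();
}

/* Callback with no per-CPU state: no pinning needed. */
static void freeptr_plain(void) { }

/* Caller pins the CPU around the whole loop, allocation included. */
static void pool_init_caller_pins(void (*freeptr)(void), int nbufs)
{
    get_cpu_model();
    for (int i = 0; i < nbufs; i++) {
        alloc_may_sleep();
        freeptr();
    }
    put_cpu_model();
}

/* Caller does not pin; only the callback that needs it does. */
static void pool_init_callback_pins(void (*freeptr)(void), int nbufs)
{
    for (int i = 0; i < nbufs; i++) {
        alloc_may_sleep();
        freeptr();
    }
}
```

In this model the caller-pins variant records one violation per buffer, while the callback-pins variant records none, whichever callback is passed.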
      
      Fixes: 4af1b64f ("octeontx2-pf: Fix lmtst ID used in aura free")
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      55ba18dc
    • Jason Xing's avatar
      tcp: avoid the lookup process failing to get sk in ehash table · 3f4ca5fa
      Jason Xing authored
      While one CPU is looking up the right socket in the ehash
      table, another CPU has finished deleting the request socket and is about
      to add (or is adding) the big socket to the table. This means that
      the lookup could miss both of them, even though the chance is small.
      
      Let me draw a call trace map of the server side.
         CPU 0                           CPU 1
         -----                           -----
      tcp_v4_rcv()                  syn_recv_sock()
                                  inet_ehash_insert()
                                  -> sk_nulls_del_node_init_rcu(osk)
      __inet_lookup_established()
                                  -> __sk_nulls_add_node_rcu(sk, list)
      
      Notice that CPU 0 is receiving the data after the final ack
      of the 3-way handshake while CPU 1 is still handling the final ack.
      
      Why could this be a real problem?
      This case happens only when the final ack and the first data
      are received by different CPUs. The server, receiving data with
      the ACK flag set, then searches the ehash table for a matching
      established socket, but it fails, as the map above shows. After that,
      the server fetches a listener socket and sends a RST because
      it finds an ACK flag in the skb (data), which follows the RST definition
      in RFC 793.
      
      Besides, Eric pointed out there's one more race condition where it
      handles tw socket hashdance. Only by adding to the tail of the list
      before deleting the old one can we avoid the race if the reader has
      already begun the bucket traversal and it would possibly miss the head.
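
The ordering argument can be checked with a single-threaded toy model. It is an assumption-laden sketch, not the ehash code: the bucket is a fixed array, `cursor` stands for how far a concurrent reader has already walked, and all names are hypothetical.

```c
#include <assert.h>

#define BUCKET_MAX 8

struct entry {
    int flow; /* stands in for the connection 4-tuple */
    int live; /* 0 once the entry has been deleted */
};

struct bucket {
    struct entry e[BUCKET_MAX];
    int n;
};

/* Append at the tail and return the slot index. */
static int add_tail(struct bucket *b, int flow)
{
    assert(b->n < BUCKET_MAX);
    b->e[b->n].flow = flow;
    b->e[b->n].live = 1;
    return b->n++;
}

static void del_at(struct bucket *b, int idx)
{
    b->e[idx].live = 0;
}

/* Reader resuming its walk at 'cursor'; returns 1 if the flow is found. */
static int lookup_from(const struct bucket *b, int flow, int cursor)
{
    for (int i = cursor; i < b->n; i++)
        if (b->e[i].live && b->e[i].flow == flow)
            return 1;
    return 0;
}
```

With delete-then-add, a lookup between the two steps finds nothing. With add-to-tail-then-delete, a lookup at any point finds either the old or the new entry, including a reader that has already passed the old one.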
      
      Many thanks to Eric for great help from beginning to end.
      
      Fixes: 5e0724d0 ("tcp/dccp: fix hashdance race for passive sessions")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
      Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3f4ca5fa
    • Xin Long's avatar
      Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf" · 4fb58ac3
      Xin Long authored
      This reverts commit 0aa64df3.
      
      Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the
      slave ports of team, bonding and failover devices and it means no ipv6
      packets can be sent out through these slave ports. However, for team
      device, "nsna_ping" link_watch requires ipv6 addrconf. Otherwise, the
      link will be marked as failed. This patch removes the IFF_NO_ADDRCONF
      flag set for team port, and we will fix the original issue in another
      patch, as Jakub suggested.
      
      Fixes: 0aa64df3 ("net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/63e09531fc47963d2e4eff376653d3db21b97058.1673980932.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4fb58ac3
    • Jakub Kicinski's avatar
      MAINTAINERS: add networking entries for Willem · e0be11a8
      Jakub Kicinski authored
      
      
      We often have to ping Willem asking for reviews of patches
      because he doesn't get included in the CC list. Add MAINTAINERS
      entries for some of the areas he covers so that ./scripts/ will
      know to add him.
      
      Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Link: https://lore.kernel.org/r/20230117190141.60795-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e0be11a8
    • Jakub Kicinski's avatar
      net: sched: gred: prevent races when adding offloads to stats · 339346d4
      Jakub Kicinski authored
      
      
      Naresh reports seeing a warning that gred is calling
      u64_stats_update_begin() with preemption enabled.
      Arnd points out it's coming from _bstats_update().
      
      We should be holding the qdisc lock when writing
      to stats, since they are also updated from the datapath.
      
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Link: https://lore.kernel.org/all/CA+G9fYsTr9_r893+62u6UGD3dVaCE-kN9C-Apmb2m=hxjc1Cqg@mail.gmail.com/
      Fixes: e49efd52 ("net: sched: gred: support reporting stats from offloads")
      Link: https://lore.kernel.org/r/20230113044137.1383067-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      339346d4
    • Jakub Kicinski's avatar
      Merge tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · edb5b63e
      Jakub Kicinski authored
      
      
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.2
      
      Third set of fixes for v6.2. This time most of them are for drivers,
      only one revert for mac80211. For an important mt76 fix we had to
      cherry pick two commits from wireless-next.
      
      * tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
        wifi: mt76: dma: fix a regression in adding rx buffers
        wifi: mt76: handle possible mt76_rx_token_consume failures
        wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails
        wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid
        wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
        wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device
        wifi: brcmfmac: avoid handling disabled channels for survey dump
      ====================
      
      Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      edb5b63e
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2023011801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 7287904c
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - fixes for potential empty list handling in HID core (Pietro Borrello)
      
       - fix for NULL pointer dereference in betop driver that could be
         triggered by malicious device (Pietro Borrello)
      
       - fixes for handling calibration data preventing division by zero in
         Playstation driver (Roderick Colenbrander)
      
       - fix for memory leak on error path in amd-sfh driver (Basavaraj
         Natikar)
      
       - other few assorted small fixes and device ID-specific handling
      
      * tag 'for-linus-2023011801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: betop: check shape of output reports
        HID: playstation: sanity check DualSense calibration data.
        HID: playstation: sanity check DualShock4 calibration data.
        HID: uclogic: Add support for XP-PEN Deco 01 V2
        HID: revert CHERRY_MOUSE_000C quirk
        HID: check empty report_list in bigben_probe()
        HID: check empty report_list in hid_validate_values()
        HID: amd_sfh: Fix warning unwind goto
        HID: intel_ish-hid: Add check for ishtp_dma_tx_map
      7287904c
    • Linus Torvalds's avatar
      Merge tag 'affs-for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7026172b
      Linus Torvalds authored
      Pull affs fix from David Sterba:
       "One minor fix for a KCSAN report"
      
      * tag 'affs-for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: initialize fsdata in affs_truncate()
      7026172b
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 5fbad44d
      Linus Torvalds authored
      Pull erofs fixes from Gao Xiang:
       "Two patches fix issues reported by syzbot, one fixes a missing
        `domain_id` mount option in documentation and a minor cleanup:
      
         - Fix wrong iomap->length calculation post EOF, which could cause a
           WARN_ON in iomap_iter_done() (Siddh)
      
         - Fix improper kvcalloc() use with __GFP_NOFAIL (me)
      
         - Add missing `domain_id` mount option in documentation (Jingbo)
      
         - Clean up fscache option parsing (Jingbo)"
      
      * tag 'erofs-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: clean up parsing of fscache related options
        erofs: add documentation for 'domain_id' mount option
        erofs: fix kvcalloc() misuse with __GFP_NOFAIL
        erofs/zmap.c: Fix incorrect offset calculation
      5fbad44d
    • Linus Torvalds's avatar
      Merge tag 'loongarch-fixes-6.2-1' of... · 84bd7e08
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix a missing elf_hwcap, fix some stack unwinder bugs and two trivial
        cleanups"
      
      * tag 'loongarch-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Add generic ex-handler unwind in prologue unwinder
        LoongArch: Strip guess unwinder out from prologue unwinder
        LoongArch: Use correct sp value to get graph addr in stack unwinders
        LoongArch: Get frame info in unwind_start() when regs is not available
        LoongArch: Adjust PC value when unwind next frame in unwinder
        LoongArch: Simplify larch_insn_gen_xxx implementation
        LoongArch: Use common function sign_extend64()
        LoongArch: Add HWCAP_LOONGARCH_CPUCFG to elf_hwcap
      84bd7e08