Skip to content
  1. Feb 01, 2022
  2. Jan 31, 2022
    • Wen Gu's avatar
      net/smc: Forward wakeup to smc socket waitqueue after fallback · 341adeec
      Wen Gu authored
      When we replace TCP with SMC and a fallback occurs, there may be
      some socket waitqueue entries remaining in smc socket->wq, such
      as eppoll_entries inserted by userspace applications.
      
      After the fallback, data flows over TCP/IP and only clcsocket->wq
      will be woken up. Applications can't be notified by the entries
      which were inserted in smc socket->wq before fallback. So we need
      a mechanism to wake up smc socket->wq at the same time if some
      entries remaining in it.
      
      The current workaround is to transfer the entries from smc socket->wq
      to clcsock->wq during the fallback. But this may cause a crash
      like this:
      
       general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E     5.16.0+ #107
       RIP: 0010:__wake_up_common+0x65/0x170
       Call Trace:
        <IRQ>
        __wake_up_common_lock+0x7a/0xc0
        sock_def_readable+0x3c/0x70
        tcp_data_queue+0x4a7/0xc40
        tcp_rcv_established+0x32f/0x660
        ? sk_filter_trim_cap+0xcb/0x2e0
        tcp_v4_do_rcv+0x10b/0x260
        tcp_v4_rcv+0xd2a/0xde0
        ip_protocol_deliver_rcu+0x3b/0x1d0
        ip_local_deliver_finish+0x54/0x60
        ip_local_deliver+0x6a/0x110
        ? tcp_v4_early_demux+0xa2/0x140
        ? tcp_v4_early_demux+0x10d/0x140
        ip_sublist_rcv_finish+0x49/0x60
        ip_sublist_rcv+0x19d/0x230
        ip_list_rcv+0x13e/0x170
        __netif_receive_skb_list_core+0x1c2/0x240
        netif_receive_skb_list_internal+0x1e6/0x320
        napi_complete_done+0x11d/0x190
        mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
        __napi_poll+0x3c/0x1b0
        net_rx_action+0x27c/0x300
        __do_softirq+0x114/0x2d2
        irq_exit_rcu+0xb4/0xe0
        common_interrupt+0xba/0xe0
        </IRQ>
        <TASK>
      
      The crash is caused by privately transferring waitqueue entries from
      smc socket->wq to clcsock->wq. The owners of these entries, such as
      epoll, have no idea that the entries have been transferred to a
      different socket wait queue and still use original waitqueue spinlock
      (smc socket->wq.wait.lock) to make the entries operation exclusive,
      but it doesn't work. The operations to the entries, such as removing
      from the waitqueue (now is clcsock->wq after fallback), may cause a
      crash when clcsock waitqueue is being iterated over at the moment.
      
      This patch tries to fix this by no longer transferring wait queue
      entries privately, but introducing own implementations of clcsock's
      callback functions in fallback situation. The callback functions will
      forward the wakeup to smc socket->wq if clcsock->wq is actually woken
      up and smc socket->wq has remaining entries.
      
      Fixes: 2153bd1e
      
       ("net/smc: Transfer remaining wait queue entries during fallback")
      Suggested-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Acked-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      341adeec
  3. Jan 28, 2022
    • Jisheng Zhang's avatar
      net: stmmac: properly handle with runtime pm in stmmac_dvr_remove() · 64495203
      Jisheng Zhang authored
      There are two issues with runtime pm handling in stmmac_dvr_remove():
      
      1. the mac is runtime suspended before stopping dma and rx/tx. We
      need to ensure the device is properly resumed back.
      
      2. the stmmaceth clk enable/disable isn't balanced in both exit and
      error handling code path. Take the exit code path for example, when we
      unbind the driver or rmmod the driver module, the mac is runtime
      suspended as said above, so the stmmaceth clk is disabled, but
      	stmmac_dvr_remove()
      	  stmmac_remove_config_dt()
      	    clk_disable_unprepare()
      CCF will complain this time. The error handling code path suffers
      from the similar situtaion.
      
      Here are kernel warnings in error handling code path on Allwinner D1
      platform:
      
      [    1.604695] ------------[ cut here ]------------
      [    1.609328] bus-emac already disabled
      [    1.613015] WARNING: CPU: 0 PID: 38 at drivers/clk/clk.c:952 clk_core_disable+0xcc/0xec
      [    1.621039] CPU: 0 PID: 38 Comm: kworker/u2:1 Not tainted 5.14.0-rc4#1
      [    1.627653] Hardware name: Allwinner D1 NeZha (DT)
      [    1.632443] Workqueue: events_unbound deferred_probe_work_func
      [    1.638286] epc : clk_core_disable+0xcc/0xec
      [    1.642561]  ra : clk_core_disable+0xcc/0xec
      [    1.646835] epc : ffffffff8023c2ec ra : ffffffff8023c2ec sp : ffffffd00411bb10
      [    1.654054]  gp : ffffffff80ec9988 tp : ffffffe00143a800 t0 : ffffffff80ed6a6f
      [    1.661272]  t1 : ffffffff80ed6a60 t2 : 0000000000000000 s0 : ffffffe001509e00
      [    1.668489]  s1 : 0000000000000001 a0 : 0000000000000019 a1 : ffffffff80e80bd8
      [    1.675707]  a2 : 00000000ffffefff a3 : 00000000000000f4 a4 : 0000000000000002
      [    1.682924]  a5 : 0000000000000001 a6 : 0000000000000030 a7 : 00000000028f5c29
      [    1.690141]  s2 : 0000000000000800 s3 : ffffffe001375000 s4 : ffffffe01fdf7a80
      [    1.697358]  s5 : ffffffe001375010 s6 : ffffffff8001fc10 s7 : ffffffffffffffff
      [    1.704577]  s8 : 0000000000000001 s9 : ffffffff80ecb248 s10: ffffffe001b80000
      [    1.711794]  s11: ffffffe001b80760 t3 : 0000000000000062 t4 : ffffffffffffffff
      [    1.719012]  t5 : ffffffff80e0f6d8 t6 : ffffffd00411b8f0
      [    1.724321] status: 8000000201800100 badaddr: 0000000000000000 cause: 0000000000000003
      [    1.732233] [<ffffffff8023c2ec>] clk_core_disable+0xcc/0xec
      [    1.737810] [<ffffffff80240430>] clk_disable+0x38/0x78
      [    1.742956] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    1.748451] [<ffffffff8031a500>] stmmac_remove_config_dt+0x1c/0x4c
      [    1.754646] [<ffffffff8031c8ec>] sun8i_dwmac_probe+0x378/0x82c
      [    1.760484] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    1.765975] [<ffffffff8029a6c8>] platform_probe+0x64/0xf0
      [    1.771382] [<ffffffff8029833c>] really_probe.part.0+0x8c/0x30c
      [    1.777305] [<ffffffff8029865c>] __driver_probe_device+0xa0/0x148
      [    1.783402] [<ffffffff8029873c>] driver_probe_device+0x38/0x138
      [    1.789324] [<ffffffff802989cc>] __device_attach_driver+0xd0/0x170
      [    1.795508] [<ffffffff802988f8>] __driver_attach_async_helper+0xbc/0xc0
      [    1.802125] [<ffffffff802965ac>] bus_for_each_drv+0x68/0xb4
      [    1.807701] [<ffffffff80298d1c>] __device_attach+0xd8/0x184
      [    1.813277] [<ffffffff802967b0>] bus_probe_device+0x98/0xbc
      [    1.818852] [<ffffffff80297904>] deferred_probe_work_func+0x90/0xd4
      [    1.825122] [<ffffffff8001f8b8>] process_one_work+0x1e4/0x390
      [    1.830872] [<ffffffff8001fd80>] worker_thread+0x31c/0x4d8
      [    1.836362] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    1.841335] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    1.846304] [<ffffffff8001fa60>] process_one_work+0x38c/0x390
      [    1.852054] [<ffffffff80026564>] kthread+0x124/0x160
      [    1.857021] [<ffffffff8002643c>] set_kthread_struct+0x5c/0x60
      [    1.862770] [<ffffffff80001f08>] ret_from_syscall_rejected+0x8/0xc
      [    1.868956] ---[ end trace 8d5c6046255f84a0 ]---
      [    1.873675] ------------[ cut here ]------------
      [    1.878366] bus-emac already unprepared
      [    1.882378] WARNING: CPU: 0 PID: 38 at drivers/clk/clk.c:810 clk_core_unprepare+0xe4/0x168
      [    1.890673] CPU: 0 PID: 38 Comm: kworker/u2:1 Tainted: G        W	5.14.0-rc4 #1
      [    1.898674] Hardware name: Allwinner D1 NeZha (DT)
      [    1.903464] Workqueue: events_unbound deferred_probe_work_func
      [    1.909305] epc : clk_core_unprepare+0xe4/0x168
      [    1.913840]  ra : clk_core_unprepare+0xe4/0x168
      [    1.918375] epc : ffffffff8023d6cc ra : ffffffff8023d6cc sp : ffffffd00411bb10
      [    1.925593]  gp : ffffffff80ec9988 tp : ffffffe00143a800 t0 : 0000000000000002
      [    1.932811]  t1 : ffffffe01f743be0 t2 : 0000000000000040 s0 : ffffffe001509e00
      [    1.940029]  s1 : 0000000000000001 a0 : 000000000000001b a1 : ffffffe00143a800
      [    1.947246]  a2 : 0000000000000000 a3 : 00000000000000f4 a4 : 0000000000000001
      [    1.954463]  a5 : 0000000000000000 a6 : 0000000005fce2a5 a7 : 0000000000000001
      [    1.961680]  s2 : 0000000000000800 s3 : ffffffff80afeb90 s4 : ffffffe01fdf7a80
      [    1.968898]  s5 : ffffffe001375010 s6 : ffffffff8001fc10 s7 : ffffffffffffffff
      [    1.976115]  s8 : 0000000000000001 s9 : ffffffff80ecb248 s10: ffffffe001b80000
      [    1.983333]  s11: ffffffe001b80760 t3 : ffffffff80b39120 t4 : 0000000000000001
      [    1.990550]  t5 : 0000000000000000 t6 : ffffffe001600002
      [    1.995859] status: 8000000201800120 badaddr: 0000000000000000 cause: 0000000000000003
      [    2.003771] [<ffffffff8023d6cc>] clk_core_unprepare+0xe4/0x168
      [    2.009609] [<ffffffff802403a0>] clk_unprepare+0x24/0x3c
      [    2.014929] [<ffffffff8031a508>] stmmac_remove_config_dt+0x24/0x4c
      [    2.021125] [<ffffffff8031c8ec>] sun8i_dwmac_probe+0x378/0x82c
      [    2.026965] [<ffffffff8001fc0c>] worker_thread+0x1a8/0x4d8
      [    2.032463] [<ffffffff8029a6c8>] platform_probe+0x64/0xf0
      [    2.037871] [<ffffffff8029833c>] really_probe.part.0+0x8c/0x30c
      [    2.043795] [<ffffffff8029865c>] __driver_probe_device+0xa0/0x148
      [    2.049892] [<ffffffff8029873c>] driver_probe_device+0x38/0x138
      [    2.055815] [<ffffffff802989cc>] __device_attach_driver+0xd0/0x170
      [    2.061999] [<ffffffff802988f8>] __driver_attach_async_helper+0xbc/0xc0
      [    2.068616] [<ffffffff802965ac>] bus_for_each_drv+0x68/0xb4
      [    2.074193] [<ffffffff80298d1c>] __device_attach+0xd8/0x184
      [    2.079769] [<ffffffff802967b0>] bus_probe_device+0x98/0xbc
      [    2.085345] [<ffffffff80297904>] deferred_probe_work_func+0x90/0xd4
      [    2.091616] [<ffffffff8001f8b8>] process_one_work+0x1e4/0x390
      [    2.097367] [<ffffffff8001fd80>] worker_thread+0x31c/0x4d8
      [    2.102858] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    2.107830] [<ffffffff80026bf4>] kthreadd+0x94/0x188
      [    2.112800] [<ffffffff8001fa60>] process_one_work+0x38c/0x390
      [    2.118551] [<ffffffff80026564>] kthread+0x124/0x160
      [    2.123520] [<ffffffff8002643c>] set_kthread_struct+0x5c/0x60
      [    2.129268] [<ffffffff80001f08>] ret_from_syscall_rejected+0x8/0xc
      [    2.135455] ---[ end trace 8d5c6046255f84a1 ]---
      
      Fixes: 5ec55823
      
       ("net: stmmac: add clocks management for gmac driver")
      Signed-off-by: default avatarJisheng Zhang <jszhang@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      64495203
    • David S. Miller's avatar
      Merge tag 'ieee802154-for-net-2022-01-28' of... · 010a2a66
      David S. Miller authored
      Merge tag 'ieee802154-for-net-2022-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2022-01-28
      
      An update from ieee802154 for your *net* tree.
      
      A bunch of fixes in drivers, all from Miquel Raynal.
      Clarifying the default channel in hwsim, leak fixes in at86rf230 and ca8210 as
      well as a symbol duration fix for mcr20a. Topping up the driver fixes with
      better error codes in nl802154 and a cleanup in MAINTAINERS for an orphaned
      driver.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      010a2a66
    • Haiyue Wang's avatar
      gve: fix the wrong AdminQ buffer queue index check · 1f84a945
      Haiyue Wang authored
      The 'tail' and 'head' are 'unsigned int' type free-running count, when
      'head' is overflow, the 'int i (= tail) < u32 head' will be false:
      
      Only '- loop 0: idx = 63' result is shown, so it needs to use 'int' type
      to compare, it can handle the overflow correctly.
      
      typedef uint32_t u32;
      
      int main()
      {
              u32 tail, head;
              int stail, shead;
              int i, loop;
      
              tail = 0xffffffff;
              head = 0x00000000;
      
              for (i = tail, loop = 0; i < head; i++) {
                      unsigned int idx = i & 63;
      
                      printf("+ loop %d: idx = %u\n", loop++, idx);
              }
      
              stail = tail;
              shead = head;
              for (i = stail, loop = 0; i < shead; i++) {
                      unsigned int idx = i & 63;
      
                      printf("- loop %d: idx = %u\n", loop++, idx);
              }
      
              return 0;
      }
      
      Fixes: 5cdad90d
      
       ("gve: Batch AQ commands for creating and destroying queues.")
      Signed-off-by: default avatarHaiyue Wang <haiyue.wang@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f84a945
    • David S. Miller's avatar
      Merge branch 'ax25-fixes' · 501c8f5e
      David S. Miller authored
      
      
      Duoming Zhou says:
      
      ====================
      ax25: fix NPD and UAF bugs when detaching ax25 device
      
      There are NPD and UAF bugs when detaching ax25 device, we
      use lock and refcount to mitigate these bugs.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      501c8f5e
    • Duoming Zhou's avatar
      ax25: add refcount in ax25_dev to avoid UAF bugs · d01ffb9e
      Duoming Zhou authored
      
      
      If we dereference ax25_dev after we call kfree(ax25_dev) in
      ax25_dev_device_down(), it will lead to concurrency UAF bugs.
      There are eight syscall functions suffer from UAF bugs, include
      ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(),
      ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and
      ax25_info_show().
      
      One of the concurrency UAF can be shown as below:
      
        (USE)                       |    (FREE)
                                    |  ax25_device_event
                                    |    ax25_dev_device_down
      ax25_bind                     |    ...
        ...                         |      kfree(ax25_dev)
        ax25_fillin_cb()            |    ...
          ax25_fillin_cb_from_dev() |
        ...                         |
      
      The root cause of UAF bugs is that kfree(ax25_dev) in
      ax25_dev_device_down() is not protected by any locks.
      When ax25_dev, which there are still pointers point to,
      is released, the concurrency UAF bug will happen.
      
      This patch introduces refcount into ax25_dev in order to
      guarantee that there are no pointers point to it when ax25_dev
      is released.
      
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d01ffb9e
    • Duoming Zhou's avatar
      ax25: improve the incomplete fix to avoid UAF and NPD bugs · 4e0f718d
      Duoming Zhou authored
      The previous commit 1ade48d0
      
       ("ax25: NPD bug when detaching
      AX25 device") introduce lock_sock() into ax25_kill_by_device to
      prevent NPD bug. But the concurrency NPD or UAF bug will occur,
      when lock_sock() or release_sock() dereferences the ax25_cb->sock.
      
      The NULL pointer dereference bug can be shown as below:
      
      ax25_kill_by_device()        | ax25_release()
                                   |   ax25_destroy_socket()
                                   |     ax25_cb_del()
        ...                        |     ...
                                   |     ax25->sk=NULL;
        lock_sock(s->sk); //(1)    |
        s->ax25_dev = NULL;        |     ...
        release_sock(s->sk); //(2) |
        ...                        |
      
      The root cause is that the sock is set to null before dereference
      site (1) or (2). Therefore, this patch extracts the ax25_cb->sock
      in advance, and uses ax25_list_lock to protect it, which can synchronize
      with ax25_cb_del() and ensure the value of sock is not null before
      dereference sites.
      
      The concurrency UAF bug can be shown as below:
      
      ax25_kill_by_device()        | ax25_release()
                                   |   ax25_destroy_socket()
        ...                        |   ...
                                   |   sock_put(sk); //FREE
        lock_sock(s->sk); //(1)    |
        s->ax25_dev = NULL;        |   ...
        release_sock(s->sk); //(2) |
        ...                        |
      
      The root cause is that the sock is released before dereference
      site (1) or (2). Therefore, this patch uses sock_hold() to increase
      the refcount of sock and uses ax25_list_lock to protect it, which
      can synchronize with ax25_cb_del() in ax25_destroy_socket() and
      ensure the sock wil not be released before dereference sites.
      
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e0f718d
    • Yuji Ishikawa's avatar
      net: stmmac: dwmac-visconti: No change to ETHER_CLOCK_SEL for unexpected speed request. · 928d6fe9
      Yuji Ishikawa authored
      
      
      Variable clk_sel_val is not initialized in the default case of the first switch statement.
      In that case, the function should return immediately without any changes to the hardware.
      
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Fixes: b38dd98f
      
       ("net: stmmac: Add Toshiba Visconti SoCs glue driver")
      Signed-off-by: default avatarYuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
      Reviewed-by: default avatarNobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      928d6fe9
    • Raju Rangoju's avatar
      net: amd-xgbe: ensure to reset the tx_timer_active flag · 7674b7b5
      Raju Rangoju authored
      Ensure to reset the tx_timer_active flag in xgbe_stop(),
      otherwise a port restart may result in tx timeout due to
      uncleared flag.
      
      Fixes: c635eaac
      
       ("amd-xgbe: Remove Tx coalescing")
      Co-developed-by: default avatarSudheesh Mavila <sudheesh.mavila@amd.com>
      Signed-off-by: default avatarSudheesh Mavila <sudheesh.mavila@amd.com>
      Signed-off-by: default avatarRaju Rangoju <Raju.Rangoju@amd.com>
      Acked-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/20220127060222.453371-1-Raju.Rangoju@amd.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7674b7b5
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 33d12dc9
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Remove leftovers from flowtable modules, from Geert Uytterhoeven.
      
      2) Missing refcount increment of conntrack template in nft_ct,
         from Florian Westphal.
      
      3) Reduce nft_zone selftest time, also from Florian.
      
      4) Add selftest to cover stateless NAT on fragments, from Florian Westphal.
      
      5) Do not set net_device when for reject packets from the bridge path,
         from Phil Sutter.
      
      6) Cancel register tracking info on nft_byteorder operations.
      
      7) Extend nft_concat_range selftest to cover set reload with no elements,
         from Florian Westphal.
      
      8) Remove useless update of pointer in chain blob builder, reported
         by kbuild test robot.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: nf_tables: remove assignment with no effect in chain blob builder
        selftests: nft_concat_range: add test for reload with no element add/del
        netfilter: nft_byteorder: track register operations
        netfilter: nft_reject_bridge: Fix for missing reply from prerouting
        selftests: netfilter: check stateless nat udp checksum fixup
        selftests: netfilter: reduce zone stress test running time
        netfilter: nft_ct: fix use after free when attaching zone template
        netfilter: Remove flowtable relics
      ====================
      
      Link: https://lore.kernel.org/r/20220127235235.656931-1-pablo@netfilter.org
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      33d12dc9
    • Shyam Sundar S K's avatar
      net: amd-xgbe: Fix skb data length underflow · 5aac9108
      Shyam Sundar S K authored
      There will be BUG_ON() triggered in include/linux/skbuff.h leading to
      intermittent kernel panic, when the skb length underflow is detected.
      
      Fix this by dropping the packet if such length underflows are seen
      because of inconsistencies in the hardware descriptors.
      
      Fixes: 622c36f1
      
       ("amd-xgbe: Fix jumbo MTU processing on newer hardware")
      Suggested-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarShyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Acked-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/20220127092003.2812745-1-Shyam-sundar.S-k@amd.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5aac9108
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 23a46422
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter and can.
      
        Current release - new code bugs:
      
         - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
      
         - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
      
         - nf_tables: set last expression in register tracking area
      
         - nft_connlimit: fix memleak if nf_ct_netns_get() fails
      
         - mptcp: fix removing ids bitmap setting
      
         - bonding: use rcu_dereference_rtnl when getting active slave
      
         - fix three cases of sleep in atomic context in drivers: lan966x, gve
      
         - handful of build fixes for esoteric drivers after netdev->dev_addr
           was made const
      
        Previous releases - regressions:
      
         - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
           Linux compatibility with USGv6 tests
      
         - procfs: show net device bound packet types
      
         - ipv4: fix ip option filtering for locally generated fragments
      
         - phy: ...
      23a46422
    • Tim Yi's avatar
      net: bridge: vlan: fix memory leak in __allowed_ingress · fd20d973
      Tim Yi authored
      When using per-vlan state, if vlan snooping and stats are disabled,
      untagged or priority-tagged ingress frame will go to check pvid state.
      If the port state is forwarding and the pvid state is not
      learning/forwarding, untagged or priority-tagged frame will be dropped
      but skb memory is not freed.
      Should free skb when __allowed_ingress returns false.
      
      Fixes: a580c76d
      
       ("net: bridge: vlan: add per-vlan state")
      Signed-off-by: default avatarTim Yi <tim.yi@pica8.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fd20d973
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: remove assignment with no effect in chain blob builder · b07f4137
      Pablo Neira Ayuso authored
      
      
      cppcheck possible warnings:
      
      >> net/netfilter/nf_tables_api.c:2014:2: warning: Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? [uselessAssignmentPtrArg]
          ptr += offsetof(struct nft_rule_dp, data);
          ^
      
      Reported-by: default avatarkernel test robot <yujie.liu@intel.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b07f4137
    • Menglong Dong's avatar
      net: socket: rename SKB_DROP_REASON_SOCKET_FILTER · 364df53c
      Menglong Dong authored
      
      
      Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
      as the reason of skb drop out of socket filter before
      it's part of a released kernel. It will be used for
      more protocols than just TCP in future series.
      
      Signed-off-by: default avatarMenglong Dong <imagedong@tencent.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      364df53c
    • Eric Dumazet's avatar
      ipv4: remove sparse error in ip_neigh_gw4() · 3c42b201
      Eric Dumazet authored
      ./include/net/route.h:373:48: warning: incorrect type in argument 2 (different base types)
      ./include/net/route.h:373:48:    expected unsigned int [usertype] key
      ./include/net/route.h:373:48:    got restricted __be32 [usertype] daddr
      
      Fixes: 5c9f7c1d
      
       ("ipv4: Add helpers for neigh lookup for nexthop")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220127013404.1279313-1-eric.dumazet@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3c42b201
    • Jakub Kicinski's avatar
      Merge branch 'ipv4-less-uses-of-shared-ip-generator' · 3ede6465
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      ipv4: less uses of shared IP generator
      
      From: Eric Dumazet <edumazet@google.com>
      
      We keep receiving research reports based on linux IPID generation.
      
      Before breaking part of the Internet by switching to pure
      random generator, this series reduces the need for the
      shared IP generator for TCP sockets.
      ====================
      
      Link: https://lore.kernel.org/r/20220127011022.1274803-1-eric.dumazet@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3ede6465
    • Eric Dumazet's avatar
      ipv4: avoid using shared IP generator for connected sockets · 23f57406
      Eric Dumazet authored
      ip_select_ident_segs() has been very conservative about using
      the connected socket private generator only for packets with IP_DF
      set, claiming it was needed for some VJ compression implementations.
      
      As mentioned in this referenced document, this can be abused.
      (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
      
      Before switching to pure random IPID generation and possibly hurt
      some workloads, lets use the private inet socket generator.
      
      Not only this will remove one vulnerability, this will also
      improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT
      
      Fixes: 73f156a6
      
       ("inetpeer: get rid of ip_id_count")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Reported-by: default avatarRay Che <xijiache@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      23f57406
    • Eric Dumazet's avatar
      ipv4: tcp: send zero IPID in SYNACK messages · 970a5a3e
      Eric Dumazet authored
      In commit 431280ee ("ipv4: tcp: send zero IPID for RST and
      ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
      ctl packets sent by TCP.
      
      It turns out we need to use a similar strategy for SYNACK packets.
      
      By default, they carry IP_DF and IPID==0, but there are ways
      to ask them to use the hashed IP ident generator and thus
      be used to build off-path attacks.
      (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
      
      One of this way is to force (before listener is started)
      echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc
      
      Another way is using forged ICMP ICMP_FRAG_NEEDED
      with a very small MTU (like 68) to force a false return from
      ip_dont_fragment()
      
      In this patch, ip_build_and_send_pkt() uses the following
      heuristics.
      
      1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
      can use IP_DF regardless of the listener or route pmtu setting.
      
      2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
      we use prandom_u32() generator instead of the IPv4 hashed ident one.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarRay Che <xijiache@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Cc: Geoff Alexander <alexandg@cs.unm.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      970a5a3e
  4. Jan 27, 2022