  1. Dec 02, 2021
    • ipv4: convert fib_num_tclassid_users to atomic_t · 213f5f8f
      Eric Dumazet authored
      Before commit faa041a4 ("ipv4: Create cleanup helper for fib_nh")
      changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
      
      After the change, this is no longer the case, as free_fib_info_rcu()
      runs after rcu grace period, without rtnl being held.
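The change can be pictured with a small userspace sketch. It uses C11 atomics in place of the kernel's atomic_t, and the struct and function names are hypothetical, not the kernel's. The point is only that a counter updated from a path that no longer holds the lock needs atomic read-modify-write operations.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-netns counter; not the kernel's struct. */
struct fib_stats {
	atomic_int tclassid_users;
};

/* Called when a user of the feature is created, e.g. with a lock held. */
static void tclassid_get(struct fib_stats *st)
{
	atomic_fetch_add_explicit(&st->tclassid_users, 1, memory_order_relaxed);
}

/* Called from a deferred-free path that does not hold that lock. */
static void tclassid_put(struct fib_stats *st)
{
	atomic_fetch_sub_explicit(&st->tclassid_users, 1, memory_order_relaxed);
}

/* Readers only need to know whether any user exists. */
static int tclassid_in_use(struct fib_stats *st)
{
	return atomic_load_explicit(&st->tclassid_users,
				    memory_order_relaxed) != 0;
}
```

With a plain int, an increment under the lock and a concurrent decrement outside it could lose an update; the atomic operations make each update indivisible without adding a lock to the free path.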
      
      Fixes: faa041a4 ("ipv4: Create cleanup helper for fib_nh")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid uninit-value from tcp_conn_request · a37a0ee4
      Eric Dumazet authored
      A recent change triggers a KMSAN warning, because request
      sockets do not initialize @sk_rx_queue_mapping field.
      
      Add sk_rx_queue_update() helper to make our intent clear.
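One way to read the distinction is sketched below. It assumes, which the message does not state, that the "update" helper compares before storing while the "set" helper stores unconditionally. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified socket with only the field of interest. */
struct sock_like {
	uint16_t rx_queue_mapping;
};

/*
 * Unconditional store. It never reads the old value, so it is safe on a
 * freshly allocated object whose field has not been initialized yet.
 */
static void rx_queue_set(struct sock_like *sk, uint16_t rx_queue)
{
	sk->rx_queue_mapping = rx_queue;
}

/*
 * Compare, then store only on change. This avoids dirtying the cache line
 * when nothing changed, but it reads the field, so the field must already
 * hold a defined value. Returns true if a store happened.
 */
static bool rx_queue_update(struct sock_like *sk, uint16_t rx_queue)
{
	if (sk->rx_queue_mapping == rx_queue)
		return false;
	sk->rx_queue_mapping = rx_queue;
	return true;
}
```

Under that assumption, a newly allocated request socket would use the unconditional variant, and established sockets would use the comparing variant.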
      
      BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline]
      BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       sk_rx_queue_set include/net/sock.h:1922 [inline]
       tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       invoke_softirq+0xa4/0x130 kernel/softirq.c:432
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x76/0x130 kernel/softirq.c:648
       common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40
       smap_restore arch/x86/include/asm/smap.h:67 [inline]
       get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline]
       __msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63
       tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173
       tomoyo_path_permission security/tomoyo/file.c:586 [inline]
       tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777
       tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311
       security_file_open+0xb1/0x1f0 security/security.c:1635
       do_dentry_open+0x4e4/0x1bf0 fs/open.c:809
       vfs_open+0xaf/0xe0 fs/open.c:957
       do_open fs/namei.c:3426 [inline]
       path_openat+0x52f1/0x5dd0 fs/namei.c:3559
       do_filp_open+0x306/0x760 fs/namei.c:3586
       do_sys_openat2+0x263/0x8f0 fs/open.c:1212
       do_sys_open fs/open.c:1228 [inline]
       __do_sys_open fs/open.c:1236 [inline]
       __se_sys_open fs/open.c:1232 [inline]
       __x64_sys_open+0x314/0x380 fs/open.c:1232
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       reqsk_alloc include/net/request_sock.h:91 [inline]
       inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712
       tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: annotate data-races on txq->xmit_lock_owner · 7a10d8c8
      Eric Dumazet authored
      syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
      without annotations.
      
      There is no serious issue here; let's document what is happening.
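A userspace analogue of such an annotation is sketched here. Relaxed C11 atomics stand in for the kernel's READ_ONCE() and WRITE_ONCE(), and the struct and helper names are hypothetical. The owner field holds a CPU id, or -1 when the lock is free, which matches the 0 to 0xffffffff change in the report below.

```c
#include <stdatomic.h>

/* Hypothetical, simplified transmit queue with only the owner field. */
struct txq_like {
	atomic_int xmit_lock_owner;	/* CPU id of the lock holder, or -1 */
};

/* Written by the lock holder right after taking the lock. */
static void txq_set_owner(struct txq_like *q, int cpu)
{
	atomic_store_explicit(&q->xmit_lock_owner, cpu, memory_order_relaxed);
}

/* Written by the lock holder right before releasing the lock. */
static void txq_clear_owner(struct txq_like *q)
{
	atomic_store_explicit(&q->xmit_lock_owner, -1, memory_order_relaxed);
}

/*
 * Lockless check: is this CPU already the owner? A stale value from
 * another CPU can never equal our own id, so a racy read is harmless,
 * but the access is marked so tools and compilers know it is intended.
 */
static int txq_owned_by(struct txq_like *q, int cpu)
{
	return atomic_load_explicit(&q->xmit_lock_owner,
				    memory_order_relaxed) == cpu;
}
```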
      
      BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
      
      write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
       __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
       __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_hh_output include/net/neighbour.h:511 [inline]
       neigh_output include/net/neighbour.h:525 [inline]
       ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
       __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
       neigh_output include/net/neighbour.h:527 [inline]
       ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
       folio_test_anon include/linux/page-flags.h:581 [inline]
       PageAnon include/linux/page-flags.h:586 [inline]
       zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
       zap_pmd_range mm/memory.c:1467 [inline]
       zap_pud_range mm/memory.c:1496 [inline]
       zap_p4d_range mm/memory.c:1517 [inline]
       unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
       unmap_single_vma+0x157/0x210 mm/memory.c:1583
       unmap_vmas+0xd0/0x180 mm/memory.c:1615
       exit_mmap+0x23d/0x470 mm/mmap.c:3170
       __mmput+0x27/0x1b0 kernel/fork.c:1113
       mmput+0x3d/0x50 kernel/fork.c:1134
       exit_mm+0xdb/0x170 kernel/exit.c:507
       do_exit+0x608/0x17a0 kernel/exit.c:819
       do_group_exit+0xce/0x180 kernel/exit.c:929
       get_signal+0xfc3/0x1550 kernel/signal.c:2852
       arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0xffffffff
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: Fix a memleak bug in rvu_mbox_init() · e07a097b
      Zhou Qingyang authored
      In rvu_mbox_init(), mbox_regions is not freed or passed out
      under the switch-default region, which could lead to a memory leak.
      
      Fix this bug by changing 'return err' to 'goto free_regions'.
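The pattern behind this fix can be sketched in plain C. The function, its parameters, and the regions_live counter are hypothetical and exist only to make the sketch testable; in this sketch the buffer is scratch space that is released on every path.

```c
#include <errno.h>
#include <stdlib.h>

/* Demo-only counter of outstanding allocations (hypothetical). */
static int regions_live;

/* Hypothetical init routine: "type" selects a layout, as a switch would. */
static int mbox_init_sketch(int type, size_t num)
{
	int err = 0;
	void **regions = calloc(num, sizeof(*regions));

	if (!regions)
		return -ENOMEM;
	regions_live++;

	switch (type) {
	case 0:
	case 1:
		break;
	default:
		err = -EINVAL;
		/* A plain "return err" here would leak regions. */
		goto free_regions;
	}

	/* ... set things up using regions ... */

free_regions:
	free(regions);
	regions_live--;
	return err;
}
```

Routing every exit through one label keeps the release in a single place, so a newly added error branch cannot silently skip it.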
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_OCTEONTX2_AF=y show no new warnings,
      and our static analyzer no longer warns about this code.
      
      Fixes: 98c56111 ("octeontx2-af: cn10k: Add mbox support for CN10K platform")
      Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
      Link: https://lore.kernel.org/r/20211130165039.192426-1-zhou1615@umn.edu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources() · addad764
      Zhou Qingyang authored
      In mlx4_en_try_alloc_resources(), mlx4_en_copy_priv() is called and
      tmp->tx_cq will be freed on the error path of mlx4_en_copy_priv().
      After that mlx4_en_alloc_resources() is called and there is a dereference
      of &tmp->tx_cq[t][i] in mlx4_en_alloc_resources(), which could lead to
      a use after free problem on failure of mlx4_en_copy_priv().
      
      Fix this bug by checking the return value of mlx4_en_copy_priv().
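A reduced model of the problem and the added check is sketched below. The struct, the function names, and the "fail" parameter, which simulates an allocation error, are all hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the driver's private struct. */
struct priv_like {
	int *tx_cq;
};

/*
 * Copy helper that releases its own allocation on failure, leaving
 * dst->tx_cq dangling. "fail" simulates an error partway through.
 */
static int copy_priv_sketch(struct priv_like *dst, const struct priv_like *src,
			    int fail)
{
	dst->tx_cq = malloc(sizeof(*dst->tx_cq));
	if (!dst->tx_cq)
		return -ENOMEM;
	if (fail) {
		free(dst->tx_cq);	/* dst->tx_cq must not be used after this */
		return -ENOMEM;
	}
	*dst->tx_cq = *src->tx_cq;
	return 0;
}

static int try_alloc_sketch(struct priv_like *tmp, const struct priv_like *src,
			    int fail)
{
	int err = copy_priv_sketch(tmp, src, fail);

	if (err)
		return err;	/* the added check: bail out before touching tmp */

	*tmp->tx_cq += 1;	/* stands in for the later dereference */
	return 0;
}
```

Without the early return, the dereference in try_alloc_sketch() would touch memory that the copy helper had already freed.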
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_MLX4_EN=m show no new warnings,
      and our static analyzer no longer warns about this code.
      
      Fixes: ec25bc04 ("net/mlx4_en: Add resilience in low memory systems")
      Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20211130164438.190591-1-zhou1615@umn.edu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit · ee201011
      Stephen Suryaputra authored
      IPCB/IP6CB need to be initialized when processing outbound v4 or v6 pkts
      in the codepath of vrf device xmit function so that leftover garbage
      doesn't cause further code that uses the CB to incorrectly process the
      pkt.
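IPCB and IP6CB are views of the packet's control block, a scratch area that each layer may reuse. The idea of resetting it can be sketched as follows; the struct, the 48-byte size, and the helper names are assumptions made for this sketch.

```c
#include <string.h>

#define CB_SIZE 48	/* size assumed for this sketch */

/* Hypothetical packet with only the per-layer scratch area. */
struct pkt_like {
	unsigned char cb[CB_SIZE];
};

/* Clear the scratch area so the next layer starts from a known state. */
static void pkt_reset_cb(struct pkt_like *pkt)
{
	memset(pkt->cb, 0, sizeof(pkt->cb));
}

/* Returns 1 if no stale bytes remain in the scratch area. */
static int pkt_cb_is_clear(const struct pkt_like *pkt)
{
	for (size_t i = 0; i < sizeof(pkt->cb); i++)
		if (pkt->cb[i])
			return 0;
	return 1;
}
```

A layer that interprets the scratch area as its own structure would otherwise read whatever the previous layer left behind.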
      
      One occasion of the issue might occur when MPLS route uses the vrf
      device as the outgoing device such as when the route is added using "ip
      -f mpls route add <label> dev <vrf>" command.
      
      The problem seems to have existed since day one. Hence I put the day-one
      commits in the Fixes tags.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Cc: stable@vger.kernel.org
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211130162637.3249-1-ssuryaextr@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() · e2dabc4f
      Zhou Qingyang authored
      In qlcnic_83xx_add_rings(), the indirect function of
      ahw->hw_ops->alloc_mbx_args will be called to allocate memory for
      cmd.req.arg, and there is a dereference of it in qlcnic_83xx_add_rings(),
      which could lead to a NULL pointer dereference on failure of the
      indirect function like qlcnic_83xx_alloc_mbx_args().
      
      Fix this bug by adding a check of alloc_mbx_args(); this patch
      imitates the logic of mbx_cmd()'s failure handling.
      
      This bug was found by a static analyzer. The analysis employs
      differential checking to identify inconsistent security operations
      (e.g., checks or kfrees) between two code paths and confirms that the
      inconsistent operations are not recovered in the current function or
      the callers, so they constitute bugs.
      
      Note that, as a bug found by static analysis, it can be a false
      positive or hard to trigger. Multiple researchers have cross-reviewed
      the bug.
      
      Builds with CONFIG_QLCNIC=m show no new warnings, and our
      static analyzer no longer warns about this code.
      
      Fixes: 7f966452 ("qlcnic: 83xx memory map and HW access routine")
      Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
      Link: https://lore.kernel.org/r/20211130110848.109026-1-zhou1615@umn.edu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Dec 01, 2021
    • Merge tag 'wireless-drivers-2021-12-01' of... · 3968e3ca
      David S. Miller authored
      
      Merge tag 'wireless-drivers-2021-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.16
      
      First set of fixes for v5.16. Mostly crash and driver initialisation
      fixes, the fix for rtw89 being most important.
      
      iwlwifi
      
      * compiler, lockdep and smatch warning fixes
      
      * fix for a rare driver initialisation failure
      
      * fix a memory leak
      
      rtw89
      
      * fix const buffer modification causing a kernel crash
      
      mt76
      
      * fix null pointer access
      
      * fix idr leak
      
      rt2x00
      
      * fix driver initialisation errors, a regression since v5.2-rc1
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2021-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4326d04f
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-11-30
      
      This series provides bug fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mv88e6xxx-fixes' · 74b95b07
      David S. Miller authored
      
      
      Marek Behún says:
      
      ====================
      mv88e6xxx fixes (mainly 88E6393X family)
      
      sending v2 of these fixes.
      
      Original cover letter:
      
      So I managed to discover how to fix inband AN for 2500base-x mode on
      88E6393x (Amethyst) family.
      
      This series fixes application of erratum 4.8, adds fix for erratum 5.2,
      adds support for completely disabling SerDes receiver / transmitter,
      fixes inband AN for 2500base-x mode by using 1000base-x mode and simply
      changing frequency to 3.125 GHz, all this for 88E6393X.
      
      The last commit fixes linking when link partner has AN disabled and the
      device invokes the AN bypass feature. Currently we fail to link in this
      case.
      
      Changes since v1:
      - fixed wrong operator in patch 3 (thanks Russell)
      - added more comments about why BMCR_ANENABLE is used in patch 6 (thanks
        Russell)
      - updated some return statements from
           if (something)
             return func();
           return 0;
        to
           if (something)
             err = func();
           return err;
        (err is set to 0 before the condition)
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed · ede359d8
      Marek Behún authored
      Function mv88e6xxx_serdes_pcs_get_state() currently does not report link
      up if AN is enabled, Link bit is set, but Speed and Duplex Resolved bit
      is not set, which testing shows is the case for when auto-negotiation
      was bypassed (we have AN enabled but link partner does not).
      
      An example of such link partner is Marvell 88X3310 PHY, when put into
      the mode where host interface changes between 10gbase-r, 5gbase-r,
      2500base-x and sgmii according to copper speed. The 88X3310 does not
      enable AN in 2500base-x, and so SerDes on mv88e6xxx currently does not
      link with it.
      
      Fix this.
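The decision described above can be sketched as a small pure function. The bit positions, type names, and the rule for picking the speed in the bypass case are hypothetical, chosen only to illustrate the three conditions the message names (AN enabled, Link bit set, Speed and Duplex Resolved bit clear).

```c
#include <stdbool.h>

/* Hypothetical status bits; the real register layout is not given here. */
#define ST_LINK			(1u << 0)
#define ST_SPD_DPL_VALID	(1u << 1)

enum iface_mode { MODE_1000BASEX, MODE_2500BASEX };

struct link_state {
	bool link;
	bool an_bypassed;
	int speed;	/* Mb/s, 0 when the link is down */
};

/* Assumption: without a resolved speed, fall back to the mode's nominal one. */
static int nominal_speed(enum iface_mode mode)
{
	return mode == MODE_2500BASEX ? 2500 : 1000;
}

static struct link_state pcs_state_sketch(unsigned int status, bool an_enabled,
					  enum iface_mode mode)
{
	struct link_state st = { false, false, 0 };

	if (!(status & ST_LINK))
		return st;

	if (status & ST_SPD_DPL_VALID) {
		/* Resolved: a real driver would decode speed from status. */
		st.link = true;
		st.speed = nominal_speed(mode);
	} else if (an_enabled) {
		/* Link bit set, AN enabled, nothing resolved: AN was bypassed. */
		st.link = true;
		st.an_bypassed = true;
		st.speed = nominal_speed(mode);
	}
	return st;
}
```

Before the fix, the second branch would have reported the link as down, which is the failure seen with a partner that has AN disabled.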
      
      Fixes: a5a6858b ("net: dsa: mv88e6xxx: extend phylink to Serdes PHYs")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family · 163000db
      Marek Behún authored
      Inband AN is broken on Amethyst in 2500base-x mode when set by standard
      mechanism (via cmode).
      
      (There probably is some weird setting done by default in the switch for
       this mode that makes it cycle in some state or something, because when
       the peer is the mvneta controller, it receives link change interrupts
       every ~0.3ms, but the link is always down.)
      
      Get around this by configuring the PCS mode to 1000base-x (where inband
      AN works), and then changing the SerDes frequency while SerDes
      transmitter and receiver are disabled, before enabling SerDes PHY. After
      disabling SerDes PHY, change the PCS mode back to 2500base-x, to avoid
      confusing the device (if we leave it at 1000base-x PCS mode but with
      different frequency, and then change cmode to sgmii, the device won't
      change the frequency because it thinks it already has the correct one).
      
      The register which changes the frequency is undocumented. I discovered
      it by going through all registers in the ranges 4.f000-4.f100 and
      1e.8000-1e.8200 for all SerDes cmodes (sgmii, 1000base-x, 2500base-x,
      5gbase-r, 10gbase-r, usxgmii) and filtering out registers that didn't
      make sense (the value was the same for modes which have different
      frequency). The result of this was:
      
          reg   sgmii 1000base-x 2500base-x 5gbase-r 10gbase-r usxgmii
        04.f002  005b       0058       0059     005c      005d    005f
        04.f076  3000       0000       1000     4000      5000    7000
        04.f07c  0950       0950       1850     0550      0150    0150
        1e.8000  0059       0059       0058     0055      0051    0051
        1e.8140  0e20       0e20       0e28     0e21      0e42    0e42
      
      Register 04.f002 is the documented Port Operational Configuration
      register; its last 3 bits select PCS type, so changing this register
      also changes the frequency to the appropriate value.
      
      Registers 04.f076 and 04.f07c are not writable.
      
      Undocumented register 1e.8000 was the one: changing bits 3:0 from 9 to 8
      changed SerDes frequency to 3.125 GHz, while leaving the value of PCS
      mode in register 04.f002.2:0 at 1000base-x. Inband autonegotiation
      started working correctly.
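The register change amounts to a read-modify-write of the low four bits. A minimal sketch, with a hypothetical helper name, using the values from the table above (0059 in 1000base-x, 0058 in 2500base-x):

```c
#include <stdint.h>

/* Replace bits 3:0 of a 16-bit register value, leaving bits 15:4 intact. */
static uint16_t set_low_nibble(uint16_t reg, uint16_t sel)
{
	return (uint16_t)((reg & ~0x000fu) | (sel & 0x000fu));
}
```

For example, set_low_nibble(0x0059, 0x8) yields 0x0058, the value the table shows for 2500base-x, while the upper bits stay as they were.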
      
      (I didn't try anything with register 1e.8140 since 1e.8000 solved the
       problem.)
      
      Since I don't have documentation for this register 1e.8000.3:0, I am
      using the constants without names, but my hypothesis is that this
      register selects PHY frequency. If in the future I have access to an
      oscilloscope able to handle these frequencies, I will try to test this
      hypothesis.
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Add fix for erratum 5.2 of 88E6393X family · 93fd8207
      Marek Behún authored
      Add fix for erratum 5.2 of the 88E6393X (Amethyst) family: for 10gbase-r
      mode, some undocumented registers need to be written with special
      values.
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Save power by disabling SerDes trasmitter and receiver · 7527d662
      Marek Behún authored
      
      
      Save power on 88E6393X by disabling SerDes receiver and transmitter
      after SerDes is disabled.
      
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Cc: stable@vger.kernel.org # de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Drop unnecessary check in mv88e6393x_serdes_erratum_4_6() · 8c3318b4
      Marek Behún authored
      
      
      The check for lane is unnecessary, since the function is called only
      with allowed lane argument.
      
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X · 21635d92
      Marek Behún authored
      According to SERDES scripts for 88E6393X, erratum 4.8 has to be applied
      every time before SerDes is powered on.
      
      Split the code for erratum 4.8 into a separate function and call it in
      mv88e6393x_serdes_power().
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: SHAMPO, Fix constant expression result · 8c8cf038
      Ben Ben-Ishay authored
      mlx5e_build_shampo_hd_umr() incorrectly declares the counters i and index
      as unsigned, so the err_unmap error path could get stuck in an endless loop.
      Change i to int to solve the first issue.
      Reduce the index check to solve the second issue; the caller
      validates that index cannot wrap around.
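Why an unsigned counter can keep an unwind loop from terminating is sketched below. The function names are hypothetical, and the loop shape is an assumption about the general pattern, not a copy of the driver code.

```c
#include <stdint.h>

/*
 * Decrementing an unsigned 16-bit counter at zero wraps to 65535, so a
 * test such as "i >= 0" can never become false for it.
 */
static uint16_t u16_decrement(uint16_t i)
{
	i--;
	return i;
}

/*
 * With a signed counter the same unwind loop stops once i reaches -1.
 * Returns how many entries were undone.
 */
static int unwind_mapped(int mapped)
{
	int undone = 0;

	for (int i = mapped - 1; i >= 0; i--)
		undone++;	/* stands in for unmapping entry i */
	return undone;
}
```

Compilers can flag the unsigned form as a comparison whose result is constant, which matches the "constant expression result" wording in the commit title.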
      
      Fixes: 64509b05 ("net/mlx5e: Add data path for SHAMPO feature")
      Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix access to a non-supported register · 502e82b9
      Aya Levin authored
      Validate MRTC register is supported before triggering a delayed work
      which accesses it.
      
      Fixes: 5a1023de ("net/mlx5: Add periodic update of host time to firmware")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix too early queueing of log timestamp work · 924cc463
      Gal Pressman authored
      The log timestamp work should not be queued before the command interface
      is initialized, move it to a later stage in the init flow.
      
      Fixes: 5a1023de ("net/mlx5: Add periodic update of host time to firmware")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix use after free in mlx5_health_wait_pci_up · 76091b0f
      Amir Tzin authored
      The device health recovery flow calls mlx5_health_wait_pci_up() which
      queries the device for FW_RESET timeout after freeing the device
      timeouts structure on mlx5_function_teardown(). Fix this bug by moving
      timeouts structure init/cleanup to the device's init/uninit phases.
      Since it is necessary to reset default software timeouts on function
      reload, extract setting of default values from mlx5_tout_init() and
      call mlx5_tout_set_def_val() directly from mlx5_function_setup().
      
      Fixes: 5945e1ad ("net/mlx5: Read timeout values from init segment")
      Reported by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Use indirect table only if all destinations support it · e219440d
      Maor Dickman authored
      When adding a rule with multiple destinations, an indirect table is used for all of
      the destinations if at least one of them supports it. This can cause
      creation of invalid indirect tables for the destinations that don't support it.
      
      Fix it by using an indirect table only if all destinations support it.
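The change from "any destination" to "all destinations" can be sketched as a predicate. The struct and function names are hypothetical, and returning false for an empty destination list is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical destination with only the capability of interest. */
struct dest_like {
	bool supports_indirect;
};

/* Use the indirect table only when every destination supports it. */
static bool use_indirect_table(const struct dest_like *dests, size_t n)
{
	if (n == 0)
		return false;	/* assumption: nothing to offload */

	for (size_t i = 0; i < n; i++)
		if (!dests[i].supports_indirect)
			return false;
	return true;
}
```

With the earlier "any" logic, a mixed list would have enabled the indirect table for destinations that cannot use it.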
      
      Fixes: a508728a ("net/mlx5e: VF tunnel RX traffic offloading")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Dmytro Linkin's avatar
      net/mlx5: E-Switch, Check group pointer before reading bw_share value · 5c4e8ae7
      Dmytro Linkin authored
      If log_esw_max_sched_depth is not supported, the group pointer of the vport
      is NULL. Hence, check the pointer before reading the bw_share value.
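      A minimal sketch of the guarded read, assuming simplified stand-in
      structures (struct rate_group, struct vport_qos and the helper name are
      hypothetical, and falling back to 0 is an assumption of this sketch):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's structures. */
struct rate_group {
	uint32_t bw_share;
};

struct vport_qos {
	struct rate_group *group; /* NULL when rate groups are unsupported */
};

/* Read the group's bw_share only when the group pointer is valid. */
static uint32_t group_bw_share_or_zero(const struct vport_qos *v)
{
	return v->group ? v->group->bw_share : 0;
}
```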
      
      Fixes: 0fe132ea ("net/mlx5: E-switch, Allow to add vports to rate groups")
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      5c4e8ae7
    • Mark Bloch's avatar
      net/mlx5: E-Switch, fix single FDB creation on BlueField · 43a0696f
      Mark Bloch authored
      Always use MLX5_FLOW_TABLE_OTHER_VPORT flag when creating egress ACL
      table for single FDB. Not doing so on BlueField will make firmware fail
      the command. On BlueField the E-Switch manager is the ECPF (vport 0xFFFE)
      which is filled in the flow table creation command but as the
      other_vport field wasn't set the firmware complains about a bad parameter.
      
      This is different from a regular HCA where the E-Switch manager vport is
      the PF (vport 0x0). Passing MLX5_FLOW_TABLE_OTHER_VPORT will make the
      firmware happy both on BlueField and on regular HCAs without special
      condition for each.
      
      This fixes the below firmware syndrome:
      mlx5_cmd_check:819:(pid 571): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x754a4)
      
      Fixes: db202995 ("net/mlx5: E-Switch, add logic to enable shared FDB")
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      43a0696f
    • Dmytro Linkin's avatar
      net/mlx5: E-switch, Respect BW share of the new group · 1e59b32e
      Dmytro Linkin authored
      To enable the transmit scheduler on a vport, FW requires a non-zero
      configuration for the vport's TSAR. If a vport is added to a group that
      has a configured BW share value while the TX rate values of the vport are
      zero, the scheduler won't be enabled on this vport.
      Fix that by calling BW normalization if the BW share of the new group is
      configured.
      
      Fixes: 0fe132ea ("net/mlx5: E-switch, Allow to add vports to rate groups")
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      1e59b32e
    • Maor Gottlieb's avatar
      net/mlx5: Lag, Fix recreation of VF LAG · ffdf4531
      Maor Gottlieb authored
      Driver needs to nullify the port select attributes of the LAG when
      port selection is destroyed, otherwise it breaks recreation of the
      LAG.
      It fixes the below kernel oops:
      
       [  587.906377] BUG: kernel NULL pointer dereference, address: 0000000000000008
       [  587.908843] #PF: supervisor read access in kernel mode
       [  587.910730] #PF: error_code(0x0000) - not-present page
       [  587.912580] PGD 0 P4D 0
       [  587.913632] Oops: 0000 [#1] SMP PTI
       [  587.914644] CPU: 5 PID: 165 Comm: kworker/u20:5 Tainted: G           OE     5.9.0_mlnx #1
       [  587.916152] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       [  587.918332] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
       [  587.919479] RIP: 0010:mlx5_del_flow_rules+0x10/0x270 [mlx5_core]
       [  587.920568] mlx5_core 0000:08:00.1 enp8s0f1: Link up
       [  587.920680] Code: c0 09 80 a0 e8 cf 42 a4 e0 48 c7 c3 f4 ff ff ff e8 8a 88 dd e0 e9 ab fe ff ff 0f 1f 44 00 00 41 56 41 55 49 89 fd 41 54 55 53 <48> 8b 47 08 48 8b 68 28 48 85 ed 74 2e 48 8d 7d 38 e8 6a 64 34 e1
       [  587.925116] bond0: (slave enp8s0f1): Enslaving as an active interface with an up link
       [  587.930415] RSP: 0018:ffffc9000048fd88 EFLAGS: 00010282
       [  587.930417] RAX: ffff88846c14fac0 RBX: ffff88846cddcb80 RCX: 0000000080400007
       [  587.930417] RDX: 0000000080400008 RSI: ffff88846cddcb80 RDI: 0000000000000000
       [  587.930419] RBP: ffff88845fd80140 R08: 0000000000000001 R09: ffffffffa074ba00
       [  587.938132] R10: ffff88846c14fec0 R11: 0000000000000001 R12: ffff88846c122f10
       [  587.939473] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88846d7a0000
       [  587.940800] FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
       [  587.942416] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  587.943536] CR2: 0000000000000008 CR3: 000000000240a002 CR4: 0000000000770ee0
       [  587.944904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [  587.946308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [  587.947639] PKRU: 55555554
       [  587.948236] Call Trace:
       [  587.948834]  mlx5_lag_destroy_definer.isra.3+0x16/0x90 [mlx5_core]
       [  587.950033]  mlx5_lag_destroy_definers+0x5b/0x80 [mlx5_core]
       [  587.951128]  mlx5_deactivate_lag+0x6e/0x80 [mlx5_core]
       [  587.952146]  mlx5_do_bond+0x150/0x450 [mlx5_core]
       [  587.953086]  mlx5_do_bond_work+0x3e/0x50 [mlx5_core]
       [  587.954086]  process_one_work+0x1eb/0x3e0
       [  587.954899]  worker_thread+0x2d/0x3c0
       [  587.955656]  ? process_one_work+0x3e0/0x3e0
       [  587.956493]  kthread+0x115/0x130
       [  587.957174]  ? kthread_park+0x90/0x90
       [  587.957929]  ret_from_fork+0x1f/0x30
       [  587.973055] ---[ end trace 71ccd6eca89f5513 ]---
      
      Fixes: b7267869 ("net/mlx5: Lag, add support to create/destroy/modify port selection")
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      ffdf4531
    • Moshe Shemesh's avatar
      net/mlx5: Move MODIFY_RQT command to ignore list in internal error state · e45c0b34
      Moshe Shemesh authored
      When the device is in internal error state, command interface isn't
      accessible and the driver decides which commands to fail and which
      to ignore.
      
      Move the MODIFY_RQT command to the ignore list in order to avoid
      the following redundant warning messages in internal error state:
      
      mlx5_core 0000:82:00.1: mlx5e_rss_disable:419:(pid 23754): Failed to redirect RQT 0x0 to drop RQ 0xc00848: err = -5
      mlx5_core 0000:82:00.1: mlx5e_rx_res_channels_deactivate:598:(pid 23754): Failed to redirect direct RQT 0x1 to drop RQ 0xc00848 (channel 0): err = -5
      mlx5_core 0000:82:00.1: mlx5e_rx_res_channels_deactivate:607:(pid 23754): Failed to redirect XSK RQT 0x19 to drop RQ 0xc00848 (channel 0): err = -5
      
      Fixes: 43ec0f41 ("net/mlx5e: Hide all implementation details of mlx5e_rx_res")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      e45c0b34
    • Tariq Toukan's avatar
      net/mlx5e: Sync TIR params updates against concurrent create/modify · 4cce2ccf
      Tariq Toukan authored
      Transport Interface Receive (TIR) objects perform the packet processing and
      reassembly and are also responsible for demultiplexing the packets into the
      different RQs.
      
      Certain TIR context attributes propagate to the pointed RQs and are
      applied to them (like packet_merge offloads (LRO/SHAMPO) and
      tunneled_offload_en).  When TIRs do not agree on attribute values, a "last
      one wins" policy is applied.  Hence, if not synced properly, a race between
      a TIR params update and a concurrent TIR create/modify operation might lead
      to a mismatch between the shadow parameters in SW and the actual applied
      state of the RQs in HW.
      
      tunneled_offload_en is a fixed attribute per profile, while packet merge
      offload state might be toggled and get out-of-sync. When this happens,
      packet_merge offload might be working although not requested, or the
      opposite.
      
      All updates to packet_merge state and all create/modify operations of
      regular redirection/steering TIRs are done under the same priv->state_lock,
      so they do not run in parallel, and no race is possible.
      
      However, there are other kinds of TIRs (acceleration offload TIRs, like TLS
      TIRs) which are created on demand for each new connection without holding
      the coarse priv->state_lock, and hence might race.
      
      Fix this by synchronizing all packet_merge state reads and writes against
      all TIR create/modify operations. Include the modify operations of the
      regular redirection steering TIRs under the new lock, for better code
      layering and division of responsibilities.
      
      Fixes: 1182f365 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      4cce2ccf
    • Raed Salem's avatar
      net/mlx5e: Fix missing IPsec statistics on uplink representor · 51ebf5db
      Raed Salem authored
      The cited patch added IPsec support to the uplink representor. However,
      uplink representors have their own private statistics, which do not
      include the IPsec stats, so the IPsec stats are effectively hidden when
      the uplink representor stats are queried.
      
      Resolve by adding IPsec stats to uplink representor private statistics.
      
      Fixes: 5589b8f1 ("net/mlx5e: Add IPsec support to uplink representor")
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      51ebf5db
    • Raed Salem's avatar
      net/mlx5e: IPsec: Fix Software parser inner l3 type setting in case of encapsulation · c65d638a
      Raed Salem authored
      The current code wrongly uses the skb->protocol field, which reflects the
      outer l3 protocol, to set the inner l3 type in the Software Parser (SWP)
      fields of the ethernet segment (eseg). In flows where an inner l3 header
      exists, such as VXLAN over ESP, this uses the outer protocol type instead
      of the inner one, breaking cases where the inner and outer headers have
      different protocols.
      
      Fix by setting the inner l3 type in SWP according to the inner l3 IP
      header version.
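      As an illustration of deriving the l3 type from the inner header itself:
      both IPv4 and IPv6 carry their version in the high nibble of the first
      header byte. The enum and function names below are hypothetical, and this
      is not the driver's code.

```c
#include <stdint.h>

/* Hypothetical labels for the inner l3 type. */
enum l3_type { L3_UNKNOWN, L3_IPV4, L3_IPV6 };

/* Classify the inner header by its own version field, not the outer protocol. */
static enum l3_type inner_l3_type(const uint8_t *inner_ip_hdr)
{
	switch (inner_ip_hdr[0] >> 4) { /* version is the high nibble of byte 0 */
	case 4:
		return L3_IPV4;
	case 6:
		return L3_IPV6;
	default:
		return L3_UNKNOWN;
	}
}
```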
      
      Fixes: 2ac9cfe7 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      c65d638a
    • Randy Dunlap's avatar
      natsemi: xtensa: fix section mismatch warnings · b0f38e15
      Randy Dunlap authored
      Fix section mismatch warnings in xtsonic. The first one appears to be
      bogus and after fixing the second one, the first one is gone.
      
      WARNING: modpost: vmlinux.o(.text+0x529adc): Section mismatch in reference from the function sonic_get_stats() to the function .init.text:set_reset_devices()
      The function sonic_get_stats() references
      the function __init set_reset_devices().
      This is often because sonic_get_stats lacks a __init
      annotation or the annotation of set_reset_devices is wrong.
      
      WARNING: modpost: vmlinux.o(.text+0x529b3b): Section mismatch in reference from the function xtsonic_probe() to the function .init.text:sonic_probe1()
      The function xtsonic_probe() references
      the function __init sonic_probe1().
      This is often because xtsonic_probe lacks a __init
      annotation or the annotation of sonic_probe1 is wrong.
      
      Fixes: 74f2a5f0 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Finn Thain <fthain@telegraphics.com.au>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: linux-xtensa@linux-xtensa.org
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>
      Link: https://lore.kernel.org/r/20211130063947.7529-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b0f38e15
    • Harshit Mogalapalli's avatar
      net: netlink: af_netlink: Prevent empty skb by adding a check on len. · f123cffd
      Harshit Mogalapalli authored
      
      
      Add a check on the len parameter to avoid an empty skb. This prevents a
      division error in the netem_enqueue function, which occurs when skb->len=0
      and skb->data_len=0 in the randomized corruption step shown below.
      
      skb->data[prandom_u32() % skb_headlen(skb)] ^= 1<<(prandom_u32() % 8);
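      The failing expression takes a modulo by the linear length, which is
      len - data_len and therefore zero for an empty skb. The userspace sketch
      below reproduces that arithmetic with a simplified stand-in structure and a
      local guard. All names are hypothetical, rand() replaces prandom_u32(), and
      the actual patch adds its check on the netlink sending side rather than in
      netem.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct sk_buff: only the fields used here. */
struct fake_skb {
	unsigned int len;      /* total bytes */
	unsigned int data_len; /* bytes held in paged fragments */
	uint8_t *data;         /* linear area */
};

/* Linear bytes, computed the same way as described above. */
static unsigned int fake_skb_headlen(const struct fake_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Flip one random bit in the linear area; refuse when it is empty. */
static bool corrupt_one_bit(struct fake_skb *skb)
{
	unsigned int headlen = fake_skb_headlen(skb);

	if (headlen == 0)
		return false; /* would otherwise evaluate a modulo by zero */

	skb->data[(unsigned int)rand() % headlen] ^= (uint8_t)(1u << (rand() % 8));
	return true;
}
```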
      
      Crash Report:
      [  343.170349] netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family
      0 port 6081 - 0
      [  343.216110] netem: version 1.3
      [  343.235841] divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [  343.236680] CPU: 3 PID: 4288 Comm: reproducer Not tainted 5.16.0-rc1+
      [  343.237569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.11.0-2.el7 04/01/2014
      [  343.238707] RIP: 0010:netem_enqueue+0x1590/0x33c0 [sch_netem]
      [  343.239499] Code: 89 85 58 ff ff ff e8 5f 5d e9 d3 48 8b b5 48 ff ff
      ff 8b 8d 50 ff ff ff 8b 85 58 ff ff ff 48 8b bd 70 ff ff ff 31 d2 2b 4f
      74 <f7> f1 48 b8 00 00 00 00 00 fc ff df 49 01 d5 4c 89 e9 48 c1 e9 03
      [  343.241883] RSP: 0018:ffff88800bcd7368 EFLAGS: 00010246
      [  343.242589] RAX: 00000000ba7c0a9c RBX: 0000000000000001 RCX:
      0000000000000000
      [  343.243542] RDX: 0000000000000000 RSI: ffff88800f8edb10 RDI:
      ffff88800f8eda40
      [  343.244474] RBP: ffff88800bcd7458 R08: 0000000000000000 R09:
      ffffffff94fb8445
      [  343.245403] R10: ffffffff94fb8336 R11: ffffffff94fb8445 R12:
      0000000000000000
      [  343.246355] R13: ffff88800a5a7000 R14: ffff88800a5b5800 R15:
      0000000000000020
      [  343.247291] FS:  00007fdde2bd7700(0000) GS:ffff888109780000(0000)
      knlGS:0000000000000000
      [  343.248350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  343.249120] CR2: 00000000200000c0 CR3: 000000000ef4c000 CR4:
      00000000000006e0
      [  343.250076] Call Trace:
      [  343.250423]  <TASK>
      [  343.250713]  ? memcpy+0x4d/0x60
      [  343.251162]  ? netem_init+0xa0/0xa0 [sch_netem]
      [  343.251795]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.252443]  netem_enqueue+0xe28/0x33c0 [sch_netem]
      [  343.253102]  ? stack_trace_save+0x87/0xb0
      [  343.253655]  ? filter_irq_stacks+0xb0/0xb0
      [  343.254220]  ? netem_init+0xa0/0xa0 [sch_netem]
      [  343.254837]  ? __kasan_check_write+0x14/0x20
      [  343.255418]  ? _raw_spin_lock+0x88/0xd6
      [  343.255953]  dev_qdisc_enqueue+0x50/0x180
      [  343.256508]  __dev_queue_xmit+0x1a7e/0x3090
      [  343.257083]  ? netdev_core_pick_tx+0x300/0x300
      [  343.257690]  ? check_kcov_mode+0x10/0x40
      [  343.258219]  ? _raw_spin_unlock_irqrestore+0x29/0x40
      [  343.258899]  ? __kasan_init_slab_obj+0x24/0x30
      [  343.259529]  ? setup_object.isra.71+0x23/0x90
      [  343.260121]  ? new_slab+0x26e/0x4b0
      [  343.260609]  ? kasan_poison+0x3a/0x50
      [  343.261118]  ? kasan_unpoison+0x28/0x50
      [  343.261637]  ? __kasan_slab_alloc+0x71/0x90
      [  343.262214]  ? memcpy+0x4d/0x60
      [  343.262674]  ? write_comp_data+0x2f/0x90
      [  343.263209]  ? __kasan_check_write+0x14/0x20
      [  343.263802]  ? __skb_clone+0x5d6/0x840
      [  343.264329]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.264958]  dev_queue_xmit+0x1c/0x20
      [  343.265470]  netlink_deliver_tap+0x652/0x9c0
      [  343.266067]  netlink_unicast+0x5a0/0x7f0
      [  343.266608]  ? netlink_attachskb+0x860/0x860
      [  343.267183]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.267820]  ? write_comp_data+0x2f/0x90
      [  343.268367]  netlink_sendmsg+0x922/0xe80
      [  343.268899]  ? netlink_unicast+0x7f0/0x7f0
      [  343.269472]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.270099]  ? write_comp_data+0x2f/0x90
      [  343.270644]  ? netlink_unicast+0x7f0/0x7f0
      [  343.271210]  sock_sendmsg+0x155/0x190
      [  343.271721]  ____sys_sendmsg+0x75f/0x8f0
      [  343.272262]  ? kernel_sendmsg+0x60/0x60
      [  343.272788]  ? write_comp_data+0x2f/0x90
      [  343.273332]  ? write_comp_data+0x2f/0x90
      [  343.273869]  ___sys_sendmsg+0x10f/0x190
      [  343.274405]  ? sendmsg_copy_msghdr+0x80/0x80
      [  343.274984]  ? slab_post_alloc_hook+0x70/0x230
      [  343.275597]  ? futex_wait_setup+0x240/0x240
      [  343.276175]  ? security_file_alloc+0x3e/0x170
      [  343.276779]  ? write_comp_data+0x2f/0x90
      [  343.277313]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.277969]  ? write_comp_data+0x2f/0x90
      [  343.278515]  ? __fget_files+0x1ad/0x260
      [  343.279048]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.279685]  ? write_comp_data+0x2f/0x90
      [  343.280234]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.280874]  ? sockfd_lookup_light+0xd1/0x190
      [  343.281481]  __sys_sendmsg+0x118/0x200
      [  343.281998]  ? __sys_sendmsg_sock+0x40/0x40
      [  343.282578]  ? alloc_fd+0x229/0x5e0
      [  343.283070]  ? write_comp_data+0x2f/0x90
      [  343.283610]  ? write_comp_data+0x2f/0x90
      [  343.284135]  ? __sanitizer_cov_trace_pc+0x21/0x60
      [  343.284776]  ? ktime_get_coarse_real_ts64+0xb8/0xf0
      [  343.285450]  __x64_sys_sendmsg+0x7d/0xc0
      [  343.285981]  ? syscall_enter_from_user_mode+0x4d/0x70
      [  343.286664]  do_syscall_64+0x3a/0x80
      [  343.287158]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  343.287850] RIP: 0033:0x7fdde24cf289
      [  343.288344] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00
      48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
      05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 db 2c 00 f7 d8 64 89 01 48
      [  343.290729] RSP: 002b:00007fdde2bd6d98 EFLAGS: 00000246 ORIG_RAX:
      000000000000002e
      [  343.291730] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
      00007fdde24cf289
      [  343.292673] RDX: 0000000000000000 RSI: 00000000200000c0 RDI:
      0000000000000004
      [  343.293618] RBP: 00007fdde2bd6e20 R08: 0000000100000001 R09:
      0000000000000000
      [  343.294557] R10: 0000000100000001 R11: 0000000000000246 R12:
      0000000000000000
      [  343.295493] R13: 0000000000021000 R14: 0000000000000000 R15:
      00007fdde2bd7700
      [  343.296432]  </TASK>
      [  343.296735] Modules linked in: sch_netem ip6_vti ip_vti ip_gre ipip
      sit ip_tunnel geneve macsec macvtap tap ipvlan macvlan 8021q garp mrp
      hsr wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64
      ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic
      curve25519_x86_64 libcurve25519_generic libchacha xfrm_interface
      xfrm6_tunnel tunnel4 veth netdevsim psample batman_adv nlmon dummy team
      bonding tls vcan ip6_gre ip6_tunnel tunnel6 gre tun ip6t_rpfilter
      ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
      ebtable_nat ebtable_broute ip6table_nat ip6table_mangle
      ip6table_security ip6table_raw iptable_nat nf_nat nf_conntrack
      nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_security
      iptable_raw ebtable_filter ebtables rfkill ip6table_filter ip6_tables
      iptable_filter ppdev bochs drm_vram_helper drm_ttm_helper ttm
      drm_kms_helper cec parport_pc drm joydev floppy parport sg syscopyarea
      sysfillrect sysimgblt i2c_piix4 qemu_fw_cfg fb_sys_fops pcspkr
      [  343.297459]  ip_tables xfs virtio_net net_failover failover sd_mod
      sr_mod cdrom t10_pi ata_generic pata_acpi ata_piix libata virtio_pci
      virtio_pci_legacy_dev serio_raw virtio_pci_modern_dev dm_mirror
      dm_region_hash dm_log dm_mod
      [  343.311074] Dumping ftrace buffer:
      [  343.311532]    (ftrace buffer empty)
      [  343.312040] ---[ end trace a2e3db5a6ae05099 ]---
      [  343.312691] RIP: 0010:netem_enqueue+0x1590/0x33c0 [sch_netem]
      [  343.313481] Code: 89 85 58 ff ff ff e8 5f 5d e9 d3 48 8b b5 48 ff ff
      ff 8b 8d 50 ff ff ff 8b 85 58 ff ff ff 48 8b bd 70 ff ff ff 31 d2 2b 4f
      74 <f7> f1 48 b8 00 00 00 00 00 fc ff df 49 01 d5 4c 89 e9 48 c1 e9 03
      [  343.315893] RSP: 0018:ffff88800bcd7368 EFLAGS: 00010246
      [  343.316622] RAX: 00000000ba7c0a9c RBX: 0000000000000001 RCX:
      0000000000000000
      [  343.317585] RDX: 0000000000000000 RSI: ffff88800f8edb10 RDI:
      ffff88800f8eda40
      [  343.318549] RBP: ffff88800bcd7458 R08: 0000000000000000 R09:
      ffffffff94fb8445
      [  343.319503] R10: ffffffff94fb8336 R11: ffffffff94fb8445 R12:
      0000000000000000
      [  343.320455] R13: ffff88800a5a7000 R14: ffff88800a5b5800 R15:
      0000000000000020
      [  343.321414] FS:  00007fdde2bd7700(0000) GS:ffff888109780000(0000)
      knlGS:0000000000000000
      [  343.322489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  343.323283] CR2: 00000000200000c0 CR3: 000000000ef4c000 CR4:
      00000000000006e0
      [  343.324264] Kernel panic - not syncing: Fatal exception in interrupt
      [  343.333717] Dumping ftrace buffer:
      [  343.334175]    (ftrace buffer empty)
      [  343.334653] Kernel Offset: 0x13600000 from 0xffffffff81000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  343.336027] Rebooting in 86400 seconds..
      
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Link: https://lore.kernel.org/r/20211129175328.55339-1-harshit.m.mogalapalli@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f123cffd
  3. Nov 30, 2021
    • Karsten Graul's avatar
      MAINTAINERS: s390/net: add Alexandra and Wenjia as maintainer · 34d8778a
      Karsten Graul authored
      
      
      Add Alexandra and Wenjia as maintainers for drivers/s390/net and iucv.
      Also, remove myself as maintainer for these areas.
      
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Acked-by: Alexandra Winter <wintera@linux.ibm.com>
      Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34d8778a
    • Dongliang Mu's avatar
      dpaa2-eth: destroy workqueue at the end of remove function · f4a8adbf
      Dongliang Mu authored
      Commit c5521189 ("dpaa2-eth: support PTP Sync packet one-step
      timestamping") forgot to destroy the workqueue at the end of the remove
      function.
      
      Fix this by adding destroy_workqueue() before fsl_mc_portal_free() and
      free_netdev().
      
      Fixes: c5521189 ("dpaa2-eth: support PTP Sync packet one-step timestamping")
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4a8adbf
    • Maciej Fijalkowski's avatar
      ice: xsk: clear status_error0 for each allocated desc · d1ec975f
      Maciej Fijalkowski authored
      Fix a bug in which the receiving of packets can stop in the zero-copy
      driver. Ice HW ignores 3 lower bits from QRX_TAIL register, which means
      that tail is bumped only on intervals of 8. Currently with XSK RX
      batching in place, ice_alloc_rx_bufs_zc() clears the status_error0 only
      of the last descriptor that has been allocated/taken from the XSK buffer
      pool. status_error0 includes DD bit that is looked upon by the
      ice_clean_rx_irq_zc() to tell if a descriptor can be processed.
      
      The bug can be triggered when the driver updates the ntu but not the
      QRX_TAIL, so HW doesn't get a chance to write to the ready
      descriptors. Later on, the driver moves the ntc to the mentioned set of
      descriptors and interprets them as ready to be processed, since the
      corresponding DD bits were not cleared and no writeback has happened
      that would clear them. This can then lead to the ntc == ntu case, which
      means that the ring is empty and no further packets are processed.
      
      Fix the XSK traffic hang that can be observed when the l2fwd scenario from
      xdpsock is used by making sure that status_error0 is cleared for each
      descriptor that is fed to HW, so that the driver will not process
      invalid DD bits. This will also prevent the driver from processing
      descriptors that were allocated to replace previously processed ones
      but have not been written back yet.
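      A rough model of the two facts above, under simplifying assumptions: the
      hardware sees the tail with its 3 low bits ignored, and the refill loop
      clears status_error0 on every descriptor it hands over. The ring size,
      struct rx_desc and both function names are hypothetical and far simpler
      than the real driver.

```c
#include <stdint.h>

#define RING_SIZE 64u

/* Hypothetical descriptor holding only the field named above. */
struct rx_desc {
	uint16_t status_error0;
};

/* Tail value the hardware effectively sees: the 3 low bits are ignored. */
static uint32_t hw_visible_tail(uint32_t ntu)
{
	return ntu & ~0x7u;
}

/* Hand `count` descriptors starting at `ntu` to hardware, clearing each one. */
static uint32_t refill(struct rx_desc *ring, uint32_t ntu, uint32_t count)
{
	for (uint32_t i = 0; i < count; i++) {
		ring[ntu].status_error0 = 0; /* clear the stale DD bit on every descriptor */
		ntu = (ntu + 1) % RING_SIZE;
	}
	return ntu;
}
```

      Refilling 5 descriptors from index 0 moves ntu to 5 while the visible tail
      stays at 0, so those descriptors are only safe to inspect later if each of
      them was cleared.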
      
      Fixes: db804cfc ("ice: Use the xsk batched rx allocation interface")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1ec975f
    • Christophe JAILLET's avatar
      net: marvell: mvpp2: Fix the computation of shared CPUs · b83f5ac7
      Christophe JAILLET authored
      'bitmap_fill()' fills a bitmap one 'long' at a time.
      It is likely that an exact number of bits is expected.
      
      Use 'bitmap_set()' instead in order not to set unexpected bits.
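      The difference can be reproduced in userspace with simplified
      re-implementations modeled on the description above. The sketch_ functions
      are illustrative stand-ins, not the kernel helpers themselves.

```c
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Whole-word fill: sets every bit of each long needed to cover nbits. */
static void sketch_bitmap_fill(unsigned long *dst, unsigned int nbits)
{
	memset(dst, 0xff, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}

/* Exact set: sets only bits [start, start + nbits). */
static void sketch_bitmap_set(unsigned long *dst, unsigned int start,
			      unsigned int nbits)
{
	for (unsigned int i = start; i < start + nbits; i++)
		dst[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/* Count the set bits across nlongs words. */
static unsigned int sketch_weight(const unsigned long *map, unsigned int nlongs)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < nlongs; i++)
		for (unsigned long w = map[i]; w; w &= w - 1)
			count++;
	return count;
}
```

      Asking for 3 bits sets a whole word with the fill variant but exactly 3
      bits with the set variant.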
      
      Fixes: e531f767 ("net: mvpp2: handle cases where more CPUs are available than s/w threads")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b83f5ac7
    • Wei Yongjun's avatar
      net: mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set() · 1a59c9c5
      Wei Yongjun authored
      Add the missing mutex_unlock before return from function
      ocelot_hwstamp_set() in the ocelot_setup_ptp_traps() error
      handling case.
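      The usual shape of such a fix is a single exit label that releases the
      lock on both the success and the error path. The sketch below uses a toy
      lock that only records whether it is held. Every name in it is
      hypothetical, and it does not reproduce the ocelot code.

```c
#include <stdbool.h>

/* Toy lock standing in for a kernel mutex; it only tracks whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Hypothetical setup step that can fail with a negative error code. */
static int setup_traps(bool should_fail)
{
	return should_fail ? -22 : 0;
}

static int set_config(struct toy_lock *l, bool should_fail)
{
	int err;

	toy_lock_acquire(l);

	err = setup_traps(should_fail);
	if (err)
		goto out_unlock; /* the error path must release the lock too */

	/* ... the rest of the configuration would run here, under the lock ... */

out_unlock:
	toy_lock_release(l);
	return err;
}
```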
      
      Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211129151652.1165433-1-weiyongjun1@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1a59c9c5
    • Jakub Kicinski's avatar
      Merge tag 'rxrpc-fixes-20211129' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5fdc2333
      Jakub Kicinski authored
      
      
      David Howells says:
      
      ====================
      rxrpc: Leak fixes
      
      Here are a couple of fixes for leaks in AF_RXRPC:
      
       (1) Fix a leak of rxrpc_peer structs in rxrpc_look_up_bundle().
       (2) Fix a leak of rxrpc_local structs in rxrpc_lookup_peer().
      
      * tag 'rxrpc-fixes-20211129' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
        rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
      ====================
      
      Link: https://lore.kernel.org/r/163820097905.226370.17234085194655347888.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5fdc2333
    • Jakub Kicinski's avatar
      Merge branch 'wireguard-siphash-patches-for-5-16-rc6' · cbd92e7d
      Jakub Kicinski authored
      
      
      Jason A. Donenfeld says:
      
      ====================
      wireguard/siphash patches for 5.16-rc
      
      Here's quite a largeish set of stable patches I've had queued up and
      testing for a number of months now:
      
        - Patch (1) squelches a sparse warning by fixing an annotation.
        - Patches (2), (3), and (5) are minor improvements and fixes to the
          test suite.
        - Patch (4) is part of a tree-wide cleanup to have module-specific
          init and exit functions.
        - Patch (6) fixes an issue with dangling dst references, by having a
          function to release references immediately rather than deferring,
          and adds an associated test case to prevent this from regressing.
        - Patches (7) and (8) help mitigate somewhat a potential DoS on the
          ingress path due to the use of skb_list's locking hitting contention
          on multiple cores by switching to using a ring buffer and dropping
          packets on contention rather than locking up another core spinning.
        - Patch (9) switches kvzalloc to kvcalloc for better form.
        - Patch (10) fixes alignment traps in siphash with clang-13 (and maybe
          other compilers) on armv6, by switching to using the unaligned
          functions by default instead of the aligned functions by default.
      ====================
      
      Link: https://lore.kernel.org/r/20211129153929.3457-1-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cbd92e7d
    • Arnd Bergmann's avatar
      siphash: use _unaligned version by default · f7e5b9bf
      Arnd Bergmann authored
      
      
      On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
      because the ordinary load/store instructions (ldr, ldrh, ldrb) can
      tolerate any misalignment of the memory address. However, load/store
      double and load/store multiple instructions (ldrd, ldm) may still only
      be used on memory addresses that are 32-bit aligned, and so we have to
      use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
      may end up with a severe performance hit due to alignment traps that
      require fixups by the kernel. Testing shows that this currently happens
      with clang-13 but not gcc-11. In theory, any compiler version can
      produce this bug or other problems, as we are dealing with undefined
      behavior in C99 even on architectures that support this in hardware,
      see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
      
      Fortunately, the get_unaligned() accessors do the right thing: when
      building for ARMv6 or later, the compiler will emit unaligned accesses
      using the ordinary load/store instructions (but avoid the ones that
      require 32-bit alignment). When building for older ARM, those accessors
      will emit the appropriate sequence of ldrb/mov/orr instructions. And on
      architectures that can truly tolerate any kind of misalignment, the
      get_unaligned() accessors resolve to the leXX_to_cpup accessors that
      operate on aligned addresses.
      
      Since the compiler will in fact emit ldrd or ldm instructions when
      building this code for ARM v6 or later, the solution is to use the
      unaligned accessors unconditionally on architectures where this is
      known to be fast. The _aligned version of the hash function is
      however still needed to get the best performance on architectures
      that cannot do any unaligned access in hardware.
      
      This new version avoids the undefined behavior and should produce
      the fastest hash on all architectures we support.
      
      Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/
      Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/
      Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Fixes: 2c956a60 ("siphash: add cryptographically secure PRF")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f7e5b9bf
    • Gustavo A. R. Silva's avatar
      wireguard: ratelimiter: use kvcalloc() instead of kvzalloc() · 4e3fd721
      Gustavo A. R. Silva authored
      Use 2-factor argument form kvcalloc() instead of kvzalloc().
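
      As a rough illustration of what the two-factor form provides, a
      user-space analogue might look like the sketch below.
      zalloc_array() is a hypothetical helper, not the kernel's
      kvcalloc(), and the uint32_t element type is an assumption made
      for the example. Only the 8192 bound comes from this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of a two-factor zeroing allocator.
 * Taking the count and the element size separately lets the helper
 * reject a product that would wrap around, which a caller computing
 * n * size by hand could silently get wrong.
 */
static void *zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow size_t */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}

/* Usage: a table of 8192 entries, plus a request that must fail. */
void example_zalloc_array(void)
{
	uint32_t *table = zalloc_array(8192, sizeof(*table));

	assert(zalloc_array(SIZE_MAX, 2) == NULL);
	if (table) {
		assert(table[0] == 0 && table[8191] == 0);
		free(table);
	}
}
```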
      
      Link: https://github.com/KSPP/linux/issues/162
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      [Jason: Gustavo's link above is for KSPP, but this isn't actually a
       security fix, as table_size is bounded to 8192 anyway, and gcc realizes
       this, so the codegen comes out to be about the same.]
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4e3fd721