  1. Feb 12, 2022
    • Merge branch 'bpf: fix a bpf_timer initialization issue' · 3df9d803
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      
      The patch [1] exposed a bpf_timer initialization bug in the function
      check_and_init_map_value(). With the bug fix here, the patch [1]
      can be applied with all selftests passing. Please see the individual
      patches for fix details.
      
        [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
      
      
      
      Changelog:
        v3 -> v4:
          . move header file in patch #1 to avoid bpf-next merge conflict
        v2 -> v3:
          . switch patch #1 and patch #2 for better bisecting
        v1 -> v2:
          . add Fixes tag for patch #1
          . rebase against bpf tree
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix a bpf_timer initialization issue · 5eaed6ee
      Yonghong Song authored
      The patch in [1] intends to fix a bpf_timer related issue,
      but the fix caused the existing 'timer' selftest to fail with
      a hang or some random errors. After some debugging, I found
      an issue with check_and_init_map_value() in hashtab.c.
      More specifically, in hashtab.c, we have the code
        l_new = bpf_map_kmalloc_node(&htab->map, ...)
        check_and_init_map_value(&htab->map, l_new...)
      Note that bpf_map_kmalloc_node() does not do initialization,
      so l_new contains random values.
      
      The function check_and_init_map_value() intends to zero the
      bpf_spin_lock and bpf_timer if they exist in the map.
      But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed.
      With [1], later copy_map_value() skips copying of
      bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused
      random failures for 'timer' selftest.
      Without [1], for both bpf_spin_lock and bpf_timer case,
      bpf_timer will be zero'ed, so 'timer' self test is okay.
      
      For check_and_init_map_value(), why is bpf_spin_lock zero'ed
      properly while bpf_timer is not? In the bpf uapi header, we have
        struct bpf_spin_lock {
              __u32   val;
        };
        struct bpf_timer {
              __u64 :64;
              __u64 :64;
        } __attribute__((aligned(8)));
      
      The initialization code:
        *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
            (struct bpf_spin_lock){};
        *(struct bpf_timer *)(dst + map->timer_off) =
            (struct bpf_timer){};
      It appears the compiler has no obligation to initialize anonymous fields.
      For example, let us use clang with bpf target as below:
        $ cat t.c
        struct bpf_timer {
              unsigned long long :64;
        };
        struct bpf_timer2 {
              unsigned long long a;
        };
      
        void test(struct bpf_timer *t) {
          *t = (struct bpf_timer){};
        }
        void test2(struct bpf_timer2 *t) {
          *t = (struct bpf_timer2){};
        }
        $ clang -target bpf -O2 -c -g t.c
        $ llvm-objdump -d t.o
         ...
         0000000000000000 <test>:
             0:       95 00 00 00 00 00 00 00 exit
         0000000000000008 <test2>:
             1:       b7 02 00 00 00 00 00 00 r2 = 0
             2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
             3:       95 00 00 00 00 00 00 00 exit
      
      gcc 11.2 does not have the above issue. But from
        INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
        Programming languages — C
        http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
        page 157:
        Except where explicitly stated otherwise, for the purposes of
        this subclause unnamed members of objects of structure and union
        type do not participate in initialization. Unnamed members of
        structure objects have indeterminate value even after initialization.
      
      To fix the problem, let us use memset for the bpf_timer case in
      check_and_init_map_value(). For consistency, memset is also
      used for the bpf_spin_lock case.
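
      The memset approach can be illustrated with a minimal userspace sketch.
      The names sketch_timer and sketch_init_timer and the timer_off parameter
      are hypothetical stand-ins, not the kernel code, and the struct relies on
      the GCC/Clang extension that permits a struct with only unnamed members.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the uapi struct bpf_timer: two unnamed 64-bit bit-fields.
 * A struct with only unnamed members relies on a GCC/Clang extension. */
struct sketch_timer {
    unsigned long long :64;
    unsigned long long :64;
} __attribute__((aligned(8)));

/* Zero the timer bytes inside a map value.  memset() writes every byte,
 * whereas "= (struct sketch_timer){}" may leave unnamed members untouched.
 * timer_off is a hypothetical byte offset of the timer within the value. */
static void sketch_init_timer(void *value, size_t timer_off)
{
    memset((char *)value + timer_off, 0, sizeof(struct sketch_timer));
}
```

      Because memset operates on raw bytes, the result does not depend on how
      a given compiler treats unnamed members during initialization.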
      
        [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
      
      Fixes: 68134668 ("bpf: Add map side support for bpf timers.")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
    • bpf: Emit bpf_timer in vmlinux BTF · 3bd916ee
      Yonghong Song authored
      
      
      Currently the following code in check_and_init_map_value()
        *(struct bpf_timer *)(dst + map->timer_off) =
            (struct bpf_timer){};
      can help generate the bpf_timer definition in vmlinux BTF.
      But the code above may not zero the whole structure
      due to anonymous members, so that code will be replaced
      by memset in the subsequent patch, and the
      bpf_timer definition will disappear from vmlinux BTF.
      Let us emit the type explicitly so bpf programs can continue
      to use it from vmlinux.h.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220211194948.3141529-1-yhs@fb.com
    • Merge branch 'Fix for crash due to overwrite in copy_map_value' · acc3c473
      Alexei Starovoitov authored
      Kumar Kartikeya says:
      
      ====================
      
      A fix for an oversight in copy_map_value that leads to kernel crash.
      
      Also, a question for BPF developers:
      It seems in arraymap.c, we always do check_and_free_timer_in_array after we do
      copy_map_value in map_update_elem callback, but the same is not done for
      hashtab.c. Is there a specific reason for this difference in behavior, or did I
      miss that it happens for hashtab.c as well?
      
      Changelog:
      ----------
      v1 -> v2:
      v1: https://lore.kernel.org/bpf/20220209051113.870717-1-memxor@gmail.com
       * Fix build error for selftests patch due to missing SYS_PREFIX in bpf tree
      ====================
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for bpf_timer overwriting crash · a7e75016
      Kumar Kartikeya Dwivedi authored
      
      
      Add a test that validates that timer value is not overwritten when doing
      a copy_map_value call in the kernel. Without the prior fix, this test
      triggers a crash.
      
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209070324.1093182-3-memxor@gmail.com
    • bpf: Fix crash due to incorrect copy_map_value · a8abb0c3
      Kumar Kartikeya Dwivedi authored
      When both bpf_spin_lock and bpf_timer are present in a BPF map value,
      copy_map_value needs to skirt both objects when copying a value into and
      out of the map. However, the current code does not set both s_off and
      t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock
      is placed in a map value together with bpf_timer, as a bpf_map_update_elem
      call will be able to overwrite the timer object.
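
      The idea of skirting both objects can be sketched in userspace as
      follows. This is a simplified illustration under stated assumptions, not
      the kernel's copy_map_value: sketch_region and sketch_copy_skip2 are
      hypothetical names, and both regions are assumed to lie inside the value
      without overlapping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a region that must not be copied,
 * e.g. the bytes occupied by a lock or a timer inside a map value. */
struct sketch_region {
    size_t off;   /* byte offset inside the value */
    size_t size;  /* length in bytes */
};

/* Copy `size` bytes from src to dst while leaving two regions of dst
 * untouched.  Assumes both regions are inside [0, size) and disjoint. */
static void sketch_copy_skip2(void *dst, const void *src, size_t size,
                              struct sketch_region a, struct sketch_region b)
{
    char *d = dst;
    const char *s = src;
    size_t pos = 0;

    /* Order the regions so that `a` is the one with the lower offset. */
    if (a.off > b.off) {
        struct sketch_region tmp = a;
        a = b;
        b = tmp;
    }

    memcpy(d + pos, s + pos, a.off - pos);   /* bytes before the first region */
    pos = a.off + a.size;
    memcpy(d + pos, s + pos, b.off - pos);   /* bytes between the two regions */
    pos = b.off + b.size;
    memcpy(d + pos, s + pos, size - pos);    /* bytes after the second region */
}
```

      If only one of the two offsets were honoured, the bytes of the other
      object would be overwritten by the incoming value, which is the failure
      mode described above.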
      
      When the issue is not fixed, an overwrite can produce the following
      splat:
      
      [root@(none) bpf]# ./test_progs -t timer_crash
      [   15.930339] bpf_testmod: loading out-of-tree module taints kernel.
      [   16.037849] ==================================================================
      [   16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325
      [   16.039399]
      [   16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G           OE     5.16.0+ #278
      [   16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
      [   16.040485] Call Trace:
      [   16.040645]  <TASK>
      [   16.040805]  dump_stack_lvl+0x59/0x73
      [   16.041069]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.041427]  kasan_report.cold+0x116/0x11b
      [   16.041673]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042040]  __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042328]  ? memcpy+0x39/0x60
      [   16.042552]  ? pv_hash+0xd0/0xd0
      [   16.042785]  ? lockdep_hardirqs_off+0x95/0xd0
      [   16.043079]  __bpf_spin_lock_irqsave+0xdf/0xf0
      [   16.043366]  ? bpf_get_current_comm+0x50/0x50
      [   16.043608]  ? jhash+0x11a/0x270
      [   16.043848]  bpf_timer_cancel+0x34/0xe0
      [   16.044119]  bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81
      [   16.044500]  bpf_trampoline_6442477838_0+0x36/0x1000
      [   16.044836]  __x64_sys_nanosleep+0x5/0x140
      [   16.045119]  do_syscall_64+0x59/0x80
      [   16.045377]  ? lock_is_held_type+0xe4/0x140
      [   16.045670]  ? irqentry_exit_to_user_mode+0xa/0x40
      [   16.046001]  ? mark_held_locks+0x24/0x90
      [   16.046287]  ? asm_exc_page_fault+0x1e/0x30
      [   16.046569]  ? asm_exc_page_fault+0x8/0x30
      [   16.046851]  ? lockdep_hardirqs_on+0x7e/0x100
      [   16.047137]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.047405] RIP: 0033:0x7f9e4831718d
      [   16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
      [   16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
      [   16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d
      [   16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0
      [   16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0
      [   16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30
      [   16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [   16.051608]  </TASK>
      [   16.051762] ==================================================================
      
      Fixes: 68134668 ("bpf: Add map side support for bpf timers.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
  2. Feb 11, 2022
    • bpf: Do not try bpf_msg_push_data with len 0 · 4a11678f
      Felix Maurer authored
      If bpf_msg_push_data() is called with len 0 (as it happens during
      selftests/bpf/test_sockmap), we do not need to do anything and can
      return early.
      
      Calling bpf_msg_push_data() with len 0 previously led to a wrong ENOMEM
      error: we later called get_order(copy + len); if len was 0, copy + len
      was also often 0 and get_order() returned some undefined value (at the
      moment 52). alloc_pages() caught that and failed, but then bpf_msg_push_data()
      returned ENOMEM. This was wrong because we are most probably not out of
      memory and actually do not need any additional memory.
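
      A userspace sketch of the failure and of the early return is shown below.
      All names (sketch_get_order, sketch_push_data, SKETCH_PAGE_SHIFT,
      SKETCH_MAX_ORDER) are hypothetical, and the order computation is only an
      approximation of what a get_order()-style helper does for nonzero sizes.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER  10   /* assumption: largest order an allocator accepts */

/* Smallest order such that (1 << order) pages can hold `size` bytes.
 * Only meaningful for size > 0: for size == 0 the subtraction wraps
 * around and the result is a huge, useless order. */
static int sketch_get_order(size_t size)
{
    size_t pages = (size - 1) >> SKETCH_PAGE_SHIFT;
    int order = 0;

    while (pages) {
        order++;
        pages >>= 1;
    }
    return order;
}

/* Hypothetical push helper: returns 0 on success, -ENOMEM on failure. */
static int sketch_push_data(size_t copy, size_t len)
{
    if (len == 0)
        return 0;   /* nothing to insert, so no allocation is needed */

    if (sketch_get_order(copy + len) > SKETCH_MAX_ORDER)
        return -ENOMEM;
    return 0;
}
```

      With the guard in place, a zero-length request never reaches the order
      computation, so it can no longer be reported as an out-of-memory error.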
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: Felix Maurer <fmaurer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/df69012695c7094ccb1943ca02b4920db3537466.1644421921.git.fmaurer@redhat.com
    • Merge ra.kernel.org:/pub/scm/linux/kernel/git/netfilter/nf · 525de9a7
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Add selftest for nft_synproxy, from Florian Westphal.
      
      2) xt_socket destroy path incorrectly disables IPv4 defrag for
         IPv6 traffic (typo), from Eric Dumazet.
      
      3) Fix exit value selftest nft_concat_range.sh, from Hangbin Liu.
      
      4) nft_synproxy disables the IPv4 hooks if the IPv6 hooks fail
         to be registered.
      
      5) disable rp_filter on router in selftest nft_fib.sh, also
         from Hangbin Liu.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit · dcd54265
      Eric Dumazet authored
      trace_napi_poll_hit() is reading stat->dev while another thread can write
      to it from dropmon_net_event().

      Use READ_ONCE()/WRITE_ONCE() here. RCU rules are properly enforced already;
      we only have to take care of load/store tearing.
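
      The pattern can be sketched in userspace with volatile accesses. The
      SKETCH_READ_ONCE/SKETCH_WRITE_ONCE macros and the sketch_* types below are
      hypothetical stand-ins that assume the GCC/Clang __typeof__ extension;
      they are not the kernel macros and provide no ordering guarantees beyond
      a single untorn access to a machine-word-sized object.

```c
#include <stddef.h>

/* Userspace stand-ins: force exactly one access through a volatile lvalue
 * so the compiler cannot split, repeat, or cache the load/store. */
#define SKETCH_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct sketch_dev {
    int ifindex;
};

struct sketch_stat {
    struct sketch_dev *dev;   /* written by one context, read by another */
};

/* Writer side: clear the pointer when the device goes away. */
static void sketch_unregister(struct sketch_stat *stat)
{
    SKETCH_WRITE_ONCE(stat->dev, NULL);
}

/* Reader side: load the pointer once and only use the local copy. */
static int sketch_poll_hit(struct sketch_stat *stat,
                           const struct sketch_dev *napi_dev)
{
    struct sketch_dev *dev = SKETCH_READ_ONCE(stat->dev);

    return dev != NULL && dev == napi_dev;
}
```

      Taking a single snapshot on the reader side means the NULL check and the
      comparison always see the same value, even if the writer runs in between.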
      
      BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
      
      write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
       dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
       notifier_call_chain kernel/notifier.c:84 [inline]
       raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
       call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
       call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
       call_netdevice_notifiers net/core/dev.c:1945 [inline]
       unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
       ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
       vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
       ops_exit_list net/core/net_namespace.c:173 [inline]
       cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
       trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
       trace_napi_poll include/trace/events/napi.h:14 [inline]
       __napi_poll+0x36b/0x3f0 net/core/dev.c:6366
       napi_poll net/core/dev.c:6432 [inline]
       net_rx_action+0x29e/0x650 net/core/dev.c:6519
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       do_softirq+0xb1/0xf0 kernel/softirq.c:459
       __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
       __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
       _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
       spin_unlock_bh include/linux/spinlock.h:394 [inline]
       ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
       wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0xffff88815883e000 -> 0x0000000000000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
      
      Fixes: 4ea7e386 ("dropmon: add ability to detect when hardware dropsrxpackets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Avoid overwriting the copies of clcsock callback functions · 1de9770d
      Wen Gu authored
      
      
      The callback functions of clcsock will be saved and replaced during
      the fallback. But if the fallback happens more than once, then the
      copies of these callback functions will be overwritten incorrectly,
      resulting in a loop call issue:
      
      clcsk->sk_error_report
       |- smc_fback_error_report() <------------------------------|
           |- smc_fback_forward_wakeup()                          | (loop)
               |- clcsock_callback()  (incorrectly overwritten)   |
                   |- smc->clcsk_error_report() ------------------|
      
      So this patch fixes the issue by saving these function pointers only
      once in the fallback and avoiding overwriting.
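
      The save-once idea can be sketched as below. The types and function
      names are hypothetical and much simpler than the SMC code; the guard
      shown is one possible way to avoid overwriting the saved pointer and is
      not necessarily how the patch implements it.

```c
#include <stddef.h>

typedef void (*sketch_cb_t)(void *ctx);

struct sketch_clcsock {
    sketch_cb_t error_report;        /* callback currently installed */
};

struct sketch_smc {
    struct sketch_clcsock *clcsk;
    sketch_cb_t saved_error_report;  /* copy of the original callback */
    int calls;                       /* how often the original ran */
};

/* The original callback of the underlying socket. */
static void sketch_original_report(void *ctx)
{
    ((struct sketch_smc *)ctx)->calls++;
}

/* Replacement callback: forwards to the saved original. */
static void sketch_fback_report(void *ctx)
{
    struct sketch_smc *smc = ctx;

    if (smc->saved_error_report)
        smc->saved_error_report(ctx);
}

/* Fallback may run more than once; save the original only the first time.
 * Without the guard, a second call would save sketch_fback_report itself,
 * and the forwarding call above would recurse forever. */
static void sketch_fallback(struct sketch_smc *smc)
{
    if (smc->saved_error_report == NULL)
        smc->saved_error_report = smc->clcsk->error_report;
    smc->clcsk->error_report = sketch_fback_report;
}
```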
      
      Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
      Fixes: 341adeec ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
      Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f1baf68e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter and can.
      
        Current release - new code bugs:
      
         - sparx5: fix get_stat64 out-of-bound access and crash
      
         - smc: fix netdev ref tracker misuse
      
        Previous releases - regressions:
      
         - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
           overflows
      
         - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
           over IP
      
         - bonding: fix rare link activation misses in 802.3ad mode
      
        Previous releases - always broken:
      
         - tcp: fix tcp sock mem accounting in zero-copy corner cases
      
         - remove the cached dst when uncloning an skb dst and its metadata,
           since we only have one ref it'd lead to an UaF
      
         - netfilter:
            - conntrack: don't refresh sctp entries in closed state
            - conntrack: re-init state for retransmitted syn-ack, avoid
              connection establishment getting stuck with strange stacks
            - ctnetlink: disable helper autoassign, avoid it getting lost
            - nft_payload: don't allow transport header access for fragments
      
         - dsa: fix use of devres for mdio throughout drivers
      
         - eth: amd-xgbe: disable interrupts during pci removal
      
         - eth: dpaa2-eth: unregister netdev before disconnecting the PHY
      
         - eth: ice: fix IPIP and SIT TSO offload"
      
      * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
        net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
        net: mscc: ocelot: fix mutex lock error during ethtool stats read
        ice: Avoid RTNL lock when re-creating auxiliary device
        ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
        ice: fix IPIP and SIT TSO offload
        ice: fix an error code in ice_cfg_phy_fec()
        net: mpls: Fix GCC 12 warning
        dpaa2-eth: unregister the netdev before disconnecting from the PHY
        skbuff: cleanup double word in comment
        net: macb: Align the dma and coherent dma masks
        mptcp: netlink: process IPv6 addrs in creating listening sockets
        selftests: mptcp: add missing join check
        net: usb: qmi_wwan: Add support for Dell DW5829e
        vlan: move dev_put into vlan_dev_uninit
        vlan: introduce vlan_dev_free_egress_priority
        ax25: fix UAF bugs of net_device caused by rebinding operation
        net: dsa: fix panic when DSA master device unbinds on shutdown
        net: amd-xgbe: disable interrupts during pci removal
        tipc: rate limit warning for received illegal binding update
        net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
        ...
    • Merge tag 'linux-kselftest-fixes-5.17-rc4' of... · 16f7432c
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fixes from Shuah Khan:
       "Build and run-time fixes to pidfd, clone3, and ir tests"
      
      * tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/ir: fix build with ancient kernel headers
        selftests: fixup build warnings in pidfd / clone3 tests
        pidfd: fix test failure due to stack overflow on some arches
    • Merge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of... · ff008548
      Linus Torvalds authored
      Merge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit fixes from Shuah Khan:
       "Fixes to the test and usage documentation"
      
      * tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        Documentation: KUnit: Fix usage bug
        kunit: fix missing f in f-string in run_checks.py
    • selftests: netfilter: disable rp_filter on router · bbe4c089
      Hangbin Liu authored
      Some distros may enable rp_filter by default. After ns1 changes its address to
      10.0.2.99 and sets its default router to 10.0.2.1, the connected router
      address is still 10.0.1.1, so the router will not reply to the ARP request
      from ns1. Fix it by setting the router's veth0 rp_filter to 0.
      
      Before the fix:
        # ./nft_fib.sh
        PASS: fib expression did not cause unwanted packet drops
        Netns nsrouter-HQkDORO2 fib counter doesn't match expected packet count of 1 for 1.1.1.1
        table inet filter {
                chain prerouting {
                        type filter hook prerouting priority filter; policy accept;
                        ip daddr 1.1.1.1 fib saddr . iif oif missing counter packets 0 bytes 0 drop
                        ip6 daddr 1c3::c01d fib saddr . iif oif missing counter packets 0 bytes 0 drop
                }
        }
      
      After the fix:
        # ./nft_fib.sh
        PASS: fib expression did not cause unwanted packet drops
        PASS: fib expression did drop packets for 1.1.1.1
        PASS: fib expression did drop packets for 1c3::c01d
      
      Fixes: 82944421 ("selftests: netfilter: add fib test case")
      Signed-off-by: Yi Chen <yiche@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister · 51a04ebf
      Vladimir Oltean authored
      Since struct mv88e6xxx_mdio_bus *mdio_bus is the bus->priv of something
      allocated with mdiobus_alloc_size(), this means that mdiobus_free(bus)
      will free the memory backing the mdio_bus as well. Therefore, the
      mdio_bus->list element is freed memory, but we continue to iterate
      through the list of MDIO buses using that list element.
      
      To fix this, use the proper list iterator that handles element deletion
      by keeping a copy of the list element next pointer.
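
      Deletion-safe iteration can be illustrated with a plain singly linked
      list in userspace. The sketch_bus type and both helpers are hypothetical;
      in the kernel, iterators such as list_for_each_entry_safe() serve the
      same purpose by caching the next element before the loop body runs.

```c
#include <stdlib.h>

/* Hypothetical bus object that embeds its own list linkage, so freeing
 * the object also frees the pointer needed to reach the next element. */
struct sketch_bus {
    struct sketch_bus *next;
    int id;
};

/* Prepend a new bus to the list; returns the new head (or the old head
 * unchanged if the allocation fails). */
static struct sketch_bus *sketch_bus_add(struct sketch_bus *head, int id)
{
    struct sketch_bus *bus = malloc(sizeof(*bus));

    if (!bus)
        return head;
    bus->id = id;
    bus->next = head;
    return bus;
}

/* Free every bus.  The next pointer is copied out BEFORE free(), because
 * reading bus->next afterwards would be a use-after-free. */
static int sketch_bus_unregister_all(struct sketch_bus **head)
{
    struct sketch_bus *bus = *head;
    struct sketch_bus *next;
    int freed = 0;

    while (bus) {
        next = bus->next;   /* keep a copy of the link */
        free(bus);
        freed++;
        bus = next;
    }
    *head = NULL;
    return freed;
}
```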
      
      Fixes: f53a2ce8 ("net: dsa: mv88e6xxx: don't use devres for mdiobus")
      Reported-by: Rafael Richter <rafael.richter@gin.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220210174017.3271099-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a19f7d7d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-10
      
      Dan Carpenter propagates an error in FEC configuration.
      
      Jesse fixes TSO offloads of IPIP and SIT frames.
      
      Dave adds a dedicated LAG unregister function to resolve a KASAN error
      and moves auxiliary device re-creation after LAG removal to the service
      task to avoid issues with RTNL lock.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Avoid RTNL lock when re-creating auxiliary device
        ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
        ice: fix IPIP and SIT TSO offload
        ice: fix an error code in ice_cfg_phy_fec()
      ====================
      
      Link: https://lore.kernel.org/r/20220210170515.2609656-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix mutex lock error during ethtool stats read · 7fbf6795
      Colin Foster authored
      
      
      An ongoing workqueue populates the stats buffer. At the same time, a user
      might query the statistics. While writing to the buffer is mutex-locked,
      reading from the buffer wasn't. This could lead to buggy reads by ethtool.
      
      This patch fixes the former blamed commit, but the bug was introduced in
      the latter.
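
      The locking rule described above (readers take the same mutex as the
      writer) can be sketched in userspace with POSIX threads. The sketch_stats
      type, its helpers, and SKETCH_NUM_STATS are hypothetical and unrelated to
      the driver's actual data structures.

```c
#include <pthread.h>
#include <string.h>

#define SKETCH_NUM_STATS 4   /* assumption: number of counters */

struct sketch_stats {
    pthread_mutex_t lock;
    unsigned long long counters[SKETCH_NUM_STATS];
};

static void sketch_stats_init(struct sketch_stats *s)
{
    pthread_mutex_init(&s->lock, NULL);
    memset(s->counters, 0, sizeof(s->counters));
}

/* Writer: the periodic worker accumulates new values under the lock. */
static void sketch_stats_update(struct sketch_stats *s,
                                const unsigned long long *delta)
{
    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < SKETCH_NUM_STATS; i++)
        s->counters[i] += delta[i];
    pthread_mutex_unlock(&s->lock);
}

/* Reader: take the same lock so the snapshot is never half-updated. */
static void sketch_stats_read(struct sketch_stats *s,
                              unsigned long long *out)
{
    pthread_mutex_lock(&s->lock);
    memcpy(out, s->counters, sizeof(s->counters));
    pthread_mutex_unlock(&s->lock);
}
```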
      
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Fixes: 1e1caa97 ("ocelot: Clean up stats update deferred work")
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Avoid RTNL lock when re-creating auxiliary device · 5dbbbd01
      Dave Ertman authored
      If a call to re-create the auxiliary device happens in a context that has
      already taken the RTNL lock, then the call flow that recreates auxiliary
      device can hang if there is another attempt to claim the RTNL lock by the
      auxiliary driver.
      
      To avoid this, any call to re-create auxiliary devices that comes from
      a source that is holding the RTNL lock (e.g. netdev notifier when
      interface exits a bond) should execute in a separate thread.  To
      accomplish this, add a flag to the PF that will be evaluated in the
      service task and dealt with there.
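
      The deferral pattern (set a flag in the locked context, act on it later
      in the service task) can be sketched with C11 atomics. The sketch_pf
      type, its field names, and the helper functions are hypothetical and do
      not reflect the driver's real structures.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-function state. */
struct sketch_pf {
    atomic_bool aux_plug_pending;  /* request left for the service task */
    int aux_devices;               /* stands in for the real re-creation */
};

static void sketch_pf_init(struct sketch_pf *pf)
{
    atomic_init(&pf->aux_plug_pending, false);
    pf->aux_devices = 0;
}

/* Runs in a context that already holds the big lock: do not re-create
 * the device here, only record that it needs to be done. */
static void sketch_notifier_event(struct sketch_pf *pf)
{
    atomic_store(&pf->aux_plug_pending, true);
}

/* Runs later in a separate thread that does not hold the lock.
 * atomic_exchange() clears the flag and reports whether it was set,
 * so each request is handled exactly once. */
static void sketch_service_task(struct sketch_pf *pf)
{
    if (atomic_exchange(&pf->aux_plug_pending, false))
        pf->aux_devices++;
}
```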
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler · bea1898f
      Dave Ertman authored
      Currently, the same handler is called for both a NETDEV_BONDING_INFO
      LAG unlink notification and a NETDEV_UNREGISTER call.  This is
      causing a problem though, since the netdev_notifier_info passed has
      a different structure depending on which event is passed.  The problem
      manifests as a call trace from a BUG: KASAN stack-out-of-bounds error.

      Fix this by creating a handler specific to NETDEV_UNREGISTER that is
      passed only valid elements in the netdev_notifier_info struct for the
      NETDEV_UNREGISTER event.
      
      Also included is the removal of an unbalanced dev_put on the peer_netdev
      and related braces.
      
      Fixes: 6a8b3572 ("ice: Respond to a NETDEV_UNREGISTER event for LAG")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Acked-by: Jonathan Toppins <jtoppins@redhat.com>
      Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix IPIP and SIT TSO offload · 46b699c5
      Jesse Brandeburg authored
      The driver was avoiding offload for IPIP (at least) frames due to
      parsing the inner header offsets incorrectly when trying to check
      lengths.
      
      This length check works for VXLAN frames but fails on IPIP frames
      because skb_transport_offset points to the inner header in IPIP
      frames, which meant the subtraction of transport_header from
      inner_network_header returns a negative value (-20).
      
      With the code before this patch, everything continued to work, but GSO
      was being used to segment, causing throughputs of 1.5Gb/s per thread.
      After this patch, throughput is more like 10Gb/s per thread for IPIP
      traffic.
      
      Fixes: e94d4478 ("ice: Implement filter sync, NDO operations and bump version")
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  3. Feb 10, 2022
  4. Feb 09, 2022