  1. Dec 08, 2023
  2. Dec 07, 2023
    • nfp: flower: fix for take a mutex lock in soft irq context and rcu lock · 0ad722bd
      Hui Zhou authored
      The neighbour event callback calls nfp_tun_write_neigh, which takes a
      mutex lock while running in soft irq context. Use a work queue to
      process the neighbour event instead.
      
      Also move the nfp_tun_write_neigh call out of the
      rcu_read_lock/unlock() range in nfp_tunnel_request_route_v4 and
      nfp_tunnel_request_route_v6.
      
      Fixes: abc21095 ("nfp: flower: tunnel neigh support bond offload")
      CC: stable@vger.kernel.org # 6.2+
      Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 803a809d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-12-05 (ice, i40e, iavf)
      
      This series contains updates to ice, i40e and iavf drivers.
      
      Michal fixes incorrect usage of VF MSIX value and index calculation for
      ice.
      
      Marcin restores disabling of Rx VLAN filtering which was inadvertently
      removed for ice.
      
      Ivan Vecera corrects improper messaging of MFS port for i40e.
      
      Jake fixes incorrect checking of coalesce values on iavf.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero
        i40e: Fix unexpected MFS warning message
        ice: Restore fix disabling RX VLAN filtering
        ice: change vfs.num_msix_per to vf->num_msix
      ====================
      
      Link: https://lore.kernel.org/r/20231205211918.2123019-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: Restore USXGMII support for 6393X · 0c7ed1f9
      Tobias Waldekranz authored
      In 4a562127, USXGMII support was added for 6393X, but this was
      lost in the PCS conversion (the blamed commit), most likely because
      these efforts were more or less done in parallel.
      
      Restore this feature by porting Michal's patch to fit the new
      implementation.
      
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Michal Smulski <michal.smulski@ooma.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: e5b732a2 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: do not accept ACK of bytes we never sent · 3d501dd3
      Eric Dumazet authored
      This patch is based on a detailed report and ideas from Yepeng Pan
      and Christian Rossow.
      
      ACK seq validation is currently following RFC 5961 5.2 guidelines:
      
         The ACK value is considered acceptable only if
         it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
         SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
         above condition MUST be discarded and an ACK sent back.  It needs to
         be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
         duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
         acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
         ACK, drop the segment, and return".  The "ignored" above implies that
         the processing of the incoming data segment continues, which means
         the ACK value is treated as acceptable.  This mitigation makes the
         ACK check more stringent since any ACK < SND.UNA wouldn't be
         accepted, instead only ACKs that are in the range ((SND.UNA -
         MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
      
      This can be refined for new (and possibly spoofed) flows,
      by not accepting ACK for bytes that were never sent.
      
      This greatly improves TCP security at little cost.
      
      I added a Fixes: tag to make sure this patch will reach stable trees,
      even if the 'blamed' patch was adhering to the RFC.
      
      tp->bytes_acked was added in linux-4.2
      
      The following packetdrill test (courtesy of Yepeng Pan) shows
      the issue at hand:
      
      0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1024) = 0
      
      // ---------------- Handshake ------------------- //
      
      // When window scale is set to 14 the window size can be extended to
      // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
      // with ack number in (Server_ISN+1-1073725440, Server_ISN+1),
      // even though this ack number acknowledges some data never
      // sent by the server.
      
      +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
      +0 > S. 0:0(0) ack 1 <...>
      +0 < . 1:1(0) ack 1 win 65535
      +0 accept(3, ..., ...) = 4
      
      // For the established connection, we send an ACK packet,
      // the ack packet uses ack number 1 - 1073725300 + 2^32,
      // where 2^32 is used to wrap around.
      // Note: we used 1073725300 instead of 1073725440 to avoid possible
      // edge cases.
      // 1 - 1073725300 + 2^32 = 3221241997
      
      // Oops, old kernels happily accept this packet.
      +0 < . 1:1001(1000) ack 3221241997 win 65535
      
      // After the kernel fix the following will be replaced by a challenge ACK,
      // and prior malicious frame would be dropped.
      +0 > . 1:1(0) ack 1001
      
      Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
      Reported-by: Christian Rossow <rossow@cispa.de>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
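
The ACK acceptance rule discussed in the commit above can be modelled outside the kernel. What follows is a minimal Python sketch, not the kernel code: `before`, `classify_ack` and the `refined` flag are illustrative names, and the refinement is modelled as clamping the accepted window to the number of bytes already acknowledged, which is an assumption based on the commit's mention of tp->bytes_acked. The inputs are the values used in the packetdrill test.

```python
MOD = 1 << 32  # TCP sequence numbers wrap modulo 2^32


def before(a: int, b: int) -> bool:
    """Return True if sequence number a precedes b in 32-bit sequence space."""
    return ((a - b) % MOD) >= (1 << 31)


def classify_ack(ack, snd_una, snd_nxt, max_window, bytes_acked, refined):
    """Classify SEG.ACK as 'accept', 'old_ack' or 'challenge'.

    refined=False models the plain RFC 5961 5.2 range check.
    refined=True also limits the window to bytes already acknowledged
    (an assumption about how the fix uses bytes_acked).
    """
    if before(snd_nxt, ack):
        return "challenge"  # acknowledges data that was not sent yet
    if before(ack, snd_una):
        window = min(max_window, bytes_acked) if refined else max_window
        if before(ack, (snd_una - window) % MOD):
            return "challenge"  # too old: discard and send an ACK back
        return "old_ack"  # duplicate ACK: segment processing continues
    return "accept"


# Values from the packetdrill test: ISN-relative SND.UNA = SND.NXT = 1,
# peer window 65535 << 14, and no payload bytes acknowledged yet.
max_window = 65535 << 14
spoofed_ack = (1 - 1073725300) % MOD

print(spoofed_ack)                                                      # 3221241997
print(classify_ack(spoofed_ack, 1, 1, max_window, 0, refined=False))   # old_ack
print(classify_ack(spoofed_ack, 1, 1, max_window, 0, refined=True))    # challenge
```

With the unrefined check the spoofed segment falls inside the (SND.UNA - MAX.SND.WND) range and is processed; once the window is clamped to zero acknowledged bytes, the same segment is answered with a challenge ACK.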
    • selftests/bpf: Add test for early update in prog_array_map_poke_run · ffed24ef
      Jiri Olsa authored
      
      
      Add a test that tries to trigger the BUG_ON during an early map update
      in the prog_array_map_poke_run function.
      
      The idea is to share a prog array map between a thread that constantly
      updates it and another one loading a program that uses that prog
      array.
      
      Eventually we will hit a place where the program is ok to be updated
      (poke->tailcall_target_stable check) but the address is still not
      registered in kallsyms, so bpf_arch_text_poke returns -EINVAL
      and causes an imbalance for the next tail call update check, which will
      fail with -EBUSY in bpf_arch_text_poke as described in the previous fix.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Link: https://lore.kernel.org/bpf/20231206083041.1306660-3-jolsa@kernel.org
    • bpf: Fix prog_array_map_poke_run map poke update · 4b7de801
      Jiri Olsa authored
      Lee pointed out an issue found by syzkaller [0] hitting a BUG in the
      prog array map poke update in the prog_array_map_poke_run function, due
      to an error value returned from the bpf_arch_text_poke function.
      
      There's a race window where bpf_arch_text_poke can fail due to missing
      bpf program kallsym symbols, which is accounted for with the check for
      -EINVAL in that BUG_ON call.
      
      The problem is that in such a case we won't update the tail call jump,
      which causes an imbalance for the next tail call update check, which
      will fail with -EBUSY in bpf_arch_text_poke.
      
      I'm hitting the following race during the program load:
      
        CPU 0                             CPU 1
      
        bpf_prog_load
          bpf_check
            do_misc_fixups
              prog_array_map_poke_track
      
                                          map_update_elem
                                            bpf_fd_array_map_update_elem
                                              prog_array_map_poke_run
      
                                                bpf_arch_text_poke returns -EINVAL
      
          bpf_prog_kallsyms_add
      
      After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
      poke update fails on the expected jump instruction check in bpf_arch_text_poke
      with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.
      
      A similar race exists on program unload.
      
      Fix this by moving the update to the bpf_arch_poke_desc_update function, which
      makes sure we call __bpf_arch_text_poke, which skips the bpf address check.
      
      Each architecture has a slightly different approach wrt looking up the bpf
      address in bpf_arch_text_poke, so instead of splitting the function or adding
      a new 'checkip' argument as in the previous version, it seems best to move the
      whole map_poke_run update into arch specific code.
      
        [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810
      
      Fixes: ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
      Reported-by: <syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Cc: Lee Jones <lee@kernel.org>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
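
The imbalance described in the commit above can be reproduced with a toy state machine. This is a hedged Python sketch, not kernel code: `PokeSite`, `text_poke`, `run_updates` and the `check_addr` flag are hypothetical names, and the instruction strings only stand in for the real jump encodings. It assumes, as the commit message states, that a poke fails with -EINVAL while the program is not yet in kallsyms and with -EBUSY when the instruction found at the site is not the expected one.

```python
import errno


class PokeSite:
    """Toy model of one tail-call jump site in a JITed program."""

    def __init__(self):
        self.insn = "nop"         # instruction currently at the site
        self.in_kallsyms = False  # set once the program is registered

    def text_poke(self, old, new, check_addr=True):
        # Models the address lookup that fails before kallsyms registration.
        if check_addr and not self.in_kallsyms:
            return -errno.EINVAL
        # Models the "expected old instruction" check.
        if self.insn != old:
            return -errno.EBUSY
        self.insn = new
        return 0


def run_updates(check_addr):
    site = PokeSite()
    # Early map update races with program load: not in kallsyms yet.
    first = site.text_poke("nop", "jmp A", check_addr)
    site.in_kallsyms = True  # registration happens only afterwards
    # The map now assumes the site holds "jmp A".
    second = site.text_poke("jmp A", "jmp B", check_addr)
    return first, second, site.insn


print(run_updates(check_addr=True))   # first poke skipped, second one fails
print(run_updates(check_addr=False))  # both pokes succeed
```

With the address check in place the first update is silently lost and the second one trips over the stale instruction; skipping the check for these updates keeps the site and the map in step.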
    • netfilter: xt_owner: Fix for unsafe access of sk->sk_socket · 7ae836a3
      Phil Sutter authored
      A concurrently running sock_orphan() may NULL the sk_socket pointer in
      between check and deref. Follow other users (like nft_meta.c for
      instance) and acquire sk_callback_lock before dereferencing sk_socket.
      
      Fixes: 0265ab44 ("[NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: validate family when identifying table via handle · f6e1532a
      Pablo Neira Ayuso authored
      Validate the table family when looking it up via NFTA_TABLE_HANDLE.
      
      Fixes: 3ecbfd65 ("netfilter: nf_tables: allocate handle and delete objects via handle")
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
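
As an illustration of the check described in the commit above, here is a small Python sketch of a handle lookup that also validates the family. The `Table` type, the `lookup_by_handle` function and the sample tables are hypothetical; the only point taken from the commit is that a handle match alone should not be enough.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Table:
    family: str
    name: str
    handle: int


def lookup_by_handle(tables: List[Table], family: str, handle: int) -> Optional[Table]:
    """Return the table with this handle only if its family also matches."""
    for table in tables:
        if table.handle == handle and table.family == family:
            return table
    return None


tables = [Table("ip", "filter", 1), Table("ip6", "filter", 2)]

print(lookup_by_handle(tables, "ip", 1))   # found: family and handle agree
print(lookup_by_handle(tables, "ip6", 1))  # None: handle 1 belongs to family "ip"
```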
    • netfilter: nf_tables: bail out on mismatching dynset and set expressions · 3701cd39
      Pablo Neira Ayuso authored
      If the number of dynset expressions provided by userspace is larger
      than the number of declared set expressions, then bail out.
      
      Fixes: 48b0ae04 ("netfilter: nftables: netlink support for several set element expressions")
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix 'exist' matching on bigendian arches · 63331e37
      Florian Westphal authored
      Maze reports "tcp option fastopen exists" fails to match on
      OpenWrt 22.03.5, r20134-5f15225c1e (5.10.176) router.
      
      "tcp option fastopen exists" translates to:
      inet
        [ exthdr load tcpopt 1b @ 34 + 0 present => reg 1 ]
        [ cmp eq reg 1 0x00000001 ]
      
      .. but existing nft userspace generates a 1-byte compare.
      
      On LSB (x86), "*reg32 = 1" is identical to nft_reg_store8(reg32, 1), but
      not on MSB, which will place the 1 last. IOW, on bigendian arches the cmp8
      is always false.
      
      Make sure we store this in a consistent fashion, so existing userspace
      will also work on MSB (bigendian).
      
      Regardless of this patch we can also change nft userspace to generate
      'reg32 == 0' and 'reg32 != 0' instead of u8 == 0 // u8 == 1 when
      adding 'option x missing/exists' expressions as well.
      
      Fixes: 3c1fece8 ("netfilter: nft_exthdr: Allow checking TCP option presence, too")
      Fixes: b9f9a485 ("netfilter: nft_exthdr: add boolean DCCP option matching")
      Fixes: 055c4b34 ("netfilter: nft_fib: Support existence check")
      Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Closes: https://lore.kernel.org/netfilter-devel/CAHo-OozyEqHUjL2-ntATzeZOiuftLWZ_HU6TOM_js4qLfDEAJg@mail.gmail.com/
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
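
The byte-order effect described in the commit above can be shown with plain byte strings. This is a minimal Python sketch under stated assumptions: `store32`, `store8` and `cmp8` are illustrative helpers, a register is modelled as four bytes, and the store8-style write is assumed to clear the register and put the value in its first byte.

```python
def store32(value: int, byteorder: str) -> bytes:
    """Register contents after '*reg32 = value' on the given byte order."""
    return value.to_bytes(4, byteorder)


def store8(value: int) -> bytes:
    """Register contents after a store8-style write (assumed behaviour):
    the register is cleared and the value lands in its first byte."""
    return bytes([value, 0, 0, 0])


def cmp8(reg: bytes, expected: int) -> bool:
    """A 1-byte compare looks only at the first byte of the register."""
    return reg[0] == expected


print(cmp8(store32(1, "little"), 1))  # True: the 1 is stored first
print(cmp8(store32(1, "big"), 1))     # False: the 1 is stored last
print(cmp8(store8(1), 1))             # True regardless of byte order
```

The 32-bit store and the 1-byte compare only agree on little-endian layouts, which is why a consistent store is needed for existing userspace to keep working on big-endian machines.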
    • netfilter: nft_set_pipapo: skip inactive elements during set walk · 317eb968
      Florian Westphal authored
      
      
      Otherwise set elements can be deactivated twice which will cause a crash.
      
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: bpf: fix bad registration on nf_defrag · 1834d62a
      D. Wythe authored
      We should pass a pointer to global_hook to get_proto_defrag_hook()
      instead of its value, since the passed value won't be updated even if
      the requested module was loaded successfully.
      
      Log:
      
      [   54.915713] nf_defrag_ipv4 has bad registration
      [   54.915779] WARNING: CPU: 3 PID: 6323 at net/netfilter/nf_bpf_link.c:62 get_proto_defrag_hook+0x137/0x160
      [   54.915835] CPU: 3 PID: 6323 Comm: fentry Kdump: loaded Tainted: G            E      6.7.0-rc2+ #35
      [   54.915839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      [   54.915841] RIP: 0010:get_proto_defrag_hook+0x137/0x160
      [   54.915844] Code: 4f 8c e8 2c cf 68 ff 80 3d db 83 9a 01 00 0f 85 74 ff ff ff 48 89 ee 48 c7 c7 8f 12 4f 8c c6 05 c4 83 9a 01 01 e8 09 ee 5f ff <0f> 0b e9 57 ff ff ff 49 8b 3c 24 4c 63 e5 e8 36 28 6c ff 4c 89 e0
      [   54.915849] RSP: 0018:ffffb676003fbdb0 EFLAGS: 00010286
      [   54.915852] RAX: 0000000000000023 RBX: ffff9596503d5600 RCX: ffff95996fce08c8
      [   54.915854] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95996fce08c0
      [   54.915855] RBP: ffffffff8c4f12de R08: 0000000000000000 R09: 00000000fffeffff
      [   54.915859] R10: ffffb676003fbc70 R11: ffffffff8d363ae8 R12: 0000000000000000
      [   54.915861] R13: ffffffff8e1f75c0 R14: ffffb676003c9000 R15: 00007ffd15e78ef0
      [   54.915864] FS:  00007fb6e9cab740(0000) GS:ffff95996fcc0000(0000) knlGS:0000000000000000
      [   54.915867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   54.915868] CR2: 00007ffd15e75c40 CR3: 0000000101e62006 CR4: 0000000000360ef0
      [   54.915870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   54.915871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   54.915873] Call Trace:
      [   54.915891]  <TASK>
      [   54.915894]  ? __warn+0x84/0x140
      [   54.915905]  ? get_proto_defrag_hook+0x137/0x160
      [   54.915908]  ? __report_bug+0xea/0x100
      [   54.915925]  ? report_bug+0x2b/0x80
      [   54.915928]  ? handle_bug+0x3c/0x70
      [   54.915939]  ? exc_invalid_op+0x18/0x70
      [   54.915942]  ? asm_exc_invalid_op+0x1a/0x20
      [   54.915948]  ? get_proto_defrag_hook+0x137/0x160
      [   54.915950]  bpf_nf_link_attach+0x1eb/0x240
      [   54.915953]  link_create+0x173/0x290
      [   54.915969]  __sys_bpf+0x588/0x8f0
      [   54.915974]  __x64_sys_bpf+0x20/0x30
      [   54.915977]  do_syscall_64+0x45/0xf0
      [   54.915989]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [   54.915998] RIP: 0033:0x7fb6e9daa51d
      [   54.916001] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 89 0c 00 f7 d8 64 89 01 48
      [   54.916003] RSP: 002b:00007ffd15e78ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      [   54.916006] RAX: ffffffffffffffda RBX: 00007ffd15e78fc0 RCX: 00007fb6e9daa51d
      [   54.916007] RDX: 0000000000000040 RSI: 00007ffd15e78ef0 RDI: 000000000000001c
      [   54.916009] RBP: 000000000000002d R08: 00007fb6e9e73a60 R09: 0000000000000001
      [   54.916010] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
      [   54.916012] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
      [   54.916014]  </TASK>
      [   54.916015] ---[ end trace 0000000000000000 ]---
      
      Fixes: 91721c2d ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Daniel Xu <dxu@dxuuu.xyz>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
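
The value-versus-pointer problem described in the commit above has a close analogue in any language. The Python sketch below is only an analogy, since Python has no pointers: `HookSlot` plays the role of the global hook pointer, and `request_module`, `get_hook_by_value` and `get_hook_by_reference` are hypothetical stand-ins rather than kernel functions.

```python
class HookSlot:
    """Stands in for a global hook pointer that a module fills in on load."""

    def __init__(self):
        self.hook = None


def request_module(slot):
    # Hypothetical stand-in for loading the defrag module, which
    # registers its hook in the global slot.
    slot.hook = "defrag_hook"


def get_hook_by_value(slot, hook):
    # Buggy shape: 'hook' is a copy taken before the module was loaded.
    if hook is None:
        request_module(slot)
    return hook  # still the stale copy


def get_hook_by_reference(slot):
    # Fixed shape: re-read the slot after the module has been loaded.
    if slot.hook is None:
        request_module(slot)
    return slot.hook


stale = HookSlot()
print(get_hook_by_value(stale, stale.hook))  # None, although stale.hook is now set
fresh = HookSlot()
print(get_hook_by_reference(fresh))          # defrag_hook
```

In the first call the module load succeeds but the caller still sees the old value, which corresponds to the "bad registration" warning in the log above.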
  3. Dec 06, 2023