  1. Mar 01, 2024
    • tls: fix peeking with sync+async decryption · 6caaf104
      Sabrina Dubroca authored
      If we peek from 2 records with a currently empty rx_list, and the
      first record is decrypted synchronously but the second record is
      decrypted async, the following happens:
        1. decrypt record 1 (sync)
        2. copy from record 1 to the userspace's msg
        3. queue the decrypted record to rx_list for future read(!PEEK)
        4. decrypt record 2 (async)
        5. queue record 2 to rx_list
        6. call process_rx_list to copy data from the 2nd record
      
      We currently pass copied=0 as skip offset to process_rx_list, so we
      end up copying once again from the first record. We should skip over
      the data we've already copied.
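      Below is a minimal userspace model of the skip offset. It assumes the queued records are a plain array rather than the kernel's rx_list, and it uses a hypothetical rx_list_copy() helper in place of process_rx_list(). Passing the number of bytes already copied as the skip offset, instead of 0, keeps record 1 from being copied twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decrypted record queued on rx_list. */
struct rec_model {
	const char *data;
	size_t len;
};

/*
 * Copy up to @len bytes from the queued records into @dst, first skipping
 * @skip bytes the caller has already consumed. Records are only read,
 * never dequeued (peek semantics). Returns the number of bytes copied.
 */
static size_t rx_list_copy(const struct rec_model *recs, size_t nrecs,
			   size_t skip, char *dst, size_t len)
{
	size_t copied = 0;

	assert(dst != NULL || len == 0);
	for (size_t i = 0; i < nrecs && copied < len; i++) {
		size_t avail = recs[i].len;

		if (skip >= avail) {	/* whole record already consumed */
			skip -= avail;
			continue;
		}
		size_t off = skip;	/* partial skip inside this record */
		skip = 0;

		size_t n = avail - off;
		if (n > len - copied)
			n = len - copied;
		memcpy(dst + copied, recs[i].data + off, n);
		copied += n;
	}
	return copied;
}
```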
      
      Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: decrement decrypt_pending if no async completion will be called · f7fa16d4
      Sabrina Dubroca authored
      With mixed sync/async decryption, or failures of crypto_aead_decrypt,
      we increment decrypt_pending but we never do the corresponding
      decrement since tls_decrypt_done will not be called. In this case, we
      should decrement decrypt_pending immediately to avoid getting stuck.
      
      For example, the prequeue test gets stuck with mixed
      modes (one async decrypt + one sync decrypt).
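      The accounting rule can be sketched in userspace with C11 atomics. submit_decrypt() and decrypt_done_model() are hypothetical stand-ins for the crypto_aead_decrypt() call site and for tls_decrypt_done(). The sketch assumes, consistent with the text above, that only a request returning -EINPROGRESS will later get a completion callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Model of the per-context pending counter (an atomic_t in the kernel). */
static atomic_int decrypt_pending;

/* Completion callback model: only runs for requests that went async. */
static void decrypt_done_model(void)
{
	atomic_fetch_sub(&decrypt_pending, 1);
}

/*
 * Hypothetical submit path. @result is what the crypto call returned:
 * -EINPROGRESS if queued (callback fires later), 0 on synchronous
 * success, or a negative error on synchronous failure.
 */
static int submit_decrypt(int result)
{
	atomic_fetch_add(&decrypt_pending, 1);
	if (result != -EINPROGRESS) {
		/* No completion will ever run, so drop the count right away. */
		atomic_fetch_sub(&decrypt_pending, 1);
	}
	return result;
}
```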
      
      Fixes: 94524d8f ("net/tls: Add support for async decryption of tls records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Feb 29, 2024
    • gtp: fix use-after-free and null-ptr-deref in gtp_newlink() · 616d82c3
      Alexander Ofitserov authored
      
      
      The gtp_link_ops operations structure for the subsystem must be
      registered after registering the gtp_net_ops pernet operations structure.
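      Here is a sketch of the required ordering, using hypothetical register/unregister stubs rather than the kernel's registration functions (in the driver these are presumably register_pernet_subsys() and rtnl_link_register()). The idea it models is simple. A newlink request can arrive as soon as the link ops are visible, so the per-netns state must already exist. On failure, teardown runs in reverse order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registration state; not the kernel's data structures. */
static bool pernet_registered;
static bool link_ops_registered;

static int register_pernet_model(void) { pernet_registered = true; return 0; }
static void unregister_pernet_model(void) { pernet_registered = false; }

/* @fail simulates a registration error for the link ops. */
static int register_link_model(bool fail)
{
	if (fail)
		return -1;
	link_ops_registered = true;
	return 0;
}

/* A newlink handler may run once link ops exist; it needs pernet state. */
static bool newlink_would_be_safe(void)
{
	return !link_ops_registered || pernet_registered;
}

/* Register pernet ops first, then link ops; unwind in reverse on error. */
static int gtp_init_model(bool link_fails)
{
	int err = register_pernet_model();

	if (err)
		return err;
	assert(newlink_would_be_safe());
	err = register_link_model(link_fails);
	if (err) {
		unregister_pernet_model();
		return err;
	}
	return 0;
}
```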
      
      Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
      
      [ 1010.702740] gtp: GTP module unloaded
      [ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI
      [ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      [ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1
      [ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014
      [ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp]
      [ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00
      [ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203
      [ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000
      [ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282
      [ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000
      [ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80
      [ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400
      [ 1010.715953] FS:  00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000
      [ 1010.715958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0
      [ 1010.715968] PKRU: 55555554
      [ 1010.715972] Call Trace:
      [ 1010.715985]  ? __die_body.cold+0x1a/0x1f
      [ 1010.715995]  ? die_addr+0x43/0x70
      [ 1010.716002]  ? exc_general_protection+0x199/0x2f0
      [ 1010.716016]  ? asm_exc_general_protection+0x1e/0x30
      [ 1010.716026]  ? gtp_newlink+0x4d7/0x9c0 [gtp]
      [ 1010.716034]  ? gtp_net_exit+0x150/0x150 [gtp]
      [ 1010.716042]  __rtnl_newlink+0x1063/0x1700
      [ 1010.716051]  ? rtnl_setlink+0x3c0/0x3c0
      [ 1010.716063]  ? is_bpf_text_address+0xc0/0x1f0
      [ 1010.716070]  ? kernel_text_address.part.0+0xbb/0xd0
      [ 1010.716076]  ? __kernel_text_address+0x56/0xa0
      [ 1010.716084]  ? unwind_get_return_address+0x5a/0xa0
      [ 1010.716091]  ? create_prof_cpu_mask+0x30/0x30
      [ 1010.716098]  ? arch_stack_walk+0x9e/0xf0
      [ 1010.716106]  ? stack_trace_save+0x91/0xd0
      [ 1010.716113]  ? stack_trace_consume_entry+0x170/0x170
      [ 1010.716121]  ? __lock_acquire+0x15c5/0x5380
      [ 1010.716139]  ? mark_held_locks+0x9e/0xe0
      [ 1010.716148]  ? kmem_cache_alloc_trace+0x35f/0x3c0
      [ 1010.716155]  ? __rtnl_newlink+0x1700/0x1700
      [ 1010.716160]  rtnl_newlink+0x69/0xa0
      [ 1010.716166]  rtnetlink_rcv_msg+0x43b/0xc50
      [ 1010.716172]  ? rtnl_fdb_dump+0x9f0/0x9f0
      [ 1010.716179]  ? lock_acquire+0x1fe/0x560
      [ 1010.716188]  ? netlink_deliver_tap+0x12f/0xd50
      [ 1010.716196]  netlink_rcv_skb+0x14d/0x440
      [ 1010.716202]  ? rtnl_fdb_dump+0x9f0/0x9f0
      [ 1010.716208]  ? netlink_ack+0xab0/0xab0
      [ 1010.716213]  ? netlink_deliver_tap+0x202/0xd50
      [ 1010.716220]  ? netlink_deliver_tap+0x218/0xd50
      [ 1010.716226]  ? __virt_addr_valid+0x30b/0x590
      [ 1010.716233]  netlink_unicast+0x54b/0x800
      [ 1010.716240]  ? netlink_attachskb+0x870/0x870
      [ 1010.716248]  ? __check_object_size+0x2de/0x3b0
      [ 1010.716254]  netlink_sendmsg+0x938/0xe40
      [ 1010.716261]  ? netlink_unicast+0x800/0x800
      [ 1010.716269]  ? __import_iovec+0x292/0x510
      [ 1010.716276]  ? netlink_unicast+0x800/0x800
      [ 1010.716284]  __sock_sendmsg+0x159/0x190
      [ 1010.716290]  ____sys_sendmsg+0x712/0x880
      [ 1010.716297]  ? sock_write_iter+0x3d0/0x3d0
      [ 1010.716304]  ? __ia32_sys_recvmmsg+0x270/0x270
      [ 1010.716309]  ? lock_acquire+0x1fe/0x560
      [ 1010.716315]  ? drain_array_locked+0x90/0x90
      [ 1010.716324]  ___sys_sendmsg+0xf8/0x170
      [ 1010.716331]  ? sendmsg_copy_msghdr+0x170/0x170
      [ 1010.716337]  ? lockdep_init_map_type+0x2c7/0x860
      [ 1010.716343]  ? lockdep_hardirqs_on_prepare+0x430/0x430
      [ 1010.716350]  ? debug_mutex_init+0x33/0x70
      [ 1010.716360]  ? percpu_counter_add_batch+0x8b/0x140
      [ 1010.716367]  ? lock_acquire+0x1fe/0x560
      [ 1010.716373]  ? find_held_lock+0x2c/0x110
      [ 1010.716384]  ? __fd_install+0x1b6/0x6f0
      [ 1010.716389]  ? lock_downgrade+0x810/0x810
      [ 1010.716396]  ? __fget_light+0x222/0x290
      [ 1010.716403]  __sys_sendmsg+0xea/0x1b0
      [ 1010.716409]  ? __sys_sendmsg_sock+0x40/0x40
      [ 1010.716419]  ? lockdep_hardirqs_on_prepare+0x2b3/0x430
      [ 1010.716425]  ? syscall_enter_from_user_mode+0x1d/0x60
      [ 1010.716432]  do_syscall_64+0x30/0x40
      [ 1010.716438]  entry_SYSCALL_64_after_hwframe+0x62/0xc7
      [ 1010.716444] RIP: 0033:0x7fd1508cbd49
      [ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48
      [ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      [ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49
      [ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006
      [ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360
      [ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0
      [ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp
      [ 1010.716674]  virtio_ring intel_gtt virtio [last unloaded: gtp]
      [ 1010.716693] ---[ end trace 04990a4ce61e174b ]---
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Ofitserov <oficerovas@altlinux.org>
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · b611b776
      Paolo Abeni authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
      
      Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
      packets.
      
      There is a day 0 bug in br_netfilter when used with connection tracking.
      
      Conntrack assumes that an nf_conn structure that has not yet been added
      to the hash table ("unconfirmed") is only visible to the CPU that is
      processing the sk_buff.
      
      For bridge this isn't true: the sk_buff can get cloned in between, and
      clones can be processed in parallel on different CPUs.
      
      This patch disables NAT and conntrack helpers for multicast packets.
      
      Patch #3 adds a selftest to cover for the br_netfilter bug.
      
      netfilter pull request 24-02-29
      
      * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: netfilter: add bridge conntrack + multicast test case
        netfilter: bridge: confirm multicast packets before passing them up the stack
        netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
      ====================
      
      Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames · 51dd4ee0
      Lukasz Majewski authored
      The current HSR implementation uses the following supervisory frame (even
      for HSRv1, the HSR tag is not present):
      
      00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01
      00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94
      00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000030: 00 00 00 00 00 00 00 00 00 00 00 00
      
      The current code adds two extra bytes (i.e. sizeof(struct hsr_sup_tlv))
      when the offset for skb_pull() is calculated.
      This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
      already have 'struct hsr_sup_tag' defined in them, so there is no need
      to add two extra bytes.
      
      This code was working correctly because, without RedBox support, the
      check for HSR_TLV_EOT (0x00) was off by two bytes that happened to be
      zero padding added to reach the minimum packet size.
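      The offset arithmetic can be checked against the dump above. This sketch assumes the layout described in the text: a 14-byte Ethernet header, then a 6-byte struct hsr_sup_tag that already ends with the 2-byte TLV type/length header. The XX YY ZZ bytes are replaced with arbitrary values (aa bb cc), and the macro names are hypothetical. With the correct offset, the read lands on the next TLV type (0x1e). Adding sizeof(struct hsr_sup_tlv) a second time lands two bytes inside that TLV's payload instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; sizes follow the layout described above. */
#define ETH_HDR_LEN	14	/* struct ethhdr */
#define SUP_TAG_LEN	6	/* path_and_HSR_ver, sequence_nr, 2-byte TLV header */
#define SUP_TLV_LEN	2	/* HSR_TLV_type + HSR_TLV_length */

/* First 34 bytes of the dump above (XX YY ZZ = aa bb cc); the rest is zero. */
static const uint8_t frame[60] = {
	0x01, 0x15, 0x4e, 0x00, 0x01, 0x2d, 0xaa, 0xbb,
	0xcc, 0x94, 0x77, 0x10, 0x88, 0xfb, 0x00, 0x01,
	0x7e, 0x1c, 0x17, 0x06, 0xaa, 0xbb, 0xcc, 0x94,
	0x77, 0x10, 0x1e, 0x06, 0xaa, 0xbb, 0xcc, 0x94,
	0x77, 0x10,
};

/*
 * Return the offset of the TLV that follows the first TLV's payload.
 * @buggy re-adds the TLV header size, which hsr_sup_tag already covers.
 */
static size_t next_tlv_offset(const uint8_t *f, size_t flen, int buggy)
{
	size_t total = ETH_HDR_LEN + SUP_TAG_LEN;	/* first TLV payload */

	assert(total <= flen);
	uint8_t first_len = f[total - 1];		/* HSR_TLV_length */

	if (buggy)
		total += SUP_TLV_LEN;
	return total + first_len;
}
```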
      
      Fixes: eafaa88b ("net: hsr: Add support for redbox supervision frames")
      Signed-off-by: Lukasz Majewski <lukma@denx.de>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • igb: extend PTP timestamp adjustments to i211 · 0bb7b093
      Oleksij Rempel authored
      The i211 requires the same PTP timestamp adjustments as the i210,
      according to its datasheet. To ensure consistent timestamping across
      different platforms, this change extends the existing adjustments to
      include the i211.
      
      The adjustment results were tested and are comparable on i210- and
      i211-based systems.
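      The change amounts to widening a MAC-type check. This is a hypothetical model, not igb code. The enum names are invented, and the latency values are caller-supplied placeholders rather than datasheet figures. The direction of the correction (TX stamps moved later, RX stamps moved earlier) is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of controller types; not the driver's enum. */
enum mac_model { MAC_OTHER, MAC_I210, MAC_I211 };

/* Before the fix only MAC_I210 qualified; the fix adds MAC_I211. */
static bool has_latency_adjust(enum mac_model mac)
{
	return mac == MAC_I210 || mac == MAC_I211;
}

/*
 * Compensate a raw hardware timestamp (ns) for fixed path latency.
 * @latency_ns is a placeholder supplied by the caller, not a datasheet value.
 */
static int64_t adjust_ts(enum mac_model mac, bool is_tx, int64_t raw_ns,
			 int64_t latency_ns)
{
	if (!has_latency_adjust(mac))
		return raw_ns;
	/* Assumed: TX stamps move later, RX stamps move earlier. */
	return is_tx ? raw_ns + latency_ns : raw_ns - latency_ns;
}
```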
      
      Fixes: 3f544d2a ("igb: adjust PTP timestamps for Tx/Rx latency")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back · 743ad091
      Lin Ma authored
      In the commit d73ef2d6 ("rtnetlink: let rtnl_bridge_setlink checks
      IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic
      in the function `rtnl_bridge_setlink` to enable the loop to also check
      the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment
      removed the `break` statement and led to erroneous logic when the flags
      are written back at the end of this function.
      
      if (have_flags)
          memcpy(nla_data(attr), &flags, sizeof(flags));
          // attr should point to IFLA_BRIDGE_FLAGS NLA !!!
      
      Before the mentioned commit, `attr` was guaranteed to be IFLA_BRIDGE_FLAGS.
      However, this is no longer necessarily true, as the updated loop leaves
      `attr` pointing to the last NLA, possibly an invalid one, which could
      cause overflowing writes.
      
      This patch introduces a new variable `br_flag` to save the NLA pointer
      that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned
      error logic.
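      The pattern behind the fix can be modelled with a plain array instead of struct nlattr (the kernel loop presumably uses nla_for_each_nested()). The loop still visits every attribute, but it remembers the one that carried the flags. The write-back then goes through that pointer rather than through the loop cursor. attr_model, find_flags_attr() and write_back_flags() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute model standing in for netlink attributes. */
enum { ATTR_FLAGS = 0, ATTR_MODE = 1, ATTR_OTHER = 2 };

struct attr_model {
	int type;
	uint16_t value;
};

/*
 * Walk every attribute (no early break, so each MODE attribute could be
 * length-checked) and remember the one carrying FLAGS, like `br_flag`.
 */
static struct attr_model *find_flags_attr(struct attr_model *attrs, size_t n)
{
	struct attr_model *br_flag = NULL;

	for (size_t i = 0; i < n; i++) {
		if (attrs[i].type == ATTR_FLAGS)
			br_flag = &attrs[i];
		/* ... per-attribute validation would go here ... */
	}
	return br_flag;
}

static void write_back_flags(struct attr_model *attrs, size_t n,
			     uint16_t flags)
{
	struct attr_model *br_flag = find_flags_attr(attrs, n);

	if (br_flag)	/* never attrs[n - 1], the last attribute visited */
		br_flag->value = flags;
}
```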
      
      Fixes: d73ef2d6 ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl: fix handling of multiple mcast groups · b6c65eb2
      Jakub Kicinski authored
      We never increment the group number iterator, so all groups
      get recorded into index 0 of the mcast_groups[] array.
      
      As a result, YNL can only use the last group.
      For example, using the "netdev" sample on a kernel with
      page pool commands results in:
      
        $ ./samples/netdev
        YNL: Multicast group 'mgmt' not found
      
      Most families have only one multicast group, so this hasn't
      been noticed. Also, developers perhaps usually test the last
      group, which would have worked.
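      Below is a hypothetical model of the table-filling loop, not the YNL code itself. Each group reported by the kernel has to land in its own slot, which means the index must advance. Without the increment, every group overwrites slot 0. The group names and IDs are illustrative values only.

```c
#include <assert.h>
#include <string.h>

#define MAX_GROUPS 4

struct mcast_group_model {
	char name[16];
	unsigned int id;
};

/*
 * Record each (name, id) pair into its own slot. Omitting `i++` would
 * reproduce the bug described above: only the last group survives.
 */
static unsigned int record_groups(struct mcast_group_model *out,
				  const char *const *names,
				  const unsigned int *ids, unsigned int n)
{
	unsigned int i = 0;

	for (unsigned int k = 0; k < n && i < MAX_GROUPS; k++) {
		strncpy(out[i].name, names[k], sizeof(out[i].name) - 1);
		out[i].name[sizeof(out[i].name) - 1] = '\0';
		out[i].id = ids[k];
		i++;	/* the increment that was missing */
	}
	return i;
}

/* Look up a group ID by name; -1 mirrors "group not found". */
static int find_group(const struct mcast_group_model *g, unsigned int n,
		      const char *name)
{
	for (unsigned int i = 0; i < n; i++)
		if (strcmp(g[i].name, name) == 0)
			return (int)g[i].id;
	return -1;
}
```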
      
      Fixes: 86878f14 ("tools: ynl: user space helpers")
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: netfilter: add bridge conntrack + multicast test case · 6523cf51
      Florian Westphal authored
      
      
      Add a test case for the multicast packet confirm race.
      Without the preceding patch, this should result in:
      
       WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
       Workqueue: events_unbound macvlan_process_broadcast
       RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
        ? __nf_conntrack_confirm+0x3ed/0x5f0
        nf_confirm+0x2ad/0x2d0
        nf_hook_slow+0x36/0xd0
        ip_local_deliver+0xce/0x110
        __netif_receive_skb_one_core+0x4f/0x70
        process_backlog+0x8c/0x130
        [..]
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: bridge: confirm multicast packets before passing them up the stack · 62e7151a
      Florian Westphal authored
      
      
      conntrack nf_confirm logic cannot handle cloned skbs referencing
      the same nf_conn entry, which will happen for multicast (broadcast)
      frames on bridges.
      
       Example:
          macvlan0
             |
            br0
           /  \
        ethX    ethY
      
       ethX (or Y) receives a L2 multicast or broadcast packet containing
       an IP packet, flow is not yet in conntrack table.
      
       1. skb passes through bridge and fake-ip (br_netfilter) Prerouting.
          -> skb->_nfct now references an unconfirmed entry
       2. skb is broad/mcast packet. bridge now passes clones out on each bridge
          interface.
       3. skb gets passed up the stack.
       4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
          and schedules a work queue to send them out on the lower devices.
      
          The clone skb->_nfct is not a copy, it is the same entry as the
          original skb.  The macvlan rx handler then returns RX_HANDLER_PASS.
       5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
      
      The macvlan broadcast worker and the normal confirm path will race.
      
      This race will not happen if step 2 already confirmed a clone. In that
      case later steps perform skb_clone() with skb->_nfct already confirmed (in
      hash table).  This works fine.
      
      But such confirmation won't happen when eb/ip/nftables rules dropped the
      packets before they reached the nf_confirm step in postrouting.
      
      Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
      nat, so we can safely discard the nf_conn entry and let inet call
      conntrack again.
      
      This doesn't work for bridge netfilter: skb could have a nat
      transformation. Also bridge nf prevents re-invocation of inet prerouting
      via 'sabotage_in' hook.
      
      Work around this problem by explicit confirmation of the entry at LOCAL_IN
      time, before upper layer has a chance to clone the unconfirmed entry.
      
      The downside is that this disables NAT and conntrack helpers.
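      The following is a single-threaded model of why confirming at LOCAL_IN helps. The hypothetical conn_model and skb_model types stand in for nf_conn and sk_buff. The model cannot reproduce the race itself. It only shows that when the shared entry is confirmed before any clone is taken, a later confirm attempt through a clone finds nothing left to insert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an nf_conn entry shared by skb clones. */
struct conn_model {
	bool confirmed;
	int hash_inserts;	/* how many times the entry was inserted */
};

struct skb_model {
	struct conn_model *ct;	/* clones share the pointer, like skb->_nfct */
};

static struct skb_model clone_skb(const struct skb_model *skb)
{
	struct skb_model c = { .ct = skb->ct };	/* same entry, not a copy */

	return c;
}

/*
 * Confirm path: only an unconfirmed entry is inserted. If two CPUs ran
 * this concurrently on clones of an unconfirmed entry, both could try to
 * insert it; confirming before cloning removes that window.
 */
static void confirm(struct skb_model *skb)
{
	if (skb->ct && !skb->ct->confirmed) {
		skb->ct->hash_inserts++;
		skb->ct->confirmed = true;
	}
}
```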
      
      Alternative fix would be to add locking to all code parts that deal with
      unconfirmed packets, but even if that could be done in a sane way this
      opens up other problems, for example:
      
      -m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
      -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
      
      For multicast case, only one of such conflicting mappings will be
      created, conntrack only handles 1:1 NAT mappings.
      
      Users should create a setup that explicitly marks such traffic
      NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
      them: the ruleset might already have accept rules for untracked
      traffic, so user-visible behaviour would change.
      
      Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() · 7e0f122c
      Ignat Korchagin authored
      Commit d0009eff ("netfilter: nf_tables: validate NFPROTO_* family") added
      some validation of NFPROTO_* families in the nft_compat module, but it broke
      the ability to use legacy iptables modules in dual-stack nftables.
      
      While with legacy iptables one had to independently manage IPv4 and IPv6
      tables, with nftables it is possible to have dual-stack tables sharing the
      rules. Moreover, it was possible to use rules based on legacy iptables
      match/target modules in dual-stack nftables.
      
      As an example, the program from [2] creates an INET dual-stack family table
      using an xt_bpf based rule, which looks like the following (the actual output
      was generated with a patched nft tool as the current nft tool does not parse
      dual stack tables with legacy match rules, so consider it for illustrative
      purposes only):
      
      table inet testfw {
        chain input {
          type filter hook prerouting priority filter; policy accept;
          bytecode counter packets 0 bytes 0 accept
        }
      }
      
      After d0009eff ("netfilter: nf_tables: validate NFPROTO_* family") we get
      EOPNOTSUPP for the above program.
      
      Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also
      restrict the functions to classic iptables hooks.
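      The validation rule can be sketched as follows. The enums only mirror the names of the NFPROTO_* families and NF_INET_* hooks, and their values are local to the sketch. The exact set of non-inet families accepted here is an assumption. What comes from the text is that inet is now allowed, but only on the five classic iptables hooks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Names mirror NFPROTO_* / NF_INET_* constants; the values are local. */
enum family_model { FAM_IPV4, FAM_IPV6, FAM_INET, FAM_BRIDGE, FAM_ARP, FAM_NETDEV };
enum hook_model { HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_FORWARD,
		  HOOK_LOCAL_OUT, HOOK_POST_ROUTING, HOOK_INGRESS };

/* The five hooks classic iptables can attach to. */
static bool is_classic_hook(enum hook_model hook)
{
	return hook == HOOK_PRE_ROUTING || hook == HOOK_LOCAL_IN ||
	       hook == HOOK_FORWARD || hook == HOOK_LOCAL_OUT ||
	       hook == HOOK_POST_ROUTING;
}

/* 0 if a legacy xt match/target may be used, -EOPNOTSUPP otherwise. */
static int compat_validate_model(enum family_model fam, enum hook_model hook)
{
	switch (fam) {
	case FAM_IPV4:
	case FAM_IPV6:
	case FAM_BRIDGE:
	case FAM_ARP:
		return 0;	/* per-family hook checks omitted in this sketch */
	case FAM_INET:
		/* Newly allowed, but restricted to classic iptables hooks. */
		return is_classic_hook(hook) ? 0 : -EOPNOTSUPP;
	default:
		return -EOPNOTSUPP;
	}
}
```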
      
      Changes in v3:
        * clarify that upstream nft will not display such configuration properly and
          that the output was generated with a patched nft tool
        * remove example program from commit description and link to it instead
        * no code changes otherwise
      
      Changes in v2:
        * restrict nft_(match/target)_validate() to classic iptables hooks
        * rewrite example program to use unmodified libnftnl
      
      Fixes: d0009eff ("netfilter: nf_tables: validate NFPROTO_* family")
      Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec044c7a3398cc92fac7afea72 [1]
      Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2]
      Reported-by: Jordan Griege <jgriege@cloudflare.com>
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. Feb 28, 2024
    • Documentations: correct net_cachelines title for struct inet_sock · 4adfc94d
      Haiyue Wang authored
      
      
      The fast path usage breakdown describes the details of 'inet_sock';
      fix the markup title accordingly.
      
      Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: Clear variable when destroying workqueue · 8af411bb
      Jakub Raczynski authored
      Currently, when suspending the driver and stopping the workqueue, the
      driver checks whether the workqueue is not NULL and, if so, destroys it.
      destroy_workqueue() drains and frees the queue, but it does not set the
      workqueue variable to NULL. This can cause a kernel/module panic if the
      code later attempts to destroy a workqueue that was not initialized.
      
      This scenario is possible when resuming a suspended driver in
      stmmac_resume(), because there is no handling for a failed
      stmmac_hw_setup(). That function can fail and return early if the DMA
      engine fails to initialize, and the workqueue is only initialized after
      the DMA engine. Should the DMA engine fail to initialize, resume will
      proceed normally, but the interface won't work and the TX queue will
      eventually time out, causing a 'Reset adapter' error. The reset process
      then destroys the workqueue again. Since the workqueue is initialized
      after the DMA engine and that step can be skipped, this causes a
      kernel/module panic.
      
      To secure against this possible crash, set workqueue variable to NULL when
      destroying workqueue.
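      The fix follows a common pattern: free the object and clear the pointer in the same place, so that a second stop call does nothing. It is shown here as a userspace model. wq_model, priv_model and stop_wq_model() are hypothetical stand-ins for the workqueue, the driver private data and the FPE workqueue stop path (stmmac_fpe_stop_wq() in the trace below).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue handle. */
struct wq_model {
	int pending;
};

struct priv_model {
	struct wq_model *fpe_wq;
};

/* Like destroy_workqueue(): frees, but cannot clear the caller's pointer. */
static void destroy_wq_model(struct wq_model *wq)
{
	free(wq);
}

/*
 * Stop path that may run more than once (suspend, then a later reset or
 * close). Clearing the pointer turns the second call into a no-op instead
 * of a use of freed or never-initialized memory.
 */
static void stop_wq_model(struct priv_model *priv)
{
	if (priv->fpe_wq) {
		destroy_wq_model(priv->fpe_wq);
		priv->fpe_wq = NULL;
	}
}
```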
      
      Log/backtrace from crash goes as follows:
      [88.031977]------------[ cut here ]------------
      [88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
      [88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
                 <Skipping backtrace for watchdog timeout>
      [88.032251]---[ end trace e70de432e4d5c2c0 ]---
      [88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
      [88.036359]------------[ cut here ]------------
      [88.036519]Call trace:
      [88.036523] flush_workqueue+0x3e4/0x430
      [88.036528] drain_workqueue+0xc4/0x160
      [88.036533] destroy_workqueue+0x40/0x270
      [88.036537] stmmac_fpe_stop_wq+0x4c/0x70
      [88.036541] stmmac_release+0x278/0x280
      [88.036546] __dev_close_many+0xcc/0x158
      [88.036551] dev_close_many+0xbc/0x190
      [88.036555] dev_close.part.0+0x70/0xc0
      [88.036560] dev_close+0x24/0x30
      [88.036564] stmmac_service_task+0x110/0x140
      [88.036569] process_one_work+0x1d8/0x4a0
      [88.036573] worker_thread+0x54/0x408
      [88.036578] kthread+0x164/0x170
      [88.036583] ret_from_fork+0x10/0x20
      [88.036588]---[ end trace e70de432e4d5c2c1 ]---
      [88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
      
      Fixes: 5a558611 ("net: stmmac: support FPE link partner hand-shaking procedure")
      Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hsr: Fix typo in the hsr_forward_do() function comment · 995161ed
      Lukasz Majewski authored
      
      
      Correct a typo in the hsr_forward_do() comment.
      
      Signed-off-by: Lukasz Majewski <lukma@denx.de>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: adi: move PHYLIB from vendor to driver symbol · 943d4bd6
      Randy Dunlap authored
      In a previous patch I added "select PHYLIB" at the wrong place for the
      ADIN1110 driver symbol, so move it to its correct place under the
      ADIN1110 kconfig symbol.
      
      Fixes: a9f80df4 ("net: ethernet: adi: requires PHYLIB support")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Closes: https://lore.kernel.org/lkml/77012b38-4b49-47f4-9a88-d773d52909ad@infradead.org/T/#m8ba397484738711edc0ad607b2c63ca02244e3c3
      Cc: Lennart Franzen <lennart@lfdomain.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Cc: Nuno Sa <nuno.sa@analog.com>
      Tested-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · ed2c0e4c
      Jakub Kicinski authored
      
      
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.8-rc7
      
      A few remaining fixes, hopefully the last wireless pull request for v6.8.
      Two fixes to the stack and two to iwlwifi, but no high priority fixes
      this time.
      
      * tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: mac80211: only call drv_sta_rc_update for uploaded stations
        MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files
        MAINTAINERS: wifi: update Jeff Johnson e-mail address
        wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices
        wifi: iwlwifi: mvm: ensure offloading TID queue exists
        wifi: nl80211: reject iftype change with mesh ID change
      ====================
      
      Link: https://lore.kernel.org/r/20240227135751.C5EC6C43390@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • uapi: in6: replace temporary label with rfc9486 · 6a200864
      Justin Iurman authored
      Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY
      IANA allocation for IOAM", while RFC 9486 has been available for some
      time now. Just update the reference.
      
      Fixes: 9ee11f0f ("ipv6: ioam: Data plane support for Pre-allocated Trace")
      Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: lan78xx: fix "softirq work is pending" error · e3d5d70c
      Oleksij Rempel authored
      Disable BH around the call to napi_schedule() to avoid the following
      error:
      NOHZ tick-stop error: local softirq work is pending, handler #08!!!
      
      Fixes: ec4c7e12 ("lan78xx: Introduce NAPI polling support")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20240226110820.2113584-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: Complete meta data only when enabled · f72a1994
      Kurt Kanzenbach authored
      Currently, using plain XDP/ZC sockets on stmmac results in a kernel crash:
      
      |[  255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      |[...]
      |[  255.822764] Call trace:
      |[  255.822766]  stmmac_tx_clean.constprop.0+0x848/0xc38
      
      The program counter indicates xsk_tx_metadata_complete(). It works on
      compl->tx_timestamp, which is not set by xsk_tx_metadata_to_compl() due to
      missing meta data. Therefore, call xsk_tx_metadata_complete() only when
      meta data is actually used.
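      Here is a hypothetical model of the guard, not the XSK API. tx_compl_model stands in for the completion data. complete_metadata() only dereferences the timestamp slot when the descriptor actually carried metadata, which is the condition the fix adds before calling xsk_tx_metadata_complete().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion data: the slot is only set up with metadata. */
struct tx_compl_model {
	uint64_t *tx_timestamp;		/* NULL unless metadata was requested */
};

/*
 * Report the hardware TX timestamp only when metadata is in use.
 * Returns true if the timestamp was written.
 */
static bool complete_metadata(struct tx_compl_model *compl, bool has_meta,
			      uint64_t hw_ts)
{
	if (!has_meta)
		return false;	/* skip: compl->tx_timestamp was never set */
	*compl->tx_timestamp = hw_ts;
	return true;
}
```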
      
      Tested on imx93 without XDP, with XDP and with XDP/ZC.
      
      Fixes: 1347b419 ("net: stmmac: Add Tx HWTS support to XDP ZC")
      Suggested-by: Serge Semin <fancer.lancer@gmail.com>
      Tested-by: Serge Semin <fancer.lancer@gmail.com>
      Link: https://lore.kernel.org/netdev/87r0h7wg8u.fsf@kurt.kurt.home/
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20240222-stmmac_xdp-v2-1-4beee3a037e4@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f72a1994
    • net: usb: dm9601: fix wrong return value in dm9601_mdio_read · c68b2c9e
      Javier Carrasco authored
      The MII code does not check the return value of mdio_read (among
      others), and therefore no error code should be returned. A previous fix
      for the use of an uninitialized variable propagates negative error
      codes, which might lead to wrong operations by the MII library.

      An example of such an issue is the use of mii_nway_restart by the dm9601
      driver. The mii_nway_restart function does not check the value returned
      by mdio_read, which in this case might be a negative number that
      contains the exact bit the function checks (BMCR_ANENABLE = 0x1000). In
      two's complement, every negative errno value has that bit set.

      Return zero in case of error, as is common practice in users of
      mdio_read, to avoid wrong uses of the return value.
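      The BMCR_ANENABLE problem can be checked in userspace. In this sketch,
      model_mdio_read() is a hypothetical stand-in for the driver's mdio_read
      callback, and would_restart_autoneg() mirrors only the bit test that
      mii_nway_restart() applies to the BMCR value. BMCR_ANENABLE is defined
      locally with the value quoted above.

```c
#include <assert.h>
#include <errno.h>

#define BMCR_ANENABLE 0x1000    /* value quoted in the commit message */

/*
 * Hypothetical mdio_read stand-in: 'hw_err' simulates a failed register
 * read, 'zero_on_error' selects the fixed behaviour (return 0, not -errno).
 */
static int model_mdio_read(int hw_err, int reg_val, int zero_on_error)
{
    if (hw_err)
        return zero_on_error ? 0 : -EIO;
    return reg_val;
}

/* The bit test mii_nway_restart() performs on the BMCR value it reads. */
static int would_restart_autoneg(int bmcr)
{
    return (bmcr & BMCR_ANENABLE) != 0;
}
```

      With the old behaviour, -EIO (0xfffffffb as a 32-bit value) looks like
      a BMCR with autonegotiation enabled; returning 0 on error makes the
      check fail safely instead.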
      
      Fixes: 8f8abb86 ("net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read")
      Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
      Link: https://lore.kernel.org/r/20240225-dm9601_ret_err-v1-1-02c1d959ea59@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. Feb 27, 2024
    • veth: try harder when allocating queue memory · 1ce7d306
      Jakub Kicinski authored
      struct veth_rq is fairly large: 832 bytes total without debug
      options enabled. Since the commit under Fixes, we try to pre-allocate
      enough queues for every possible CPU. Miao Wang reports that
      this may lead to order-5 allocations, which will fail in production.

      Let the allocation fall back to vmalloc() and try harder.
      These are the same flags we pass to netdev queue allocation.
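      A rough check of the allocation size behind the order-5 report. The
      sketch assumes 4 KiB pages and the 832-byte struct veth_rq quoted
      above, and mimics the rounding of the kernel's get_order();
      model_get_order() and veth_rq_order() are illustrative names. In the
      kernel, the kvmalloc()/kvcalloc() family is what provides a vmalloc()
      fallback for such arrays.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define VETH_RQ_BYTES   832u    /* sizeof(struct veth_rq) per the message */

/* Smallest order such that 2^order pages hold 'bytes' (like get_order()). */
static unsigned int model_get_order(size_t bytes)
{
    size_t pages = (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Order of one physically contiguous array of nr_queues veth_rq entries. */
static unsigned int veth_rq_order(unsigned int nr_queues)
{
    return model_get_order((size_t)nr_queues * VETH_RQ_BYTES);
}
```

      Under these assumptions, 128 possible CPUs need 104 KiB of contiguous
      memory (an order-5 request), and more than 157 queues push it to
      order 6.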
      
      Reported-and-tested-by: Miao Wang <shankerwangmiao@gmail.com>
      Fixes: 9d3684c2 ("veth: create by default nr_possible_cpus queues")
      Link: https://lore.kernel.org/all/5F52CAE2-2FB7-4712-95F1-3312FBBFA8DD@gmail.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240223235908.693010-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'ionic-pci-error-handling-fixes' · 237274fa
      Paolo Abeni authored
      Shannon Nelson says:
      
      ====================
      ionic: PCI error handling fixes
      
      These are a few things to make our PCI reset handling better.
      ====================
      
      Link: https://lore.kernel.org/r/20240223222742.13923-1-shannon.nelson@amd.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ionic: restore netdev feature bits after reset · 155a1efc
      Shannon Nelson authored
      When rebuilding the lif after an FLR, be sure to restore the
      current netdev features instead of doing the usual first-time feature
      init. This prevents losing user changes to settings such as TSO
      or VLAN tagging.
      
      Fixes: 45b84188 ("ionic: keep filters across FLR")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ionic: check cmd_regs before copying in or out · 7662fad3
      Shannon Nelson authored
      Since cmd_regs and info_regs can now be NULL during a reset recovery,
      and are left NULL if a reset recovery has failed, we need to check
      that they exist before we use them. Most of the cases were covered in
      the original patch, where we verify them before doing the ioreadb()
      for health or cmd status. However, we also need to protect a few uses
      of io mem that could be hit in error recovery or in asynchronous
      thread calls (e.g. ethtool or devlink handlers).
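      A userspace sketch of the pattern: every path that touches the register
      window first checks that the mapping still exists. struct
      model_dev_regs and model_copy_cmd_in() are hypothetical; the real
      driver works on __iomem pointers with io accessors such as
      memcpy_toio() rather than plain memcpy(), and the error code chosen
      here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical register window; NULL during or after a failed reset recovery. */
struct model_dev_regs {
    unsigned char *cmd_regs;
};

/* Copy a command into the device, refusing if the mapping is gone. */
static int model_copy_cmd_in(struct model_dev_regs *d, const void *cmd, size_t len)
{
    if (!d->cmd_regs)
        return -ENXIO;  /* assumed error code for "device not available" */
    memcpy(d->cmd_regs, cmd, len);
    return 0;
}
```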
      
      Fixes: 219e1832 ("ionic: no fw read when PCI reset failed")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ionic: check before releasing pci regions · a36b0787
      Shannon Nelson authored
      The AER recovery handler can trigger a PCI reset after tearing
      down the device setup in the error detection handler. The PCI
      reset handler will also attempt to tear down the device setup,
      and this second teardown needs to know that it must not call
      pci_release_regions() a second time. We can clear num_bars on
      teardown and use it later to decide whether we need to release
      the resources. This prevents a harmless but disturbing warning
      message:
          resource: Trying to free nonexistent resource <0xXXXXXXXXXX-0xXXXXXXXXXX>
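      Clearing num_bars makes the teardown idempotent. A minimal sketch with
      hypothetical names, where model_release_regions() merely counts how
      often the stand-in for pci_release_regions() would run:

```c
#include <assert.h>

/* Hypothetical device state: a non-zero num_bars means regions are held. */
struct model_pci_dev {
    int num_bars;
    int release_calls;   /* counts simulated pci_release_regions() calls */
};

static void model_release_regions(struct model_pci_dev *d)
{
    d->release_calls++;
}

/* Safe to call from both the error-detection and the reset handler. */
static void model_teardown(struct model_pci_dev *d)
{
    if (d->num_bars) {
        model_release_regions(d);
        d->num_bars = 0;  /* mark as released for any later teardown */
    }
}
```

      Calling model_teardown() twice, once per handler, releases the regions
      only once.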
      
      Fixes: c3a910e1 ("ionic: fill out pci error handlers")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-more-misc-fixes-for-v6-8' · 3980cf16
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: more misc. fixes for v6.8
      
      This series includes 6 types of fixes:
      
      - Patch 1 fixes support for v4-mapped-in-v6 addresses in the userspace
        PM when asking to delete a subflow. It was done everywhere else, but
        not there. Patch 2 validates the modification, thanks to a subtest in
        mptcp_join.sh. These patches can be backported up to v5.19.

      - Patch 3 is a small fix for a recent bug-fix patch, just to avoid
        printing an irrelevant warning (pr_warn()) once. It can be backported
        up to v5.6, alongside the bug-fix that was introduced in
        v6.8-rc5.

      - Patches 4 to 6 are fixes for bugs found by Paolo while working on
        TCP_NOTSENT_LOWAT support for MPTCP. These fixes can improve
        performance in some cases. Patches can be backported up to v5.6,
        v5.11 and v6.7 respectively.
      
      - Patch 7 makes sure 'ss -M' is available when starting MPTCP Join
        selftest as it is required for some subtests since v5.18.
      
      - Patch 8 fixes a possible double-free on socket dismantle. The issue
        always existed, but was unnoticed because it was not causing any
        problem so far. This fix can be backported up to v5.6.
      
      - Patch 9 is a fix for a very recent patch causing lockdep warnings in
        subflow diag. The patch causing the regression -- which fixes another
        issue present since v5.7 -- should be part of the future v6.8-rc6.
        Patch 10 validates the modification, thanks to a new subtest in
        diag.sh.
      ====================
      
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-0-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: explicitly trigger the listener diag code-path · b4b51d36
      Paolo Abeni authored
      The mptcp diag interface already experienced a few locking bugs
      that lockdep and appropriate coverage have detected in advance.
      
      Let's add a test-case triggering the relevant code path, to prevent
      similar issues in the future.
      
      Be careful to cope with very slow environments.
      
      Note that we don't need an explicit timeout on the mptcp_connect
      subprocess to cope with a possible bug or hang, as the final cleanup
      that terminates the child processes will take care of that.
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix possible deadlock in subflow diag · d6a9608a
      Paolo Abeni authored
      Syzbot and Eric reported a lockdep splat in the subflow diag:
      
         WARNING: possible circular locking dependency detected
         6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
      
         syz-executor.2/24141 is trying to acquire lock:
         ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
         tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
         ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
         tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
      
         but task is already holding lock:
         ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock
         include/linux/spinlock.h:351 [inline]
         ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at:
         inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
         lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
         __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
         _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
         spin_lock include/linux/spinlock.h:351 [inline]
         __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743
         inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261
         __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217
         inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239
         rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316
         rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577
         ops_init+0x352/0x610 net/core/net_namespace.c:136
         __register_pernet_operations net/core/net_namespace.c:1214 [inline]
         register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283
         register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370
         rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735
         do_one_initcall+0x238/0x830 init/main.c:1236
         do_initcall_level+0x157/0x210 init/main.c:1298
         do_initcalls+0x3f/0x80 init/main.c:1314
         kernel_init_freeable+0x42f/0x5d0 init/main.c:1551
         kernel_init+0x1d/0x2a0 init/main.c:1441
         ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
         ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
      
         -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
         check_prev_add kernel/locking/lockdep.c:3134 [inline]
         check_prevs_add kernel/locking/lockdep.c:3253 [inline]
         validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
         __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
         lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
         lock_sock_fast include/net/sock.h:1723 [inline]
         subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28
         tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
         tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
         inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345
         inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061
         __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263
         inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371
         netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264
         __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370
         netlink_dump_start include/linux/netlink.h:338 [inline]
         inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405
         sock_diag_rcv_msg+0xe7/0x410
         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
         sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
         netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
         netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg+0x221/0x270 net/socket.c:745
         ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
         ___sys_sendmsg net/socket.c:2638 [inline]
         __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
         do_syscall_64+0xf9/0x240
         entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      As noted by Eric, we can break the lock dependency chain by avoiding
      dumping any extended info for the mptcp subflow listener: nothing
      actually useful is presented there.
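      A userspace sketch of the idea: for a listener, return before taking
      the per-socket lock, so the dump never acquires that lock while the
      listening-hash spinlock is held. struct model_subflow, its state enum
      and the lock counter are hypothetical; the counter only stands in for
      lock_sock_fast().

```c
#include <assert.h>

enum model_subflow_state { MODEL_SF_ESTABLISHED, MODEL_SF_LISTEN };

/* Hypothetical subflow socket with a counter standing in for the lock. */
struct model_subflow {
    enum model_subflow_state state;
    int lock_calls;     /* times lock_sock_fast() would have been taken */
};

/* Returns the number of bytes of extended ULP info that would be dumped. */
static int model_subflow_get_info(struct model_subflow *sf)
{
    if (sf->state == MODEL_SF_LISTEN)
        return 0;       /* listener: no useful info, and no socket lock */

    sf->lock_calls++;   /* lock_sock_fast() ... fill info ... unlock */
    return 16;          /* arbitrary payload size for the model */
}
```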
      
      Fixes: b8adb69a ("mptcp: fix lockless access in subflow ULP diag")
      Cc: stable@vger.kernel.org
      Reported-by: Eric Dumazet <edumazet@google.com>
      Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TOD5Xiaw@mail.gmail.com/
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix double-free on socket dismantle · 10048689
      Davide Caratti authored
      When the MPTCP server accepts an incoming connection, it clones its
      listener socket. However, the 'inet_opt' pointer of the new socket has
      the same value as the original one: as a consequence, on program exit
      it's possible to observe the following splat:
      
        BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0
        Free of addr ffff888485950880 by task swapper/25/0
      
        CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609
        Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0  07/26/2013
        Call Trace:
         <IRQ>
         dump_stack_lvl+0x32/0x50
         print_report+0xca/0x620
         kasan_report_invalid_free+0x64/0x90
         __kasan_slab_free+0x1aa/0x1f0
         kfree+0xed/0x2e0
         inet_sock_destruct+0x54f/0x8b0
         __sk_destruct+0x48/0x5b0
         rcu_do_batch+0x34e/0xd90
         rcu_core+0x559/0xac0
         __do_softirq+0x183/0x5a4
         irq_exit_rcu+0x12d/0x170
         sysvec_apic_timer_interrupt+0x6b/0x80
         </IRQ>
         <TASK>
         asm_sysvec_apic_timer_interrupt+0x16/0x20
        RIP: 0010:cpuidle_enter_state+0x175/0x300
        Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b
        RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202
        RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000
        RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588
        RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080
        R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0
        R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80
         cpuidle_enter+0x4a/0xa0
         do_idle+0x310/0x410
         cpu_startup_entry+0x51/0x60
         start_secondary+0x211/0x270
         secondary_startup_64_no_verify+0x184/0x18b
         </TASK>
      
        Allocated by task 6853:
         kasan_save_stack+0x1c/0x40
         kasan_save_track+0x10/0x30
         __kasan_kmalloc+0xa6/0xb0
         __kmalloc+0x1eb/0x450
         cipso_v4_sock_setattr+0x96/0x360
         netlbl_sock_setattr+0x132/0x1f0
         selinux_netlbl_socket_post_create+0x6c/0x110
         selinux_socket_post_create+0x37b/0x7f0
         security_socket_post_create+0x63/0xb0
         __sock_create+0x305/0x450
         __sys_socket_create.part.23+0xbd/0x130
         __sys_socket+0x37/0xb0
         __x64_sys_socket+0x6f/0xb0
         do_syscall_64+0x83/0x160
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
      
        Freed by task 6858:
         kasan_save_stack+0x1c/0x40
         kasan_save_track+0x10/0x30
         kasan_save_free_info+0x3b/0x60
         __kasan_slab_free+0x12c/0x1f0
         kfree+0xed/0x2e0
         inet_sock_destruct+0x54f/0x8b0
         __sk_destruct+0x48/0x5b0
         subflow_ulp_release+0x1f0/0x250
         tcp_cleanup_ulp+0x6e/0x110
         tcp_v4_destroy_sock+0x5a/0x3a0
         inet_csk_destroy_sock+0x135/0x390
         tcp_fin+0x416/0x5c0
         tcp_data_queue+0x1bc8/0x4310
         tcp_rcv_state_process+0x15a3/0x47b0
         tcp_v4_do_rcv+0x2c1/0x990
         tcp_v4_rcv+0x41fb/0x5ed0
         ip_protocol_deliver_rcu+0x6d/0x9f0
         ip_local_deliver_finish+0x278/0x360
         ip_local_deliver+0x182/0x2c0
         ip_rcv+0xb5/0x1c0
         __netif_receive_skb_one_core+0x16e/0x1b0
         process_backlog+0x1e3/0x650
         __napi_poll+0xa6/0x500
         net_rx_action+0x740/0xbb0
         __do_softirq+0x183/0x5a4
      
        The buggy address belongs to the object at ffff888485950880
         which belongs to the cache kmalloc-64 of size 64
        The buggy address is located 0 bytes inside of
         64-byte region [ffff888485950880, ffff8884859508c0)
      
        The buggy address belongs to the physical page:
        page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950
        flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff)
        page_type: 0xffffffff()
        raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006
        raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
         ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
        >ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                           ^
         ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
         ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
      
      Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix
      this by duplicating IP / IPv6 options after clone, so that
      ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
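      The bug class is a shallow copy of an owned pointer. This userspace
      sketch uses hypothetical model_* types in place of struct sock and the
      kernel's RCU-managed IP options. It shows that after a plain struct
      copy both sockets point at the same buffer, and that duplicating the
      options right after the clone gives each socket its own allocation,
      so each destructor frees a different buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the socket's IP options blob. */
struct model_ip_opts {
    size_t optlen;
    unsigned char data[40];
};

/* Hypothetical socket: owns inet_opt and frees it on destruction. */
struct model_sock {
    struct model_ip_opts *inet_opt;
};

static void model_sock_destruct(struct model_sock *sk)
{
    free(sk->inet_opt);
    sk->inet_opt = NULL;
}

/* Clone by struct copy: the child now shares the listener's inet_opt. */
static struct model_sock model_sock_clone(const struct model_sock *parent)
{
    return *parent;
}

/* The fix: give the clone its own copy of the options. 0 on success. */
static int model_dup_opts_after_clone(struct model_sock *child)
{
    struct model_ip_opts *copy;

    if (!child->inet_opt)
        return 0;
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy, child->inet_opt, sizeof(*copy));
    child->inet_opt = copy;
    return 0;
}
```

      Skipping model_dup_opts_after_clone() and destroying both sockets
      would pass the same pointer to free() twice, which is the double-free
      KASAN reports above.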
      
      Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections")
      Cc: stable@vger.kernel.org
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: join: add ss mptcp support check · 9480f388
      Geliang Tang authored
      The 'ss -M' command is used in the mptcp_join.sh script to display only
      MPTCP sockets, so the script must check whether the ss tool supports MPTCP.
      
      Fixes: e274f715 ("selftests: mptcp: add subflow limits test-cases")
      Cc: stable@vger.kernel.org
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix potential wake-up event loss · b111d8fb
      Paolo Abeni authored
      After the blamed commit below, the send buffer auto-tuning can
      happen after mptcp_propagate_sndbuf() completes, via the delegated
      action infrastructure.

      We must check for write space even after such a change, or we risk
      missing the wake-up event.
      
      Fixes: 8005184f ("mptcp: refactor sndbuf auto-tuning")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix snd_wnd initialization for passive socket · adf1bb78
      Paolo Abeni authored
      The snd_wnd value should be inherited from the first subflow, but
      passive sockets always used 'rsk_rcv_wnd'.
      
      Fixes: 6f8a612a ("mptcp: keep track of advertised windows right edge")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: push at DSS boundaries · b9cd26f6
      Paolo Abeni authored
      When inserting non-contiguous data into the subflow write queue,
      the protocol creates a new skb and prevents the TCP stack from
      later merging it with already queued skbs by setting the EOR marker.

      Still, no push flag is explicitly set at the end of the previous GSO
      packet, making the aggregation on the receiver side sub-optimal, and
      packetdrill self-tests less predictable.

      Explicitly mark the end of non-contiguous DSS with the push flag.
      
      Fixes: 6d0060f6 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: avoid printing warning once on client side · 5b49c41a
      Matthieu Baerts (NGI0) authored
      After the 'Fixes' commit mentioned below, the client side might print
      the following warning once when a subflow is fully established upon
      reception of any valid additional ACK:

        MPTCP: bogus mpc option on established client sk

      This is a normal situation, and no warning should be printed for it. We
      can then skip the check when the label is used.
      
      Fixes: e4a0fa47 ("mptcp: corner case locking for rx path fields initialization")
      Cc: stable@vger.kernel.org
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: rm subflow with v4/v4mapped addr · 7092dbee
      Geliang Tang authored
      Now that both a v4 address and a v4-mapped address are supported when
      destroying a userspace PM subflow, this patch adds a second subflow
      to the "userspace pm add & remove address" test, so that the two
      subflows can be removed in two different ways: one with the v4-mapped
      address and one with the v4 address.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
      Fixes: 48d73f60 ("selftests: mptcp: update userspace pm addr tests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: map v4 address to v6 when destroying subflow · 535d620e
      Geliang Tang authored
      The address family on the server side does not match that of the client
      side, as in the "userspace pm add & remove address" test:
      
          userspace_pm_add_addr $ns1 10.0.2.1 10
          userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED
      
      That's because on the server side, the family is set to AF_INET6 and the
      v4 address is mapped in a v6 one.
      
      This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before
      comparing the local address family with the remote address family, map
      an IPv4 address to an IPv6 address if the pair is a v4-mapped address.
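      A userspace sketch of the mapping step, using the standard struct
      in_addr and struct in6_addr types: the IPv4 address is placed in the
      low 32 bits of ::ffff:0:0/96 before comparing it against an IPv6 peer.
      model_map_v4_to_v6() and model_same_addr() are illustrative names;
      inside the kernel, a helper such as ipv6_addr_set_v4mapped() serves the
      same purpose.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build ::ffff:a.b.c.d; the IPv4 bytes are already in network order. */
static void model_map_v4_to_v6(const struct in_addr *v4, struct in6_addr *v6)
{
    memset(v6->s6_addr, 0, 10);
    v6->s6_addr[10] = 0xff;
    v6->s6_addr[11] = 0xff;
    memcpy(&v6->s6_addr[12], &v4->s_addr, 4);
}

/* Compare a local IPv4 address with a remote IPv6 address after mapping. */
static int model_same_addr(const struct in_addr *local4, const struct in6_addr *remote6)
{
    struct in6_addr mapped;

    if (!IN6_IS_ADDR_V4MAPPED(remote6))
        return 0;   /* a native IPv6 peer can never match an IPv4 address */
    model_map_v4_to_v6(local4, &mapped);
    return memcmp(&mapped, remote6, sizeof(mapped)) == 0;
}
```

      With the addresses from the test above, 10.0.2.1 and ::ffff:10.0.2.1
      compare equal once the IPv4 side is mapped.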
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
      Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment")
      Cc: stable@vger.kernel.org
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dpll: rely on rcu for netdev_dpll_pin() · 0d60d8df
      Eric Dumazet authored
      This fixes a possible use-after-free (UAF) in if_nlmsg_size(),
      which can run without RTNL.

      Add RCU protection to "struct dpll_pin".

      Move netdev_dpll_pin() from netdevice.h to dpll.h to
      decrease name pollution.

      Note: it looks possible to no longer acquire RTNL in
      netdev_dpll_pin_assign() later in net-next.
      
      v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko)
      
      Fixes: 5f184269 ("netdev: expose DPLL pin handle for netdevice")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected · 0e67899a
      Oleksij Rempel authored
      Like the LAN7800, the LAN7850 can be used without an EEPROM. If the
      EEPROM is not present or not flashed, the LAN7850 will fail to sync the
      speed detected by the PHY with the MAC. If the link speed is 100 Mbit/s,
      it will accidentally work; otherwise, no data can be transferred.

      A better way would be to implement the link_up callback, or to set the
      auto speed configuration unconditionally, but these changes would be
      more intrusive. So, for now, set it only if no EEPROM is found.
      
      Fixes: e69647a1 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. Feb 26, 2024