  1. Aug 04, 2023
  2. Aug 03, 2023
    • udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES · 0f71c9ca
      David Howells authored
      __ip_append_data() can get into an infinite loop when asked to splice into
      a partially-built UDP message that has more than the frag-limit data and up
      to the MTU limit.  Something like:
      
              pipe(pfd);
              sfd = socket(AF_INET, SOCK_DGRAM, 0);
              connect(sfd, ...);
              send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE);
              write(pfd[1], buffer, 8);
              splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);
      
      where the amount of data given to send() is dependent on the MTU size (in
      this instance an interface with an MTU of 8192).
      
      The problem is that the calculation of the amount to copy in
      __ip_append_data() goes negative in two places, and, in the second place,
      this gets subtracted from the length remaining, thereby increasing it.
      
      This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and
      MSG_SPLICE_PAGES), because the terms in:
      
              copy = datalen - transhdrlen - fraggap - pagedlen;
      
      then mostly cancel when pagedlen is substituted for, leaving just -fraggap.
      This causes:
      
              length -= copy + transhdrlen;
      
      to increase the length to more than the amount of data in msg->msg_iter,
      which causes skb_splice_from_iter() to be unable to fill the request and it
      returns less than 'copied' - which means that length never gets to 0 and we
      never exit the loop.
      
      Fix this by:
      
       (1) Insert a note about the dodgy calculation of 'copy'.
      
       (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
           equation, so that 'offset' isn't regressed and 'length' isn't
           increased, which will mean that length and thus copy should match the
           amount left in the iterator.
      
       (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
           we're asked to splice more than is in the iterator.  It might be
           better to not give the warning or even just give a 'short' write.
      
      [!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY
      avoids the problem by simply assuming that everything asked for got copied,
      not just the amount that was in the iterator.  This is a potential bug for
      the future.
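
      The arithmetic described above can be sketched in isolation. This is a
      simplified model, not the kernel code: the helper names and the sample
      values are hypothetical, and it assumes that on the paged path datalen is
      length + fraggap and pagedlen is datalen - transhdrlen.

```c
/* Simplified model of the 'copy' calculation on the paged path.
 * Assumption: pagedlen is derived from datalen as datalen - transhdrlen,
 * so the terms cancel and only -fraggap is left. */
static int paged_copy(int datalen, int transhdrlen, int fraggap)
{
	int pagedlen = datalen - transhdrlen;

	return datalen - transhdrlen - fraggap - pagedlen;
}

/* Length remaining after one hypothetical loop iteration.
 * With clamp_negative set, a negative copy is cleared first, which is
 * the behaviour item (2) of the fix asks for. */
static int remaining_after(int length, int transhdrlen, int fraggap,
			   int clamp_negative)
{
	int datalen = length + fraggap;
	int copy = paged_copy(datalen, transhdrlen, fraggap);

	if (clamp_negative && copy < 0)
		copy = 0;

	return length - (copy + transhdrlen);
}
```

      With length = 8, transhdrlen = 0 and fraggap = 8, the unclamped model
      grows the remaining length to 16, while the clamped one leaves it at 8.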
      
      Fixes: 7ac7c987 ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES")
      Reported-by: <syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: David Ahern <dsahern@kernel.org>
      cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mlx5-ipsec-fixes' · a2d9831d
      Jakub Kicinski authored
      
      
      Leon Romanovsky says:
      
      ====================
      mlx5 IPsec fixes
      
      The following patches are a combination of Jianbo's work on IPsec eswitch mode
      and our internal review toward adding TCP protocol selector support to
      IPsec packet offload.

      Although the first patch is not a fix, it makes the second one clearer,
      so I'm asking to apply it anyway as part of this series.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Set proper IPsec source port in L4 selector · 62da0833
      Leon Romanovsky authored
      Fix typo in setup_fte_upper_proto_match() where destination UDP port
      was used instead of source port.
      
      Fixes: a7385187 ("net/mlx5e: IPsec, support upper protocol selector field offload")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/ffc024a4d192113103f392b0502688366ca88c1f.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio · c635ca45
      Jianbo Liu authored
      In the cited commit, a new fs_prio type, FS_TYPE_PRIO_CHAINS, was added
      to support multiple parallel namespaces for multi-chains. All the flow
      tables under an fs_node of this type are skipped unconditionally
      when searching for the next or previous flow table to connect for a
      new table.

      As this search function is also used to find the new root table when the
      old one is being deleted, it skips the entire FS_TYPE_PRIO_CHAINS
      fs_node next to the old root. However, the new root table should be chosen
      from that fs_node if it contains any table. Fix it by skipping only the flow
      tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the
      closest FT for an fs_node.

      Besides, complete the connection from the FTs of the previous priority,
      because there can be multiple prevs now that this fs_prio type has been
      introduced. Also, if this prio is the first child, the next FT should be
      chosen from the first flow table next to the prio in the same
      FS_TYPE_PRIO_CHAINS fs_prio.
      
      Fixes: 328edb49 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: fs_core: Make find_closest_ft more generic · 618d28a5
      Jianbo Liu authored
      
      
      As find_closest_ft_recursive is called to find the closest FT, the
      first parameter of find_closest_ft can be changed from fs_prio to
      fs_node. Thus this function is extended to find the closest FT for the
      nodes of any type, not only prios, but also the sub namespaces.
      
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. Aug 02, 2023
    • vxlan: Fix nexthop hash size · 0756384f
      Benjamin Poirier authored
      The nexthop code expects a 31 bit hash, such as what is returned by
      fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
      returned by skb_get_hash() can lead to problems related to the fact that
      'int hash' is a negative number when the MSB is set.
      
      In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
      will disproportionately select the first nexthop group entry. In the case
      of resilient nexthop groups, nexthop_select_path_res() may do an out of
      bounds access in nh_buckets[], for example:
          hash = -912054133
          num_nh_buckets = 2
          bucket_index = 65535
      
      which leads to the following panic:
      
      BUG: unable to handle page fault for address: ffffc900025910c8
      PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work
      RIP: 0010:nexthop_select_path+0x197/0xbf0
      Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
      RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
      RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
      RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
      R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
      R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
      FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __die+0x23/0x70
       ? page_fault_oops+0x1ee/0x5c0
       ? __pfx_is_prefetch.constprop.0+0x10/0x10
       ? __pfx_page_fault_oops+0x10/0x10
       ? search_bpf_extables+0xfe/0x1c0
       ? fixup_exception+0x3b/0x470
       ? exc_page_fault+0xf6/0x110
       ? asm_exc_page_fault+0x26/0x30
       ? nexthop_select_path+0x197/0xbf0
       ? nexthop_select_path+0x197/0xbf0
       ? lock_is_held_type+0xe7/0x140
       vxlan_xmit+0x5b2/0x2340
       ? __lock_acquire+0x92b/0x3370
       ? __pfx_vxlan_xmit+0x10/0x10
       ? __pfx___lock_acquire+0x10/0x10
       ? __pfx_register_lock_class+0x10/0x10
       ? skb_network_protocol+0xce/0x2d0
       ? dev_hard_start_xmit+0xca/0x350
       ? __pfx_vxlan_xmit+0x10/0x10
       dev_hard_start_xmit+0xca/0x350
       __dev_queue_xmit+0x513/0x1e20
       ? __pfx___dev_queue_xmit+0x10/0x10
       ? __pfx_lock_release+0x10/0x10
       ? mark_held_locks+0x44/0x90
       ? skb_push+0x4c/0x80
       ? eth_header+0x81/0xe0
       ? __pfx_eth_header+0x10/0x10
       ? neigh_resolve_output+0x215/0x310
       ? ip6_finish_output2+0x2ba/0xc90
       ip6_finish_output2+0x2ba/0xc90
       ? lock_release+0x236/0x3e0
       ? ip6_mtu+0xbb/0x240
       ? __pfx_ip6_finish_output2+0x10/0x10
       ? find_held_lock+0x83/0xa0
       ? lock_is_held_type+0xe7/0x140
       ip6_finish_output+0x1ee/0x780
       ip6_output+0x138/0x460
       ? __pfx_ip6_output+0x10/0x10
       ? __pfx___lock_acquire+0x10/0x10
       ? __pfx_ip6_finish_output+0x10/0x10
       NF_HOOK.constprop.0+0xc0/0x420
       ? __pfx_NF_HOOK.constprop.0+0x10/0x10
       ? ndisc_send_skb+0x2c0/0x960
       ? __pfx_lock_release+0x10/0x10
       ? __local_bh_enable_ip+0x93/0x110
       ? lock_is_held_type+0xe7/0x140
       ndisc_send_skb+0x4be/0x960
       ? __pfx_ndisc_send_skb+0x10/0x10
       ? mark_held_locks+0x65/0x90
       ? find_held_lock+0x83/0xa0
       ndisc_send_ns+0xb0/0x110
       ? __pfx_ndisc_send_ns+0x10/0x10
       addrconf_dad_work+0x631/0x8e0
       ? lock_acquire+0x180/0x3f0
       ? __pfx_addrconf_dad_work+0x10/0x10
       ? mark_held_locks+0x24/0x90
       process_one_work+0x582/0x9c0
       ? __pfx_process_one_work+0x10/0x10
       ? __pfx_do_raw_spin_lock+0x10/0x10
       ? mark_held_locks+0x24/0x90
       worker_thread+0x93/0x630
       ? __kthread_parkme+0xdc/0x100
       ? __pfx_worker_thread+0x10/0x10
       kthread+0x1a5/0x1e0
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x34/0x60
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1b/0x30
      RIP: 0000:0x0
      Code: Unable to access opcode bytes at 0xffffffffffffffd6.
      RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Modules linked in:
      CR2: ffffc900025910c8
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:nexthop_select_path+0x197/0xbf0
      Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
      RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
      RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
      RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
      R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
      R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
      FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
      PKRU: 55555554
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
      same approach used in fib_multipath_hash() and rt6_multipath_hash().
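
      The failure mode and the fix can be reproduced in user space. This is a
      minimal sketch, assuming the bucket index is computed as the hash modulo
      the bucket count and stored in a 16-bit unsigned value; both helper names
      are hypothetical and are not the kernel functions.

```c
#include <stdint.h>

/* Hypothetical model of the bucket lookup: the signed hash is reduced
 * modulo the bucket count and the result is stored in 16 bits. A negative
 * hash gives a negative remainder, which wraps to a huge index. */
static uint16_t bucket_index(int hash, uint16_t num_nh_buckets)
{
	return (uint16_t)(hash % num_nh_buckets);
}

/* Shift the 32-bit hash right by one so that the MSB is 0 and the value
 * is never negative when treated as an int. */
static int hash_31bit(uint32_t hash32)
{
	return (int)(hash32 >> 1);
}
```

      bucket_index(-912054133, 2) yields 65535, matching the values quoted
      above, whereas a hash passed through hash_31bit() first always indexes
      within the table.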
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: Fix skb_under_panic in ip6mr_cache_report() · 30e0191b
      Yue Haibing authored
      skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4
       head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg
       ------------[ cut here ]------------
       kernel BUG at net/core/skbuff.c:192!
       invalid opcode: 0000 [#1] PREEMPT SMP KASAN
       CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
       Workqueue: ipv6_addrconf addrconf_dad_work
       RIP: 0010:skb_panic+0x152/0x1d0
       Call Trace:
        <TASK>
        skb_push+0xc4/0xe0
        ip6mr_cache_report+0xd69/0x19b0
        reg_vif_xmit+0x406/0x690
        dev_hard_start_xmit+0x17e/0x6e0
        __dev_queue_xmit+0x2d6a/0x3d20
        vlan_dev_hard_start_xmit+0x3ab/0x5c0
        dev_hard_start_xmit+0x17e/0x6e0
        __dev_queue_xmit+0x2d6a/0x3d20
        neigh_connected_output+0x3ed/0x570
        ip6_finish_output2+0x5b5/0x1950
        ip6_finish_output+0x693/0x11c0
        ip6_output+0x24b/0x880
        NF_HOOK.constprop.0+0xfd/0x530
        ndisc_send_skb+0x9db/0x1400
        ndisc_send_rs+0x12a/0x6c0
        addrconf_dad_completed+0x3c9/0xea0
        addrconf_dad_work+0x849/0x1420
        process_one_work+0xa22/0x16e0
        worker_thread+0x679/0x10c0
        ret_from_fork+0x28/0x60
        ret_from_fork_asm+0x11/0x20
      
      When a vlan device is set up on dev pim6reg, a DAD NS packet may be sent
      through reg_vif_xmit():
      reg_vif_xmit()
          ip6mr_cache_report()
              skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4
      skb_push() is declared as:
      	void *skb_push(struct sk_buff *skb, unsigned int len);
      		skb->data -= len;
      		//0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850
      The negative length is converted to a large unsigned value, so skb->data
      is set to 0xffff887f5f86a850, which is an invalid memory address, and
      skb_push() fails with skb_under_panic.
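
      The pointer arithmetic in the comment above can be checked with plain
      integers. This sketch models only the data pointer update, assuming a
      32-bit unsigned int; the helper names are hypothetical and nothing here
      is the real skb_push().

```c
#include <stdint.h>

/* Models only the pointer update done by skb_push(): data -= len.
 * The address is held in a 64-bit integer; len is unsigned int, as in
 * the declaration quoted above. */
static uint64_t pushed_data(uint64_t data, unsigned int len)
{
	return data - len;
}

/* Negating a positive int offset and passing it as unsigned int wraps
 * it to a value just below 2^32. */
static unsigned int negated_offset(int offset)
{
	return (unsigned int)-offset;
}
```

      negated_offset(4) is 0xfffffffc, and subtracting that from
      0xffff88805f86a84c gives the invalid address 0xffff887f5f86a850 seen in
      the panic.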
      
      Fixes: 14fb64e1 ("[IPV6] MROUTE: Support PIM-SM (SSM).")
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: Don't call dev_close/dev_open (DOWN/UP) · 1cfef80d
      Alexandra Winter authored
      dev_close() and dev_open() are issued to change the interface state to DOWN
      or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g. its
      IPv6 addresses and routes. We don't want this in cases of device recovery
      (triggered by hardware or software) or when the qeth device is set
      offline.
      
      Setting a qeth device offline or online and device recovery actions call
      netif_device_detach() and/or netif_device_attach(). That will reset or
      set the LOWER_UP indication, i.e. change the dev->state bit
      __LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
      still preserves the interface settings that are handled by the network
      stack.
      
      Don't call dev_open() nor dev_close() from the qeth device driver. Let the
      network stack handle this.
      
      Fixes: d4560150 ("s390/qeth: call dev_close() during recovery")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tun-tap-uid' · 666c135b
      David S. Miller authored
      
      
      Laszlo Ersek says:
      
      ====================
      tun/tap: set sk_uid from current_fsuid()
      
      The original patches fixing CVE-2023-1076 are incorrect in my opinion.
      This small series fixes them up; see the individual commit messages for
      explanation.
      
      I have a very elaborate test procedure demonstrating the problem for
      both tun and tap; it involves libvirt, qemu, and "crash". I can share
      that procedure if necessary, but it's indeed quite long (I wrote it
      originally for our QE team).
      
      The patches in this series are supposed to "re-fix" CVE-2023-1076; given
      that said CVE is classified as Low Impact (CVSSv3=5.5), I'm posting this
      publicly, and not suggesting any embargo. Red Hat Product Security may
      assign a new CVE number later.
      
      I've tested the patches on top of v6.5-rc4, with "crash" built at commit
      c74f375e0ef7.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pietro Borrello <borrello@diag.uniroma1.it>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tap_open(): set sk_uid from current_fsuid() · 5c9241f3
      Laszlo Ersek authored
      Commit 66b2c338 initializes the "sk_uid" field in the protocol socket
      (struct sock) from the "/dev/tapX" device node's owner UID. Per original
      commit 86741ec2 ("net: core: Add a UID field to struct sock.",
      2016-11-04), that's wrong: the idea is to cache the UID of the userspace
      process that creates the socket. Commit 86741ec2 mentions socket() and
      accept(); with "tap", the action that creates the socket is
      open("/dev/tapX").
      
      Therefore the device node's owner UID is irrelevant. In most cases,
      "/dev/tapX" will be owned by root, so in practice, commit 66b2c338 has
      no observable effect:
      
      - before, "sk_uid" would be zero, due to undefined behavior
        (CVE-2023-1076),
      
      - after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
      
      What matters is the (fs)UID of the process performing the open(), so cache
      that in "sk_uid".
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pietro Borrello <borrello@diag.uniroma1.it>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 66b2c338 ("tap: tap_open(): correctly initialize socket uid")
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tun_chr_open(): set sk_uid from current_fsuid() · 9bc30473
      Laszlo Ersek authored
      Commit a096ccca initializes the "sk_uid" field in the protocol socket
      (struct sock) from the "/dev/net/tun" device node's owner UID. Per
      original commit 86741ec2 ("net: core: Add a UID field to struct
      sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
      userspace process that creates the socket. Commit 86741ec2 mentions
      socket() and accept(); with "tun", the action that creates the socket is
      open("/dev/net/tun").
      
      Therefore the device node's owner UID is irrelevant. In most cases,
      "/dev/net/tun" will be owned by root, so in practice, commit a096ccca
      has no observable effect:
      
      - before, "sk_uid" would be zero, due to undefined behavior
        (CVE-2023-1076),
      
      - after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
      
      What matters is the (fs)UID of the process performing the open(), so cache
      that in "sk_uid".
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pietro Borrello <borrello@diag.uniroma1.it>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: a096ccca ("tun: tun_chr_open(): correctly initialize socket uid")
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dcb: choose correct policy to parse DCB_ATTR_BCN · 31d49ba0
      Lin Ma authored
      dcbnl_bcn_setcfg() uses the wrong policy to parse tb[DCB_ATTR_BCN],
      an error introduced in commit 859ee3c4 ("DCB: Add support for DCB
      BCN"). See the comments in the code below:
      
      static int dcbnl_bcn_setcfg(...)
      {
        ...
        ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. )
        // !!! dcbnl_pfc_up_nest for attributes
        //  DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs
        ...
        for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) {
        // !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs
          ...
          value_byte = nla_get_u8(data[i]);
          ...
        }
        ...
        for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) {
        // !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs
        ...
          value_int = nla_get_u32(data[i]);
        ...
        }
        ...
      }
      
      That is, nla_parse_nested_deprecated() uses the dcbnl_pfc_up_nest
      policy, which describes the attributes in dcbnl_pfc_up_attrs, but the
      access code that follows fetches each nlattr as a dcbnl_bcn_attrs
      attribute. Looking up the nla_policy associated with dcbnl_bcn_attrs
      shows that the beginning parts of the two policies are the "same".
      
      static const struct nla_policy dcbnl_pfc_up_nest[...] = {
              [DCB_PFC_UP_ATTR_0]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_1]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_2]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_3]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_4]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_5]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_6]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_7]   = {.type = NLA_U8},
              [DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG},
      };
      
      static const struct nla_policy dcbnl_bcn_nest[...] = {
              [DCB_BCN_ATTR_RP_0]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_1]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_2]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_3]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_4]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_5]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_6]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_7]         = {.type = NLA_U8},
              [DCB_BCN_ATTR_RP_ALL]       = {.type = NLA_FLAG},
              // from here is somewhat different
              [DCB_BCN_ATTR_BCNA_0]       = {.type = NLA_U32},
              ...
              [DCB_BCN_ATTR_ALL]          = {.type = NLA_FLAG},
      };
      
      Therefore, the current code is buggy: nla_parse_nested_deprecated()
      can read past the end of dcbnl_pfc_up_nest and use the adjacent
      nla_policy to parse the attributes from DCB_BCN_ATTR_BCNA_0 onward.

      Hence use the correct policy, dcbnl_bcn_nest, to parse the nested
      tb[DCB_ATTR_BCN] TLV.
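
      Why the table length matters can be illustrated with a toy model. All
      names, types and table contents below are hypothetical stand-ins rather
      than the kernel's nla_policy API, and unlike the kernel parser this
      model checks bounds, so it reports an uncovered attribute instead of
      reading adjacent memory.

```c
#include <stddef.h>

/* Toy attribute types, standing in for NLA_U8, NLA_U32 and NLA_FLAG. */
enum attr_type { ATTR_UNSPEC, ATTR_U8, ATTR_U32, ATTR_FLAG };

struct policy_table {
	const enum attr_type *types;
	size_t len;
};

/* Short table: eight u8 entries and a flag, like the PFC-style policy. */
static const enum attr_type short_types[] = {
	ATTR_UNSPEC,
	ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8,
	ATTR_FLAG,
};

/* Longer table: same beginning, followed by u32 entries. */
static const enum attr_type long_types[] = {
	ATTR_UNSPEC,
	ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8, ATTR_U8,
	ATTR_FLAG,
	ATTR_U32, ATTR_U32,
};

static const struct policy_table short_policy = {
	short_types, sizeof(short_types) / sizeof(short_types[0])
};

static const struct policy_table long_policy = {
	long_types, sizeof(long_types) / sizeof(long_types[0])
};

/* Return the declared type for attribute index attr, or ATTR_UNSPEC when
 * the table does not cover that index. */
static enum attr_type policy_lookup(const struct policy_table *p, size_t attr)
{
	return attr < p->len ? p->types[attr] : ATTR_UNSPEC;
}
```

      Both tables agree on index 1, but only the longer one describes index
      10, which is the situation the u32 attributes are in when the shorter
      policy is used for parsing.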
      
      Fixes: 859ee3c4 ("DCB: Add support for DCB BCN")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bnxt_en-2-xdp-bug-fixes' · 4a4474e3
      Jakub Kicinski authored
      
      
      Michael Chan says:
      
      ====================
      bnxt_en: 2 XDP bug fixes
      
      The first patch fixes XDP page pool logic on systems with page size >=
      64K.  The second patch fixes the max_mtu setting when an XDP program
      supporting multi buffers is attached.
      ====================
      
      Link: https://lore.kernel.org/r/20230731142043.58855-1-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Fix max_mtu setting for multi-buf XDP · 08450ea9
      Michael Chan authored
      The existing code does not allow the MTU to be set to the maximum even
      after an XDP program supporting multiple buffers is attached.  Fix it
      to set the netdev->max_mtu to the maximum value if the attached XDP
      program supports multiple buffers, regardless of the current MTU value.
      
      Also use a local variable dev instead of repeatedly using bp->dev.
      
      Fixes: 1dc4c557 ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20230731142043.58855-3-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Fix page pool logic for page size >= 64K · f6974b4c
      Somnath Kotur authored
      The RXBD length field on all bnxt chips is 16-bit and so we cannot
      support a full page when the native page size is 64K or greater.
      The non-XDP (non page pool) code path has logic to handle this but
      the XDP page pool code path does not handle this.  Add the missing
      logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if
      the page size is 64K or greater.
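
      The size constraint can be shown with a small sketch. The helper name is
      hypothetical and this is not the driver code; it only illustrates that a
      16-bit length field cannot hold 64K, so a smaller chunk is chosen.

```c
#include <stdint.h>

/* Hypothetical helper: pick the receive buffer size for a given page
 * size. A 16-bit length field tops out at 65535, so a page of 64K or
 * more is split into 32K chunks; smaller pages are used whole. */
static uint32_t rx_chunk_size(uint32_t page_size)
{
	if (page_size > UINT16_MAX)
		return 32768;

	return page_size;
}
```

      For a 4K page the whole page is used, while a 64K page yields 32K
      chunks, which fit in the 16-bit field.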
      
      Fixes: 9f4b2830 ("bnxt: XDP multibuffer enablement")
      Link: https://lore.kernel.org/netdev/20230728231829.235716-2-michael.chan@broadcom.com/
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20230731142043.58855-2-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftest: net: Assert on a proper value in so_incoming_cpu.c. · 3ff16174
      Kuniyuki Iwashima authored
      Dan Carpenter reported an error spotted by Smatch.
      
        ./tools/testing/selftests/net/so_incoming_cpu.c:163 create_clients()
        error: uninitialized symbol 'ret'.
      
      The returned value of sched_setaffinity() should be checked with
      ASSERT_EQ(), but the value was not saved in a proper variable,
      resulting in an error above.
      
      Let's save the returned value of sched_setaffinity().
      
      Fixes: 6df96146 ("selftest: Add test for SO_INCOMING_CPU.")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Closes: https://lore.kernel.org/linux-kselftest/fe376760-33b6-4fc9-88e8-178e809af1ac@moroto.mountain/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230731181553.5392-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode · f3bb7759
      Mark Brown authored
      As documented in acd7aaf5 ("netsec: ignore 'phy-mode' device
      property on ACPI systems") the SocioNext SynQuacer platform ships with
      firmware defining the PHY mode as RGMII even though the physical
      configuration of the PHY is for TX and RX delays.  Since bbc4d71d
      ("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused
      misconfiguration of the PHY, rendering the network unusable.
      
      This was worked around for ACPI by ignoring the phy-mode property but
      the system is also used with DT.  For DT, if we're running on a
      SynQuacer, force a working PHY mode instead.  As well as the standard
      EDK2 firmware with DT, some of these systems use u-boot
      and might not initialise the PHY if not netbooting.  Newer firmware
      images for at least EDK2 are available from Linaro, so print a warning
      when doing this.
      
      Fixes: 533dd11a ("net: socionext: Add Synquacer NetSec driver")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: korina: handle clk prepare error in korina_probe() · 0b6291ad
      Yuanjun Gong authored
      In korina_probe(), the return value of clk_prepare_enable()
      should be checked since it might fail. We can use
      devm_clk_get_optional_enabled() instead of devm_clk_get_optional()
      and clk_prepare_enable() to automatically handle the error.
      
      Fixes: e4cd854e ("net: korina: Get mdio input clock via common clock framework")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Link: https://lore.kernel.org/r/20230731090535.21416-1-ruc_gongyuanjun@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • USB: zaurus: Add ID for A-300/B-500/C-700 · b99225b4
      Ross Maynard authored
      The SL-A300, B500/5600, and C700 devices no longer auto-load because of
      "usbnet: Remove over-broad module alias from zaurus."
      This patch adds IDs for those 3 devices.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217632
      Fixes: 16adf5d0 ("usbnet: Remove over-broad module alias from zaurus.")
      Signed-off-by: Ross Maynard <bids.7405@bigpond.com>
      Cc: stable@vger.kernel.org
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/69b5423b-2013-9fc9-9569-58e707d9bafb@bigpond.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b99225b4
    • Dan Carpenter's avatar
      net: ll_temac: fix error checking of irq_of_parse_and_map() · ef45e840
      Dan Carpenter authored
      Most kernel functions return negative error codes but some irq functions
      return zero on error.  In this code, irq_of_parse_and_map() returns zero
      on error and platform_get_irq() returns negative error codes.  We need to
      handle both cases appropriately.
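
The two conventions can be folded into a single check. The following is a minimal userspace sketch, not the driver's code: `irq_to_errno()` is a hypothetical helper, and mapping the zero case to -EINVAL is an assumption made for illustration.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a kernel API): turn the result of an IRQ lookup
 * into 0 on success or a negative errno on failure.
 *
 *   irq > 0   valid IRQ number under either convention
 *   irq == 0  failure in the irq_of_parse_and_map() style
 *   irq < 0   failure in the platform_get_irq() style (already an errno)
 */
static int irq_to_errno(int irq)
{
	if (irq > 0)
		return 0;

	/* Keep a real error code if there is one; otherwise pick one.
	 * -EINVAL here is an assumption of this sketch. */
	return irq ? irq : -EINVAL;
}
```

A caller would store the lookup result in a signed variable and bail out of probe when `irq_to_errno()` returns non-zero, so that neither a zero nor a negative value is ever passed on as an IRQ number.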
      
      Fixes: 8425c41d ("net: ll_temac: Extend support to non-device-tree platforms")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Acked-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Harini Katakam <harini.katakam@amd.com>
      Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ef45e840
  4. Aug 01, 2023
  5. Jul 31, 2023
    • Duoming Zhou's avatar
      net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs · 1e7417c1
      Duoming Zhou authored
      The timer dev->stat_monitor can schedule the delayed work dev->wq and
      the delayed work dev->wq can also arm the dev->stat_monitor timer.
      
      When the device is detaching, the net_device will be deallocated, but
      the net_device private data could still be dereferenced in the delayed
      work or the timer handler. As a result, UAF bugs can occur.
      
      One racy situation is shown below:
      
            (Thread 1)                 |      (Thread 2)
      lan78xx_stat_monitor()           |
       ...                             |  lan78xx_disconnect()
       lan78xx_defer_kevent()          |    ...
        ...                            |    cancel_delayed_work_sync(&dev->wq);
        schedule_delayed_work()        |    ...
        (wait some time)               |    free_netdev(net); //free net_device
        lan78xx_delayedwork()          |
        //use net_device private data  |
        dev-> //use                    |
      
      Although we use cancel_delayed_work_sync() to cancel the delayed work
      in lan78xx_disconnect(), it could still be scheduled in timer handler
      lan78xx_stat_monitor().
      
      Another racy situation is shown below:
      
            (Thread 1)                |      (Thread 2)
      lan78xx_delayedwork             |
       mod_timer()                    |  lan78xx_disconnect()
                                      |   cancel_delayed_work_sync()
       (wait some time)               |   if (timer_pending(&dev->stat_monitor))
                                      |       del_timer_sync(&dev->stat_monitor);
       lan78xx_stat_monitor()         |   ...
        lan78xx_defer_kevent()        |   free_netdev(net); //free
         //use net_device private data|
         dev-> //use                  |
      
      Although we use del_timer_sync() to delete the timer, the function
      timer_pending() returns 0 when the timer is activated. As a result,
      the del_timer_sync() will not be executed and the timer could be
      re-armed.
      
      To mitigate this bug, we use timer_shutdown_sync() to shut down
      the timer and then use cancel_delayed_work_sync() to cancel the delayed
      work. As a result, the net_device can be deallocated safely.
      
      What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
      lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
      in lan78xx_stat_monitor(). So this patch moves the set_bit() to after
      timer_shutdown_sync().
      
      Fixes: 77dfff5b ("lan78xx: Fix race condition in disconnect handling")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e7417c1
    • Rafał Miłecki's avatar
      dt-bindings: net: mediatek,net: fixup MAC binding · 8469c7f5
      Rafał Miłecki authored
      
      
      1. Use unevaluatedProperties
      It's needed to allow ethernet-controller.yaml properties to work correctly.
      
      2. Drop unneeded phy-handle/phy-mode
      
      3. Don't require phy-handle
      Some SoCs may use fixed link.
      
      For in-kernel MT7621 DTS files this fixes the following errors:
      arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
              From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
      arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'phy-handle' is a required property
              From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
      arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
              From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
      arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'phy-handle' is a required property
              From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
      
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8469c7f5
    • Kuniyuki Iwashima's avatar
      net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX. · e7397184
      Kuniyuki Iwashima authored
      syzkaller found a zero division error [0] in div_s64_rem() called from
      get_cycle_time_elapsed(), where sched->cycle_time is the divisor.
      
      We have tests in parse_taprio_schedule() so that cycle_time will never
      be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().
      
      The problem is that the types of the divisor differ; cycle_time is
      s64, but the argument of div_s64_rem() is s32.
      
      syzkaller fed this input, and 0x100000000 becomes 0 when cast to s32.
      
        @TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}
      
      We use s64 for cycle_time to cast it to ktime_t, so let's keep it and
      set a maximum for cycle_time.
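
The truncation and the range limit can be sketched in userspace C. This is an illustration under stated assumptions, not the patch itself: `narrow_to_s32()` and `cycle_time_in_range()` are hypothetical names, `int64_t`/`int32_t` stand in for the kernel's s64/s32, and rejecting non-positive values is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for what happens when an s64 cycle_time is passed
 * to a parameter declared as s32: only the low 32 bits survive.
 */
static int32_t narrow_to_s32(int64_t cycle_time)
{
	/* Keep only the low 32 bits, which is what the implicit
	 * s64 -> s32 conversion does on common Linux targets. */
	return (int32_t)(uint32_t)cycle_time;
}

/*
 * Hypothetical validity check: accept only values that survive the
 * narrowing unchanged and are usable as a divisor.
 */
static int cycle_time_in_range(int64_t cycle_time)
{
	return cycle_time > 0 && cycle_time <= INT32_MAX;
}
```

With the reported input, `narrow_to_s32(0x100000000LL)` yields 0 even though the 64-bit value is non-zero, which is why a check against 0 on the s64 value alone does not prevent the divide error; `cycle_time_in_range()` rejects that input up front.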
      
      While at it, we prevent overflow in setup_txtime() and add another
      test in parse_taprio_schedule() to check if cycle_time overflows.
      
      Also, we add a new tdc test case for this issue.
      
      [0]:
      divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work
      RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
      RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
      RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
      Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
      RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
      RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
      RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
      R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
      R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       get_packet_txtime net/sched/sch_taprio.c:508 [inline]
       taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
       taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
       dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
       __dev_xmit_skb net/core/dev.c:3821 [inline]
       __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
       dev_queue_xmit include/linux/netdevice.h:3088 [inline]
       neigh_resolve_output net/core/neighbour.c:1552 [inline]
       neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
       neigh_output include/net/neighbour.h:544 [inline]
       ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
       __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
       ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
       NF_HOOK_COND include/linux/netfilter.h:292 [inline]
       ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
       dst_output include/net/dst.h:458 [inline]
       NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
       ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
       ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
       addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
       process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
       worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
       kthread+0x2fe/0x3f0 kernel/kthread.c:389
       ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
       </TASK>
      Modules linked in:
      
      Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Co-developed-by: Eric Dumazet <edumazet@google.com>
      Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7397184
  6. Jul 30, 2023