Skip to content
  1. Feb 01, 2024
    • Zhipeng Lu's avatar
      fjes: fix memleaks in fjes_hw_setup · 5b1086d2
      Zhipeng Lu authored
      [ Upstream commit f6cc4b6a ]
      
      In fjes_hw_setup, it allocates several memory and delay the deallocation
      to the fjes_hw_exit in fjes_probe through the following call chain:
      
      fjes_probe
        |-> fjes_hw_init
              |-> fjes_hw_setup
        |-> fjes_hw_exit
      
      However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
      all the resources allocated in fjes_hw_setup will be leaked. In this
      patch, we free those resources in fjes_hw_setup and prevents such leaks.
      
      Fixes: 2fcbca68
      
       ("fjes: platform_driver's .probe and .remove routine")
      Signed-off-by: default avatarZhipeng Lu <alexious@zju.edu.cn>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5b1086d2
    • Jakub Kicinski's avatar
      selftests: netdevsim: fix the udp_tunnel_nic test · 4b4dcb3f
      Jakub Kicinski authored
      [ Upstream commit 0879020a ]
      
      This test is missing a whole bunch of checks for interface
      renaming and one ifup. Presumably it was only used on a system
      with renaming disabled and NetworkManager running.
      
      Fixes: 91f430b2
      
       ("selftests: net: add a test for UDP tunnel info infra")
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4b4dcb3f
    • Jenishkumar Maheshbhai Patel's avatar
      net: mvpp2: clear BM pool before initialization · cec65f09
      Jenishkumar Maheshbhai Patel authored
      [ Upstream commit 9f538b41 ]
      
      Register value persist after booting the kernel using
      kexec which results in kernel panic. Thus clear the
      BM pool registers before initialisation to fix the issue.
      
      Fixes: 3f518509
      
       ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: default avatarJenishkumar Maheshbhai Patel <jpatel2@marvell.com>
      Reviewed-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Link: https://lore.kernel.org/r/20240119035914.25956650
      
      -1-jpatel2@marvell.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      cec65f09
    • Bernd Edlinger's avatar
      net: stmmac: Wait a bit for the reset to take effect · acb6eaf2
      Bernd Edlinger authored
      [ Upstream commit a5f5eee2 ]
      
      otherwise the synopsys_id value may be read out wrong,
      because the GMAC_VERSION register might still be in reset
      state, for at least 1 us after the reset is de-asserted.
      
      Add a wait for 10 us before continuing to be on the safe side.
      
      > From what have you got that delay value?
      
      Just try and error, with very old linux versions and old gcc versions
      the synopsys_id was read out correctly most of the time (but not always),
      with recent linux versions and recnet gcc versions it was read out
      wrongly most of the time, but again not always.
      I don't have access to the VHDL code in question, so I cannot
      tell why it takes so long to get the correct values, I also do not
      have more than a few hardware samples, so I cannot tell how long
      this timeout must be in worst case.
      Experimentally I can tell that the register is read several times
      as zero immediately after the reset is de-asserted, also adding several
      no-ops is not enough, adding a printk is enough, also udelay(1) seems to
      be enough but I tried that not very often, and I have not access to many
      hardware samples to be 100% sure about the necessary delay.
      And since the udelay here is only executed once per device instance,
      it seems acceptable to delay the boot for 10 us.
      
      BTW: my hardware's synopsys id is 0x37.
      
      Fixes: c5e4ddbd
      
       ("net: stmmac: Add support for optional reset control")
      Signed-off-by: default avatarBernd Edlinger <bernd.edlinger@hotmail.de>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Reviewed-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      acb6eaf2
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: validate NFPROTO_* family · 67ee3736
      Pablo Neira Ayuso authored
      [ Upstream commit d0009eff ]
      
      Several expressions explicitly refer to NF_INET_* hook definitions
      from expr->ops->validate, however, family is not validated.
      
      Bail out with EOPNOTSUPP in case they are used from unsupported
      families.
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Fixes: a3c90f7a ("netfilter: nf_tables: flow offload expression")
      Fixes: 2fa84193 ("netfilter: nf_tables: introduce routing expression")
      Fixes: 554ced0a ("netfilter: nf_tables: add support for native socket matching")
      Fixes: ad49d86e ("netfilter: nf_tables: Add synproxy support")
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Fixes: 6c472602
      
       ("netfilter: nf_tables: add xfrm expression")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      67ee3736
    • Florian Westphal's avatar
      netfilter: nf_tables: restrict anonymous set and map names to 16 bytes · ed5b62bb
      Florian Westphal authored
      [ Upstream commit b462579b ]
      
      nftables has two types of sets/maps, one where userspace defines the
      name, and anonymous sets/maps, where userspace defines a template name.
      
      For the latter, kernel requires presence of exactly one "%d".
      nftables uses "__set%d" and "__map%d" for this.  The kernel will
      expand the format specifier and replaces it with the smallest unused
      number.
      
      As-is, userspace could define a template name that allows to move
      the set name past the 256 bytes upperlimit (post-expansion).
      
      I don't see how this could be a problem, but I would prefer if userspace
      cannot do this, so add a limit of 16 bytes for the '%d' template name.
      
      16 bytes is the old total upper limit for set names that existed when
      nf_tables was merged initially.
      
      Fixes: 38745490
      
       ("netfilter: nf_tables: Allow set names of up to 255 chars")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ed5b62bb
    • Filipe Manana's avatar
      btrfs: fix race between reading a directory and adding entries to it · c25d7922
      Filipe Manana authored
      commit 8e7f82de
      
       upstream.
      
      When opening a directory (opendir(3)) or rewinding it (rewinddir(3)), we
      are not holding the directory's inode locked, and this can result in later
      attempting to add two entries to the directory with the same index number,
      resulting in a transaction abort, with -EEXIST (-17), when inserting the
      second delayed dir index. This results in a trace like the following:
      
        Sep 11 22:34:59 myhostname kernel: BTRFS error (device dm-3): err add delayed dir index item(name: cockroach-stderr.log) into the insertion tree of the delayed node(root id: 5, inode id: 4539217, errno: -17)
        Sep 11 22:34:59 myhostname kernel: ------------[ cut here ]------------
        Sep 11 22:34:59 myhostname kernel: kernel BUG at fs/btrfs/delayed-inode.c:1504!
        Sep 11 22:34:59 myhostname kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        Sep 11 22:34:59 myhostname kernel: CPU: 0 PID: 7159 Comm: cockroach Not tainted 6.4.15-200.fc38.x86_64 #1
        Sep 11 22:34:59 myhostname kernel: Hardware name: ASUS ESC500 G3/P9D WS, BIOS 2402 06/27/2018
        Sep 11 22:34:59 myhostname kernel: RIP: 0010:btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel: Code: eb dd 48 (...)
        Sep 11 22:34:59 myhostname kernel: RSP: 0000:ffffa9980e0fbb28 EFLAGS: 00010282
        Sep 11 22:34:59 myhostname kernel: RAX: 0000000000000000 RBX: ffff8b10b8f4a3c0 RCX: 0000000000000000
        Sep 11 22:34:59 myhostname kernel: RDX: 0000000000000000 RSI: ffff8b177ec21540 RDI: ffff8b177ec21540
        Sep 11 22:34:59 myhostname kernel: RBP: ffff8b110cf80888 R08: 0000000000000000 R09: ffffa9980e0fb938
        Sep 11 22:34:59 myhostname kernel: R10: 0000000000000003 R11: ffffffff86146508 R12: 0000000000000014
        Sep 11 22:34:59 myhostname kernel: R13: ffff8b1131ae5b40 R14: ffff8b10b8f4a418 R15: 00000000ffffffef
        Sep 11 22:34:59 myhostname kernel: FS:  00007fb14a7fe6c0(0000) GS:ffff8b177ec00000(0000) knlGS:0000000000000000
        Sep 11 22:34:59 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Sep 11 22:34:59 myhostname kernel: CR2: 000000c00143d000 CR3: 00000001b3b4e002 CR4: 00000000001706f0
        Sep 11 22:34:59 myhostname kernel: Call Trace:
        Sep 11 22:34:59 myhostname kernel:  <TASK>
        Sep 11 22:34:59 myhostname kernel:  ? die+0x36/0x90
        Sep 11 22:34:59 myhostname kernel:  ? do_trap+0xda/0x100
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? do_error_trap+0x6a/0x90
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? exc_invalid_op+0x50/0x70
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? asm_exc_invalid_op+0x1a/0x20
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  ? btrfs_insert_delayed_dir_index+0x1da/0x260
        Sep 11 22:34:59 myhostname kernel:  btrfs_insert_dir_item+0x200/0x280
        Sep 11 22:34:59 myhostname kernel:  btrfs_add_link+0xab/0x4f0
        Sep 11 22:34:59 myhostname kernel:  ? ktime_get_real_ts64+0x47/0xe0
        Sep 11 22:34:59 myhostname kernel:  btrfs_create_new_inode+0x7cd/0xa80
        Sep 11 22:34:59 myhostname kernel:  btrfs_symlink+0x190/0x4d0
        Sep 11 22:34:59 myhostname kernel:  ? schedule+0x5e/0xd0
        Sep 11 22:34:59 myhostname kernel:  ? __d_lookup+0x7e/0xc0
        Sep 11 22:34:59 myhostname kernel:  vfs_symlink+0x148/0x1e0
        Sep 11 22:34:59 myhostname kernel:  do_symlinkat+0x130/0x140
        Sep 11 22:34:59 myhostname kernel:  __x64_sys_symlinkat+0x3d/0x50
        Sep 11 22:34:59 myhostname kernel:  do_syscall_64+0x5d/0x90
        Sep 11 22:34:59 myhostname kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
        Sep 11 22:34:59 myhostname kernel:  ? do_syscall_64+0x6c/0x90
        Sep 11 22:34:59 myhostname kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The race leading to the problem happens like this:
      
      1) Directory inode X is loaded into memory, its ->index_cnt field is
         initialized to (u64)-1 (at btrfs_alloc_inode());
      
      2) Task A is adding a new file to directory X, holding its vfs inode lock,
         and calls btrfs_set_inode_index() to get an index number for the entry.
      
         Because the inode's index_cnt field is set to (u64)-1 it calls
         btrfs_inode_delayed_dir_index_count() which fails because no dir index
         entries were added yet to the delayed inode and then it calls
         btrfs_set_inode_index_count(). This functions finds the last dir index
         key and then sets index_cnt to that index value + 1. It found that the
         last index key has an offset of 100. However before it assigns a value
         of 101 to index_cnt...
      
      3) Task B calls opendir(3), ending up at btrfs_opendir(), where the VFS
         lock for inode X is not taken, so it calls btrfs_get_dir_last_index()
         and sees index_cnt still with a value of (u64)-1. Because of that it
         calls btrfs_inode_delayed_dir_index_count() which fails since no dir
         index entries were added to the delayed inode yet, and then it also
         calls btrfs_set_inode_index_count(). This also finds that the last
         index key has an offset of 100, and before it assigns the value 101
         to the index_cnt field of inode X...
      
      4) Task A assigns a value of 101 to index_cnt. And then the code flow
         goes to btrfs_set_inode_index() where it increments index_cnt from
         101 to 102. Task A then creates a delayed dir index entry with a
         sequence number of 101 and adds it to the delayed inode;
      
      5) Task B assigns 101 to the index_cnt field of inode X;
      
      6) At some later point when someone tries to add a new entry to the
         directory, btrfs_set_inode_index() will return 101 again and shortly
         after an attempt to add another delayed dir index key with index
         number 101 will fail with -EEXIST resulting in a transaction abort.
      
      Fix this by locking the inode at btrfs_get_dir_last_index(), which is only
      only used when opening a directory or attempting to lseek on it.
      
      Reported-by: default avatarken <ken@bllue.org>
      Link: https://lore.kernel.org/linux-btrfs/CAE6xmH+Lp=Q=E61bU+v9eWX8gYfLvu6jLYxjxjFpo3zHVPR0EQ@mail.gmail.com/
      Reported-by: default avatar <syzbot+d13490c82ad5353c779d@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
      Fixes: 9b378f6a
      
       ("btrfs: fix infinite directory reads")
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c25d7922
    • Filipe Manana's avatar
      btrfs: refresh dir last index during a rewinddir(3) call · fd968e68
      Filipe Manana authored
      commit e60aa5da upstream.
      
      When opening a directory we find what's the index of its last entry and
      then store it in the directory's file handle private data (struct
      btrfs_file_private::last_index), so that in the case new directory entries
      are added to a directory after an opendir(3) call we don't end up in an
      infinite loop (see commit 9b378f6a
      
       ("btrfs: fix infinite directory
      reads")) when calling readdir(3).
      
      However once rewinddir(3) is called, POSIX states [1] that any new
      directory entries added after the previous opendir(3) call, must be
      returned by subsequent calls to readdir(3):
      
        "The rewinddir() function shall reset the position of the directory
         stream to which dirp refers to the beginning of the directory.
         It shall also cause the directory stream to refer to the current
         state of the corresponding directory, as a call to opendir() would
         have done."
      
      We currently don't refresh the last_index field of the struct
      btrfs_file_private associated to the directory, so after a rewinddir(3)
      we are not returning any new entries added after the opendir(3) call.
      
      Fix this by finding the current last index of the directory when llseek
      is called against the directory.
      
      This can be reproduced by the following C program provided by Ian Johnson:
      
         #include <dirent.h>
         #include <stdio.h>
      
         int main(void) {
           DIR *dir = opendir("test");
      
           FILE *file;
           file = fopen("test/1", "w");
           fwrite("1", 1, 1, file);
           fclose(file);
      
           file = fopen("test/2", "w");
           fwrite("2", 1, 1, file);
           fclose(file);
      
           rewinddir(dir);
      
           struct dirent *entry;
           while ((entry = readdir(dir))) {
              printf("%s\n", entry->d_name);
           }
           closedir(dir);
           return 0;
         }
      
      Reported-by: default avatarIan Johnson <ian@ianjohnson.dev>
      Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
      Fixes: 9b378f6a
      
       ("btrfs: fix infinite directory reads")
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd968e68
    • Filipe Manana's avatar
      btrfs: set last dir index to the current last index when opening dir · a045b6b1
      Filipe Manana authored
      commit 35795036 upstream.
      
      When opening a directory for reading it, we set the last index where we
      stop iteration to the value in struct btrfs_inode::index_cnt. That value
      does not match the index of the most recently added directory entry but
      it's instead the index number that will be assigned the next directory
      entry.
      
      This means that if after the call to opendir(3) new directory entries are
      added, a readdir(3) call will return the first new directory entry. This
      is fine because POSIX says the following [1]:
      
        "If a file is removed from or added to the directory after the most
         recent call to opendir() or rewinddir(), whether a subsequent call to
         readdir() returns an entry for that file is unspecified."
      
      For example for the test script from commit 9b378f6a
      
       ("btrfs: fix
      infinite directory reads"), where we have 2000 files in a directory, ext4
      doesn't return any new directory entry after opendir(3), while xfs returns
      the first 13 new directory entries added after the opendir(3) call.
      
      If we move to a shorter example with an empty directory when opendir(3) is
      called, and 2 files added to the directory after the opendir(3) call, then
      readdir(3) on btrfs will return the first file, ext4 and xfs return the 2
      files (but in a different order). A test program for this, reported by
      Ian Johnson, is the following:
      
         #include <dirent.h>
         #include <stdio.h>
      
         int main(void) {
           DIR *dir = opendir("test");
      
           FILE *file;
           file = fopen("test/1", "w");
           fwrite("1", 1, 1, file);
           fclose(file);
      
           file = fopen("test/2", "w");
           fwrite("2", 1, 1, file);
           fclose(file);
      
           struct dirent *entry;
           while ((entry = readdir(dir))) {
              printf("%s\n", entry->d_name);
           }
           closedir(dir);
           return 0;
         }
      
      To make this less odd, change the behaviour to never return new entries
      that were added after the opendir(3) call. This is done by setting the
      last_index field of the struct btrfs_file_private attached to the
      directory's file handle with a value matching btrfs_inode::index_cnt
      minus 1, since that value always matches the index of the next new
      directory entry and not the index of the most recently added entry.
      
      [1] https://pubs.opengroup.org/onlinepubs/007904875/functions/readdir_r.html
      
      Link: https://lore.kernel.org/linux-btrfs/YR1P0S.NGASEG570GJ8@ianjohnson.dev/
      CC: stable@vger.kernel.org # 6.5+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a045b6b1
    • Filipe Manana's avatar
      btrfs: fix infinite directory reads · 2aa515b5
      Filipe Manana authored
      commit 9b378f6a
      
       upstream.
      
      The readdir implementation currently processes always up to the last index
      it finds. This however can result in an infinite loop if the directory has
      a large number of entries such that they won't all fit in the given buffer
      passed to the readdir callback, that is, dir_emit() returns a non-zero
      value. Because in that case readdir() will be called again and if in the
      meanwhile new directory entries were added and we still can't put all the
      remaining entries in the buffer, we keep repeating this over and over.
      
      The following C program and test script reproduce the problem:
      
        $ cat /mnt/readdir_prog.c
        #include <sys/types.h>
        #include <dirent.h>
        #include <stdio.h>
      
        int main(int argc, char *argv[])
        {
          DIR *dir = opendir(".");
          struct dirent *dd;
      
          while ((dd = readdir(dir))) {
            printf("%s\n", dd->d_name);
            rename(dd->d_name, "TEMPFILE");
            rename("TEMPFILE", dd->d_name);
          }
          closedir(dir);
        }
      
        $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdi
        MNT=/mnt/sdi
      
        mkfs.btrfs -f $DEV &> /dev/null
        #mkfs.xfs -f $DEV &> /dev/null
        #mkfs.ext4 -F $DEV &> /dev/null
      
        mount $DEV $MNT
      
        mkdir $MNT/testdir
        for ((i = 1; i <= 2000; i++)); do
            echo -n > $MNT/testdir/file_$i
        done
      
        cd $MNT/testdir
        /mnt/readdir_prog
      
        cd /mnt
      
        umount $MNT
      
      This behaviour is surprising to applications and it's unlike ext4, xfs,
      tmpfs, vfat and other filesystems, which always finish. In this case where
      new entries were added due to renames, some file names may be reported
      more than once, but this varies according to each filesystem - for example
      ext4 never reported the same file more than once while xfs reports the
      first 13 file names twice.
      
      So change our readdir implementation to track the last index number when
      opendir() is called and then make readdir() never process beyond that
      index number. This gives the same behaviour as ext4.
      
      Reported-by: default avatarRob Landley <rob@landley.net>
      Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
      CC: stable@vger.kernel.org # 5.15
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2aa515b5
    • Florian Westphal's avatar
      netfilter: nft_limit: reject configurations that cause integer overflow · bc6e242b
      Florian Westphal authored
      [ Upstream commit c9d9eb9c ]
      
      Reject bogus configs where internal token counter wraps around.
      This only occurs with very very large requests, such as 17gbyte/s.
      
      Its better to reject this rather than having incorrect ratelimit.
      
      Fixes: d2168e84
      
       ("netfilter: nft_limit: add per-byte limiting")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bc6e242b
    • Frederic Weisbecker's avatar
      rcu: Defer RCU kthreads wakeup when CPU is dying · c817f5c0
      Frederic Weisbecker authored
      [ Upstream commit e787644c
      
       ]
      
      When the CPU goes idle for the last time during the CPU down hotplug
      process, RCU reports a final quiescent state for the current CPU. If
      this quiescent state propagates up to the top, some tasks may then be
      woken up to complete the grace period: the main grace period kthread
      and/or the expedited main workqueue (or kworker).
      
      If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
      arm the RT bandwith timer to the local offline CPU. Since this happens
      after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
      timer gets ignored. Therefore if the RCU kthreads are waiting for RT
      bandwidth to be available, they may never be actually scheduled.
      
      This triggers TREE03 rcutorture hangs:
      
      	 rcu: INFO: rcu_preempt self-detected stall on CPU
      	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
      	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
      	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
      	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      	 rcu: RCU grace-period kthread stack dump:
      	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
      	 Call Trace:
      	  <TASK>
      	  __schedule+0x2eb/0xa80
      	  schedule+0x1f/0x90
      	  schedule_timeout+0x163/0x270
      	  ? __pfx_process_timeout+0x10/0x10
      	  rcu_gp_fqs_loop+0x37c/0x5b0
      	  ? __pfx_rcu_gp_kthread+0x10/0x10
      	  rcu_gp_kthread+0x17c/0x200
      	  kthread+0xde/0x110
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork+0x2b/0x40
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork_asm+0x1b/0x30
      	  </TASK>
      
      The situation can't be solved with just unpinning the timer. The hrtimer
      infrastructure and the nohz heuristics involved in finding the best
      remote target for an unpinned timer would then also need to handle
      enqueues from an offline CPU in the most horrendous way.
      
      So fix this on the RCU side instead and defer the wake up to an online
      CPU if it's too late for the local one.
      
      Reported-by: default avatarPaul E. McKenney <paulmck@kernel.org>
      Fixes: 5c0930cc
      
       ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
      Signed-off-by: default avatarFrederic Weisbecker <frederic@kernel.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: default avatarNeeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c817f5c0
    • Dinghao Liu's avatar
      net/mlx5e: fix a potential double-free in fs_any_create_groups · b2fa86b2
      Dinghao Liu authored
      [ Upstream commit aef855df ]
      
      When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
      fs_any_create_groups() will free ft->g. However, its caller
      fs_any_create_table() will free ft->g again through calling
      mlx5e_destroy_flow_table(), which will lead to a double-free.
      Fix this by setting ft->g to NULL in fs_any_create_groups().
      
      Fixes: 0f575c20
      
       ("net/mlx5e: Introduce Flow Steering ANY API")
      Signed-off-by: default avatarDinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b2fa86b2
    • Zhipeng Lu's avatar
      net/mlx5e: fix a double-free in arfs_create_groups · 42876db0
      Zhipeng Lu authored
      [ Upstream commit 3c6d5189 ]
      
      When `in` allocated by kvzalloc fails, arfs_create_groups will free
      ft->g and return an error. However, arfs_create_table, the only caller of
      arfs_create_groups, will hold this error and call to
      mlx5e_destroy_flow_table, in which the ft->g will be freed again.
      
      Fixes: 1cabe6b0
      
       ("net/mlx5e: Create aRFS flow tables")
      Signed-off-by: default avatarZhipeng Lu <alexious@zju.edu.cn>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      42876db0
    • Leon Romanovsky's avatar
      net/mlx5e: Allow software parsing when IPsec crypto is enabled · 890881d1
      Leon Romanovsky authored
      [ Upstream commit 20f5468a ]
      
      All ConnectX devices have software parsing capability enabled, but it is
      more correct to set allow_swp only if capability exists, which for IPsec
      means that crypto offload is supported.
      
      Fixes: 2451da08
      
       ("net/mlx5: Unify device IPsec capabilities check")
      Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      890881d1
    • Rahul Rameshbabu's avatar
      net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO · 62ce1600
      Rahul Rameshbabu authored
      [ Upstream commit 20cbf8cb ]
      
      mlx5 devices have specific constants for choosing the CQ period mode. These
      constants do not have to match the constants used by the kernel software
      API for DIM period mode selection.
      
      Fixes: cdd04f4d
      
       ("net/mlx5: Add support to create SQ and CQ for ASO")
      Signed-off-by: default avatarRahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: default avatarJianbo Liu <jianbol@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62ce1600
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Can't go to uplink vport on RX rule · 75d9ed49
      Yevgeny Kliteynik authored
      [ Upstream commit 5b2a2523 ]
      
      Go-To-Vport action on RX is not allowed when the vport is uplink.
      In such case, the packet should be dropped.
      
      Fixes: 9db810ed
      
       ("net/mlx5: DR, Expose steering action functionality")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarErez Shitrit <erezsh@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      75d9ed49
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Use the right GVMI number for drop action · e54aedd4
      Yevgeny Kliteynik authored
      [ Upstream commit 56659542 ]
      
      When FW provides ICM addresses for drop RX/TX, the provided capability
      is 64 bits that contain its GVMI as well as the ICM address itself.
      In case of TX DROP this GVMI is different from the GVMI that the
      domain is operating on.
      
      This patch fixes the action to use these GVMI IDs, as provided by FW.
      
      Fixes: 9db810ed
      
       ("net/mlx5: DR, Expose steering action functionality")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e54aedd4
    • Zhengchao Shao's avatar
      ipv6: init the accept_queue's spinlocks in inet6_create · f11792c3
      Zhengchao Shao authored
      [ Upstream commit 435e202d ]
      
      In commit 198bc90e("tcp: make sure init the accept_queue's spinlocks
      once"), the spinlocks of accept_queue are initialized only when socket is
      created in the inet4 scenario. The locks are not initialized when socket
      is created in the inet6 scenario. The kernel reports the following error:
      INFO: trying to register non-static key.
      The code is fine but needs lockdep annotation, or maybe
      you didn't initialize this object before use?
      turning off the locking correctness validator.
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      Call Trace:
      <TASK>
      	dump_stack_lvl (lib/dump_stack.c:107)
      	register_lock_class (kernel/locking/lockdep.c:1289)
      	__lock_acquire (kernel/locking/lockdep.c:5015)
      	lock_acquire.part.0 (kernel/locking/lockdep.c:5756)
      	_raw_spin_lock_bh (kernel/locking/spinlock.c:178)
      	inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386)
      	tcp_disconnect (net/ipv4/tcp.c:2981)
      	inet_shutdown (net/ipv4/af_inet.c:935)
      	__sys_shutdown (./include/linux/file.h:32 net/socket.c:2438)
      	__x64_sys_shutdown (net/socket.c:2445)
      	do_syscall_64 (arch/x86/entry/common.c:52)
      	entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      RIP: 0033:0x7f52ecd05a3d
      Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
      RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030
      RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d
      RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004
      RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640
      R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0
      
      Fixes: 198bc90e
      
       ("tcp: make sure init the accept_queue's spinlocks once")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f11792c3
    • Zhengchao Shao's avatar
      netlink: fix potential sleeping issue in mqueue_flush_file · de061604
      Zhengchao Shao authored
      [ Upstream commit 234ec0b6 ]
      
      I analyze the potential sleeping issue of the following processes:
      Thread A                                Thread B
      ...                                     netlink_create  //ref = 1
      do_mq_notify                            ...
        sock = netlink_getsockbyfilp          ...     //ref = 2
        info->notify_sock = sock;             ...
      ...                                     netlink_sendmsg
      ...                                       skb = netlink_alloc_large_skb  //skb->head is vmalloced
      ...                                       netlink_unicast
      ...                                         sk = netlink_getsockbyportid //ref = 3
      ...                                         netlink_sendskb
      ...                                           __netlink_sendskb
      ...                                             skb_queue_tail //put skb to sk_receive_queue
      ...                                         sock_put //ref = 2
      ...                                     ...
      ...                                     netlink_release
      ...                                       deferred_put_nlk_sk //ref = 1
      mqueue_flush_file
        spin_lock
        remove_notification
          netlink_sendskb
            sock_put  //ref = 0
              sk_free
                ...
                __sk_destruct
                  netlink_sock_destruct
                    skb_queue_purge  //get skb from sk_receive_queue
                      ...
                      __skb_queue_purge_reason
                        kfree_skb_reason
                          __kfree_skb
                          ...
                          skb_release_all
                            skb_release_head_state
                              netlink_skb_destructor
                                vfree(skb->head)  //sleeping while holding spinlock
      
      In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
      vmalloc, and is put to sk_receive_queue queue, also the skb is not freed.
      When the mqueue executes flush, the sleeping bug will occur. Use
      vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
      
      Fixes: c05cdb1b
      
       ("netlink: allow large data transfers from user-space")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      de061604
    • Salvatore Dipietro's avatar
      tcp: Add memory barrier to tcp_push() · 90fba981
      Salvatore Dipietro authored
      [ Upstream commit 7267e8dc ]
      
      On CPUs with weak memory models, reads and updates performed by tcp_push
      to the sk variables can get reordered leaving the socket throttled when
      it should not. The tasklet running tcp_wfree() may also not observe the
      memory updates in time and will skip flushing any packets throttled by
      tcp_push(), delaying the sending. This can pathologically cause 40ms
      extra latency due to bad interactions with delayed acks.
      
      Adding a memory barrier in tcp_push removes the bug, similarly to the
      previous commit bf06200e ("tcp: tsq: fix nonagle handling").
      smp_mb__after_atomic() is used to not incur in unnecessary overhead
      on x86 since not affected.
      
      Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
      22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
      
      import java.io.IOException;
      import java.io.OutputStreamWriter;
      import java.io.PrintWriter;
      import javax.servlet.ServletException;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;
      
      public class HelloWorldServlet extends HttpServlet {
          @Override
          protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
              response.setContentType("text/html;charset=utf-8");
              OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
              String s = "a".repeat(3096);
              osw.write(s,0,s.length());
              osw.flush();
          }
      }
      
      Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
      c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
      values is observed while, with the patch, the extra latency disappears.
      
      No patch and tcp_autocorking=1
      ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
        ...
       50.000%    0.91ms
       75.000%    1.13ms
       90.000%    1.46ms
       99.000%    1.74ms
       99.900%    1.89ms
       99.990%   41.95ms  <<< 40+ ms extra latency
       99.999%   48.32ms
      100.000%   48.96ms
      
      With patch and tcp_autocorking=1
      ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
        ...
       50.000%    0.90ms
       75.000%    1.13ms
       90.000%    1.45ms
       99.000%    1.72ms
       99.900%    1.83ms
       99.990%    2.11ms  <<< no 40+ ms extra latency
       99.999%    2.53ms
      100.000%    2.62ms
      
      Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
      affected by this issue and the patch doesn't introduce any additional
      delay.
      
      Fixes: 7aa5470c
      
       ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
      Signed-off-by: default avatarSalvatore Dipietro <dipiets@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      90fba981
    • David Howells's avatar
      afs: Hide silly-rename files from userspace · ab49164c
      David Howells authored
      [ Upstream commit 57e9d49c ]
      
      There appears to be a race between silly-rename files being created/removed
      and various userspace tools iterating over the contents of a directory,
      leading to such errors as:
      
      	find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory
      	tar: ./include/linux/greybus/.__afs3C95: File removed before we read it
      
      when building a kernel.
      
      Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files
      to userspace.  This doesn't stop them being looked up directly by name as
      we need to be able to look them up from within the kernel as part of the
      silly-rename algorithm.
      
      Fixes: 79ddbfa5
      
       ("afs: Implement sillyrename for unlink and rename")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ab49164c
    • Petr Pavlu's avatar
      tracing: Ensure visibility when inserting an element into tracing_map · f4f7e696
      Petr Pavlu authored
      [ Upstream commit 2b447606 ]
      
      Running the following two commands in parallel on a multi-processor
      AArch64 machine can sporadically produce an unexpected warning about
      duplicate histogram entries:
      
       $ while true; do
           echo hist:key=id.syscall:val=hitcount > \
             /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
           cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
           sleep 0.001
         done
       $ stress-ng --sysbadaddr $(nproc)
      
      The warning looks as follows:
      
      [ 2911.172474] ------------[ cut here ]------------
      [ 2911.173111] Duplicates detected: 1
      [ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408
      [ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E)
      [ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1
      [ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G            E      6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01
      [ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018
      [ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
      [ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408
      [ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408
      [ 2911.185310] sp : ffff8000a1513900
      [ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001
      [ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008
      [ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180
      [ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff
      [ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8
      [ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731
      [ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c
      [ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8
      [ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000
      [ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480
      [ 2911.194259] Call trace:
      [ 2911.194626]  tracing_map_sort_entries+0x3e0/0x408
      [ 2911.195220]  hist_show+0x124/0x800
      [ 2911.195692]  seq_read_iter+0x1d4/0x4e8
      [ 2911.196193]  seq_read+0xe8/0x138
      [ 2911.196638]  vfs_read+0xc8/0x300
      [ 2911.197078]  ksys_read+0x70/0x108
      [ 2911.197534]  __arm64_sys_read+0x24/0x38
      [ 2911.198046]  invoke_syscall+0x78/0x108
      [ 2911.198553]  el0_svc_common.constprop.0+0xd0/0xf8
      [ 2911.199157]  do_el0_svc+0x28/0x40
      [ 2911.199613]  el0_svc+0x40/0x178
      [ 2911.200048]  el0t_64_sync_handler+0x13c/0x158
      [ 2911.200621]  el0t_64_sync+0x1a8/0x1b0
      [ 2911.201115] ---[ end trace 0000000000000000 ]---
      
      The problem appears to be caused by CPU reordering of writes issued from
      __tracing_map_insert().
      
      The check for the presence of an element with a given key in this
      function is:
      
       val = READ_ONCE(entry->val);
       if (val && keys_match(key, val->key, map->key_size)) ...
      
      The write of a new entry is:
      
       elt = get_free_elt(map);
       memcpy(elt->key, key, map->key_size);
       entry->val = elt;
      
      The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;"
      stores may become visible in the reversed order on another CPU. This
      second CPU might then incorrectly determine that a new key doesn't match
      an already present val->key and subsequently insert a new element,
      resulting in a duplicate.
      
      Fix the problem by adding a write barrier between
      "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for
      good measure, also use WRITE_ONCE(entry->val, elt) for publishing the
      element. The sequence pairs with the mentioned "READ_ONCE(entry->val);"
      and the "val->key" check which has an address dependency.
      
      The barrier is placed on a path executed when adding an element for
      a new key. Subsequent updates targeting the same key remain unaffected.
      
      From the user's perspective, the issue was introduced by commit
      c193707d ("tracing: Remove code which merges duplicates"), which
      followed commit cbf4100e ("tracing: Add support to detect and avoid
      duplicates"). The previous code operated differently; it inherently
      expected potential races which result in duplicates but merged them
      later when they occurred.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu@suse.com
      
      Fixes: c193707d
      
       ("tracing: Remove code which merges duplicates")
      Signed-off-by: default avatarPetr Pavlu <petr.pavlu@suse.com>
      Acked-by: default avatarTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f4f7e696
    • Dan Carpenter's avatar
      netfs, fscache: Prevent Oops in fscache_put_cache() · 82a9bc34
      Dan Carpenter authored
      [ Upstream commit 3be0b3ed ]
      
      This function dereferences "cache" and then checks if it's
      IS_ERR_OR_NULL().  Check first, then dereference.
      
      Fixes: 9549332d
      
       ("fscache: Implement cache registration")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/e84bc740-3502-4f16-982a-a40d5676615c@moroto.mountain/ # v2
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      82a9bc34
    • Sharath Srinivasan's avatar
      net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv · 71024928
      Sharath Srinivasan authored
      [ Upstream commit 13e788de ]
      
      Syzcaller UBSAN crash occurs in rds_cmsg_recv(),
      which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1),
      but with array size of 4 (RDS_RX_MAX_TRACES).
      Here 'j' is assigned from rs->rs_rx_trace[i] and in-turn from
      trace.rx_trace_pos[i] in rds_recv_track_latency(),
      with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the
      off-by-one bounds check in rds_recv_track_latency() to prevent
      a potential crash in rds_cmsg_recv().
      
      Found by syzcaller:
      =================================================================
      UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39
      index 4 is out of range for type 'u64 [4]'
      CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.15.0-1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106
       ubsan_epilogue lib/ubsan.c:217 [inline]
       __ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348
       rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585
       rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716
       sock_recvmsg_nosec net/socket.c:1044 [inline]
       sock_recvmsg+0xe2/0x160 net/socket.c:1066
       __sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246
       __do_sys_recvfrom net/socket.c:2264 [inline]
       __se_sys_recvfrom net/socket.c:2260 [inline]
       __x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      ==================================================================
      
      Fixes: 3289025a
      
       ("RDS: add receive message trace used by application")
      Reported-by: default avatarChenyuan Yang <chenyuan0y@gmail.com>
      Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7LfYFvkRsOVcw@mail.gmail.com/
      Signed-off-by: default avatarSharath Srinivasan <sharath.srinivasan@oracle.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      71024928
    • Horatiu Vultur's avatar
      net: micrel: Fix PTP frame parsing for lan8814 · fcb0b4b6
      Horatiu Vultur authored
      [ Upstream commit aaf632f7 ]
      
      The HW has the capability to check each frame if it is a PTP frame,
      which domain it is, which ptp frame type it is, different ip address in
      the frame. And if one of these checks fail then the frame is not
      timestamp. Most of these checks were disabled except checking the field
      minorVersionPTP inside the PTP header. Meaning that once a partner sends
      a frame compliant to 8021AS which has minorVersionPTP set to 1, then the
      frame was not timestamp because the HW expected by default a value of 0
      in minorVersionPTP. This is exactly the same issue as on lan8841.
      Fix this issue by removing this check so the userspace can decide on this.
      
      Fixes: ece19502
      
       ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Reviewed-by: default avatarDivya Koppera <divya.koppera@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fcb0b4b6
    • Yunjian Wang's avatar
      tun: add missing rx stats accounting in tun_xdp_act · 7a581f59
      Yunjian Wang authored
      [ Upstream commit f1084c42 ]
      
      The TUN can be used as vhost-net backend, and it is necessary to
      count the packets transmitted from TUN to vhost-net/virtio-net.
      However, there are some places in the receive path that were not
      taken into account when using XDP. It would be beneficial to also
      include new accounting for successfully received bytes using
      dev_sw_netstats_rx_add.
      
      Fixes: 761876c8
      
       ("tap: XDP support")
      Signed-off-by: default avatarYunjian Wang <wangyunjian@huawei.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7a581f59
    • Yunjian Wang's avatar
      tun: fix missing dropped counter in tun_xdp_act · 41e7decd
      Yunjian Wang authored
      [ Upstream commit 5744ba05 ]
      
      The commit 8ae1aff0 ("tuntap: split out XDP logic") includes
      dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions.
      Unfortunately, that commit missed the dropped counter when error
      occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes
      this issue.
      
      Fixes: 8ae1aff0
      
       ("tuntap: split out XDP logic")
      Signed-off-by: default avatarYunjian Wang <wangyunjian@huawei.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      41e7decd
    • Jakub Kicinski's avatar
      net: fix removing a namespace with conflicting altnames · a2232f29
      Jakub Kicinski authored
      [ Upstream commit d09486a0
      
       ]
      
      Mark reports a BUG() when a net namespace is removed.
      
          kernel BUG at net/core/dev.c:11520!
      
      Physical interfaces moved outside of init_net get "refunded"
      to init_net when that namespace disappears. The main interface
      name may get overwritten in the process if it would have
      conflicted. We need to also discard all conflicting altnames.
      Recent fixes addressed ensuring that altnames get moved
      with the main interface, which surfaced this problem.
      
      Reported-by: default avatarМарк Коренберг <socketpair@gmail.com>
      Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/
      Fixes: 7663d522
      
       ("net: check for altname conflicts when changing netdev's netns")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a2232f29
    • Eric Dumazet's avatar
      udp: fix busy polling · 6646145b
      Eric Dumazet authored
      [ Upstream commit a54d51fb ]
      
      Generic sk_busy_loop_end() only looks at sk->sk_receive_queue
      for presence of packets.
      
      Problem is that for UDP sockets after blamed commit, some packets
      could be present in another queue: udp_sk(sk)->reader_queue
      
      In some cases, a busy poller could spin until timeout expiration,
      even if some packets are available in udp_sk(sk)->reader_queue.
      
      v3: - make sk_busy_loop_end() nicer (Willem)
      
      v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats.
          - add a sk_is_inet() check in sk_is_udp() (Willem feedback)
          - add a sk_is_inet() check in sk_is_tcp().
      
      Fixes: 2276f58a
      
       ("udp: use a separate rx queue for packet reception")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6646145b
    • Kuniyuki Iwashima's avatar
      llc: Drop support for ETH_P_TR_802_2. · 660c3053
      Kuniyuki Iwashima authored
      [ Upstream commit e3f9bed9 ]
      
      syzbot reported an uninit-value bug below. [0]
      
      llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
      (0x0011), and syzbot abused the latter to trigger the bug.
      
        write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
      
      llc_conn_handler() initialises local variables {saddr,daddr}.mac
      based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
      them to __llc_lookup().
      
      However, the initialisation is done only when skb->protocol is
      htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
      __llc_lookup_listener() will read garbage.
      
      The missing initialisation existed prior to commit 211ed865
      ("net: delete all instances of special processing for token ring").
      
      It removed the part to kick out the token ring stuff but forgot to
      close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
      
      Let's remove llc_tr_packet_type and complete the deprecation.
      
      [0]:
      BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
       __llc_lookup_established+0xe9d/0xf90
       __llc_lookup net/llc/llc_conn.c:611 [inline]
       llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
       llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
       __netif_receive_skb_one_core net/core/dev.c:5527 [inline]
       __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
       netif_receive_skb_internal net/core/dev.c:5727 [inline]
       netif_receive_skb+0x58/0x660 net/core/dev.c:5786
       tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
       tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
       tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
       call_write_iter include/linux/fs.h:2020 [inline]
       new_sync_write fs/read_write.c:491 [inline]
       vfs_write+0x8ef/0x1490 fs/read_write.c:584
       ksys_write+0x20f/0x4c0 fs/read_write.c:637
       __do_sys_write fs/read_write.c:649 [inline]
       __se_sys_write fs/read_write.c:646 [inline]
       __x64_sys_write+0x93/0xd0 fs/read_write.c:646
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Local variable daddr created at:
       llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
       llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
      
      CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
      
      Fixes: 211ed865
      
       ("net: delete all instances of special processing for token ring")
      Reported-by: default avatar <syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      660c3053
    • Eric Dumazet's avatar
      llc: make llc_ui_sendmsg() more robust against bonding changes · 6d53b813
      Eric Dumazet authored
      [ Upstream commit dad555c8 ]
      
      syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no
      headroom, but subsequently trying to push 14 bytes of Ethernet header [1]
      
      Like some others, llc_ui_sendmsg() releases the socket lock before
      calling sock_alloc_send_skb().
      Then it acquires it again, but does not redo all the sanity checks
      that were performed.
      
      This fix:
      
      - Uses LL_RESERVED_SPACE() to reserve space.
      - Check all conditions again after socket lock is held again.
      - Do not account Ethernet header for mtu limitation.
      
      [1]
      
      skbuff: skb_under_panic: text:ffff800088baa334 len:1514 put:14 head:ffff0000c9c37000 data:ffff0000c9c36ff2 tail:0x5dc end:0x6c0 dev:bond0
      
       kernel BUG at net/core/skbuff.c:193 !
      Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 6875 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00101-g0802e17d9aca-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : skb_panic net/core/skbuff.c:189 [inline]
       pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
       lr : skb_panic net/core/skbuff.c:189 [inline]
       lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
      sp : ffff800096f97000
      x29: ffff800096f97010 x28: ffff80008cc8d668 x27: dfff800000000000
      x26: ffff0000cb970c90 x25: 00000000000005dc x24: ffff0000c9c36ff2
      x23: ffff0000c9c37000 x22: 00000000000005ea x21: 00000000000006c0
      x20: 000000000000000e x19: ffff800088baa334 x18: 1fffe000368261ce
      x17: ffff80008e4ed000 x16: ffff80008a8310f8 x15: 0000000000000001
      x14: 1ffff00012df2d58 x13: 0000000000000000 x12: 0000000000000000
      x11: 0000000000000001 x10: 0000000000ff0100 x9 : e28a51f1087e8400
      x8 : e28a51f1087e8400 x7 : ffff80008028f8d0 x6 : 0000000000000000
      x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800082b78714
      x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000089
      Call trace:
        skb_panic net/core/skbuff.c:189 [inline]
        skb_under_panic+0x13c/0x140 net/core/skbuff.c:203
        skb_push+0xf0/0x108 net/core/skbuff.c:2451
        eth_header+0x44/0x1f8 net/ethernet/eth.c:83
        dev_hard_header include/linux/netdevice.h:3188 [inline]
        llc_mac_hdr_init+0x110/0x17c net/llc/llc_output.c:33
        llc_sap_action_send_xid_c+0x170/0x344 net/llc/llc_s_ac.c:85
        llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline]
        llc_sap_next_state net/llc/llc_sap.c:182 [inline]
        llc_sap_state_process+0x1ec/0x774 net/llc/llc_sap.c:209
        llc_build_and_send_xid_pkt+0x12c/0x1c0 net/llc/llc_sap.c:270
        llc_ui_sendmsg+0x7bc/0xb1c net/llc/af_llc.c:997
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        sock_sendmsg+0x194/0x274 net/socket.c:767
        splice_to_socket+0x7cc/0xd58 fs/splice.c:881
        do_splice_from fs/splice.c:933 [inline]
        direct_splice_actor+0xe4/0x1c0 fs/splice.c:1142
        splice_direct_to_actor+0x2a0/0x7e4 fs/splice.c:1088
        do_splice_direct+0x20c/0x348 fs/splice.c:1194
        do_sendfile+0x4bc/0xc70 fs/read_write.c:1254
        __do_sys_sendfile64 fs/read_write.c:1322 [inline]
        __se_sys_sendfile64 fs/read_write.c:1308 [inline]
        __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1308
        __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
        el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678
        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595
      Code: aa1803e6 aa1903e7 a90023f5 94792f6a (d4210000)
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Reported-and-tested-by: default avatar <syzbot+2a7024e9502df538e8ef@syzkaller.appspotmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240118183625.4007013-1-edumazet@google.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6d53b813
    • Lin Ma's avatar
      vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING · c5e7fa4f
      Lin Ma authored
      [ Upstream commit 6c21660f ]
      
      In the vlan_changelink function, a loop is used to parse the nested
      attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to
      obtain the struct ifla_vlan_qos_mapping. These two nested attributes are
      checked in the vlan_validate_qos_map function, which calls
      nla_validate_nested_deprecated with the vlan_map_policy.
      
      However, this deprecated validator applies a LIBERAL strictness, allowing
      the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC.
      Consequently, the loop in vlan_changelink may parse an attribute of type
      IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of
      struct ifla_vlan_qos_mapping, which is not necessarily true.
      
      To address this issue and ensure compatibility, this patch introduces two
      type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING.
      
      Fixes: 07b5b17e
      
       ("[VLAN]: Use rtnl_link API")
      Signed-off-by: default avatarLin Ma <linma@zju.edu.cn>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c5e7fa4f
    • Michael Chan's avatar
      bnxt_en: Wait for FLR to complete during probe · 4ee06138
      Michael Chan authored
      [ Upstream commit 3c1069fa ]
      
      The first message to firmware may fail if the device is undergoing FLR.
      The driver has some recovery logic for this failure scenario but we must
      wait 100 msec for FLR to complete before proceeding.  Otherwise the
      recovery will always fail.
      
      Fixes: ba02629f
      
       ("bnxt_en: log firmware status on firmware init failure")
      Reviewed-by: default avatarDamodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20240117234515.226944-2-michael.chan@broadcom.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4ee06138
    • Zhengchao Shao's avatar
      tcp: make sure init the accept_queue's spinlocks once · b1e0a68a
      Zhengchao Shao authored
      [ Upstream commit 198bc90e ]
      
      When I run syz's reproduction C program locally, it causes the following
      issue:
      pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0!
      WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508)
      Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7
      30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90
      RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908
      RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900
      RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff
      R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000
      R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000
      FS:  00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0
      Call Trace:
      <IRQ>
        _raw_spin_unlock (kernel/locking/spinlock.c:186)
        inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321)
        inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358)
        tcp_check_req (net/ipv4/tcp_minisocks.c:868)
        tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260)
        ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205)
        ip_local_deliver_finish (net/ipv4/ip_input.c:234)
        __netif_receive_skb_one_core (net/core/dev.c:5529)
        process_backlog (./include/linux/rcupdate.h:779)
        __napi_poll (net/core/dev.c:6533)
        net_rx_action (net/core/dev.c:6604)
        __do_softirq (./arch/x86/include/asm/jump_label.h:27)
        do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
      </IRQ>
      <TASK>
        __local_bh_enable_ip (kernel/softirq.c:381)
        __dev_queue_xmit (net/core/dev.c:4374)
        ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235)
        __ip_queue_xmit (net/ipv4/ip_output.c:535)
        __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
        tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469)
        tcp_rcv_state_process (net/ipv4/tcp_input.c:6657)
        tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929)
        __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968)
        release_sock (net/core/sock.c:3536)
        inet_wait_for_connect (net/ipv4/af_inet.c:609)
        __inet_stream_connect (net/ipv4/af_inet.c:702)
        inet_stream_connect (net/ipv4/af_inet.c:748)
        __sys_connect (./include/linux/file.h:45 net/socket.c:2064)
        __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070)
        do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82)
        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
        RIP: 0033:0x7fa10ff05a3d
        Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89
        c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
        RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
        RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d
        RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003
        RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640
        R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20
      </TASK>
      
      The issue triggering process is analyzed as follows:
      Thread A                                       Thread B
      tcp_v4_rcv	//receive ack TCP packet       inet_shutdown
        tcp_check_req                                  tcp_disconnect //disconnect sock
        ...                                              tcp_set_state(sk, TCP_CLOSE)
          inet_csk_complete_hashdance                ...
            inet_csk_reqsk_queue_add                 inet_listen  //start listen
              spin_lock(&queue->rskq_lock)             inet_csk_listen_start
              ...                                        reqsk_queue_alloc
              ...                                          spin_lock_init
              spin_unlock(&queue->rskq_lock)	//warning
      
      When the socket receives the ACK packet during the three-way handshake,
      it will hold spinlock. And then the user actively shutdowns the socket
      and listens to the socket immediately, the spinlock will be initialized.
      When the socket is going to release the spinlock, a warning is generated.
      Also the same issue to fastopenq.lock.
      
      Move init spinlock to inet_create and inet_accept to make sure init the
      accept_queue's spinlocks once.
      
      Fixes: fff1f300 ("tcp: add a spinlock to protect struct request_sock_queue")
      Fixes: 168a8f58
      
       ("tcp: TCP Fast Open Server - main code path")
      Reported-by: default avatarMing Shu <sming56@aliyun.com>
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b1e0a68a
    • Wen Gu's avatar
      net/smc: fix illegal rmb_desc access in SMC-D connection dump · 6994dba0
      Wen Gu authored
      [ Upstream commit dbc153fd ]
      
      A crash was found when dumping SMC-D connections. It can be reproduced
      by following steps:
      
      - run nginx/wrk test:
        smc_run nginx
        smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
      
      - continuously dump SMC-D connections in parallel:
        watch -n 1 'smcss -D'
      
       BUG: kernel NULL pointer dereference, address: 0000000000000030
       CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G	E      6.7.0+ #55
       RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
       Call Trace:
        <TASK>
        ? __die+0x24/0x70
        ? page_fault_oops+0x66/0x150
        ? exc_page_fault+0x69/0x140
        ? asm_exc_page_fault+0x26/0x30
        ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
        ? __kmalloc_node_track_caller+0x35d/0x430
        ? __alloc_skb+0x77/0x170
        smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
        smc_diag_dump+0x26/0x60 [smc_diag]
        netlink_dump+0x19f/0x320
        __netlink_dump_start+0x1dc/0x300
        smc_diag_handler_dump+0x6a/0x80 [smc_diag]
        ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
        sock_diag_rcv_msg+0x121/0x140
        ? __pfx_sock_diag_rcv_msg+0x10/0x10
        netlink_rcv_skb+0x5a/0x110
        sock_diag_rcv+0x28/0x40
        netlink_unicast+0x22a/0x330
        netlink_sendmsg+0x1f8/0x420
        __sock_sendmsg+0xb0/0xc0
        ____sys_sendmsg+0x24e/0x300
        ? copy_msghdr_from_user+0x62/0x80
        ___sys_sendmsg+0x7c/0xd0
        ? __do_fault+0x34/0x160
        ? do_read_fault+0x5f/0x100
        ? do_fault+0xb0/0x110
        ? __handle_mm_fault+0x2b0/0x6c0
        __sys_sendmsg+0x4d/0x80
        do_syscall_64+0x69/0x180
        entry_SYSCALL_64_after_hwframe+0x6e/0x76
      
      It is possible that the connection is in process of being established
      when we dump it. Assumed that the connection has been registered in a
      link group by smc_conn_create() but the rmb_desc has not yet been
      initialized by smc_buf_create(), thus causing the illegal access to
      conn->rmb_desc. So fix it by checking before dump.
      
      Fixes: 4b1b7d3b
      
       ("net/smc: add SMC-D diag support")
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Reviewed-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Reviewed-by: default avatarWenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6994dba0
    • Johannes Berg's avatar
      wifi: mac80211: fix potential sta-link leak · 49aaeb8c
      Johannes Berg authored
      [ Upstream commit b01a74b3 ]
      
      When a station is allocated, links are added but not
      set to valid yet (e.g. during connection to an AP MLD),
      we might remove the station without ever marking links
      valid, and leak them. Fix that.
      
      Fixes: cb71f1d1
      
       ("wifi: mac80211: add sta link addition/removal")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Reviewed-by: default avatarIlan Peer <ilan.peer@intel.com>
      Signed-off-by: default avatarMiri Korenblit <miriam.rachel.korenblit@intel.com>
      Link: https://msgid.link/20240111181514.6573998beaf8.I09ac2e1d41c80f82a5a616b8bd1d9d8dd709a6a6@changeid
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      49aaeb8c
    • Wayne Lin's avatar
      drm/amd/display: pbn_div need be updated for hotplug event · b59e08c8
      Wayne Lin authored
      commit 9cdef4f7
      
       upstream.
      
      link_rate sometime will be changed when DP MST connector hotplug, so
      pbn_div also need be updated; otherwise, it will mismatch with
      link_rate, causes no output in external monitor.
      
      This is a backport to 6.7 and older.
      
      Cc: stable@vger.kernel.org
      Tested-by: default avatarDaniel Wheeler <daniel.wheeler@amd.com>
      Reviewed-by: default avatarJerry Zuo <jerry.zuo@amd.com>
      Acked-by: default avatarRodrigo Siqueira <rodrigo.siqueira@amd.com>
      Signed-off-by: default avatarWade Wang <wade.wang@hp.com>
      Signed-off-by: default avatarWayne Lin <wayne.lin@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b59e08c8
    • Jonathan Gray's avatar
      Revert "drm/amd: Enable PCIe PME from D3" · a5046e5e
      Jonathan Gray authored
      This reverts commit 0c8d252d.
      
      duplicated a change made in 6.1.66
      c6088429
      
      
      
      Cc: stable@vger.kernel.org # 6.1
      Signed-off-by: default avatarJonathan Gray <jsg@jsg.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5046e5e
    • Namjae Jeon's avatar
      ksmbd: Add missing set_freezable() for freezable kthread · b1c06ee2
      Namjae Jeon authored
      From: Kevin Hao <haokexin@gmail.com>
      
      [ Upstream commit 8fb7b723
      
       ]
      
      The kernel thread function ksmbd_conn_handler_loop() invokes
      the try_to_freeze() in its loop. But all the kernel threads are
      non-freezable by default. So if we want to make a kernel thread to be
      freezable, we have to invoke set_freezable() explicitly.
      
      Signed-off-by: default avatarKevin Hao <haokexin@gmail.com>
      Acked-by: default avatarNamjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1c06ee2