  1. Jan 18, 2023
    • s390/kexec: fix ipl report address for kdump · f6da927c
      Alexander Egorenkov authored
      commit c2337a40 upstream.
      
      This commit addresses the following erroneous situation with file-based
      kdump executed on a system with a valid IPL report.
      
      On s390, a kdump kernel, its initrd and the IPL report, if present, are
      loaded into a special memory region reserved at boot: crashkernel. When
      a system crashes and kdump was activated before, the purgatory code
      is entered first, which swaps the crashkernel and [0 - crashkernel size]
      memory regions. Only after that is the kdump kernel entered. For this
      reason, the pointer to an IPL report in lowcore must point to the IPL report
      after the swap and not to the address of the IPL report that was located in
      the crashkernel memory region before the swap. Failing to do so makes the
      kdump's decompressor try to read memory from the crashkernel memory region,
      which already contains the production kernel's memory.
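
      The address arithmetic implied by the swap can be sketched as follows.
      This is a minimal illustration, not the code from the patch: the helper
      name and its parameters are hypothetical, and it assumes only what the
      text states, namely that the crashkernel region is swapped with the
      region starting at address 0.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): given the address at which
 * the IPL report was loaded inside the crashkernel region, return the
 * address it will have after purgatory swaps that region with
 * [0 - crashkernel size].
 */
static uint64_t ipl_report_addr_after_swap(uint64_t addr_in_crashkernel,
					   uint64_t crashkernel_start)
{
	/* After the swap, offsets inside the region become absolute addresses. */
	return addr_in_crashkernel - crashkernel_start;
}
```

      Under these assumptions, a report loaded 2 MiB into the crashkernel
      region ends up at address 2 MiB once the kdump kernel runs, and that
      is the value the lowcore pointer needs to hold.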
      
      The situation described above caused spontaneous kdump failures/hangs
      on systems where the Secure IPL is activated because on such systems
      an IPL report is always present. In that case kdump's decompressor tried
      to parse an IPL report, which frequently led to illegal memory accesses
      because an IPL report contains addresses to various data.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 99feaa71 ("s390/kexec_file: Create ipl report and pass to next kernel")
      Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf auxtrace: Fix address filter duplicate symbol selection · 4bf6e11c
      Adrian Hunter authored
      commit cf129830 upstream.
      
      When a match has been made to the nth duplicate symbol, return
      success not error.
      
      Example:
      
        Before:
      
          $ cat file.c
          cat: file.c: No such file or directory
          $ cat file1.c
          #include <stdio.h>
      
          static void func(void)
          {
                  printf("First func\n");
          }
      
          void other(void);
      
          int main()
          {
                  func();
                  other();
                  return 0;
          }
          $ cat file2.c
          #include <stdio.h>
      
          static void func(void)
          {
                  printf("Second func\n");
          }
      
          void other(void)
          {
                  func();
          }
      
          $ gcc -Wall -Wextra -o test file1.c file2.c
          $ perf record -e intel_pt//u --filter 'filter func @ ./test' -- ./test
          Multiple symbols with name 'func'
          #1      0x1149  l       func
                          which is near           main
          #2      0x1179  l       func
                          which is near           other
          Disambiguate symbol name by inserting #n after the name e.g. func #2
          Or select a global symbol by inserting #0 or #g or #G
          Failed to parse address filter: 'filter func @ ./test'
          Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
          Where multiple filters are separated by space or comma.
          $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
          Failed to parse address filter: 'filter func #2 @ ./test'
          Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
          Where multiple filters are separated by space or comma.
      
        After:
      
          $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
          First func
          Second func
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.016 MB perf.data ]
          $ perf script --itrace=b -Ftime,flags,ip,sym,addr --ns
          1231062.526977619:   tr strt                               0 [unknown] =>     558495708179 func
          1231062.526977619:   tr end  call               558495708188 func =>     558495708050 _init
          1231062.526979286:   tr strt                               0 [unknown] =>     55849570818d func
          1231062.526979286:   tr end  return             55849570818f func =>     55849570819d other
      
      Fixes: 1b36c03e ("perf record: Add support for using symbols in address filters")
      Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230110185659.15979-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • docs: Fix the docs build with Sphinx 6.0 · 2e4164d3
      Jonathan Corbet authored
      commit 0283189e upstream.
      
      Sphinx 6.0 removed the execfile_() function, which we use as part of the
      configuration process.  They *did* warn us...  Just open-code the
      functionality as is done in Sphinx itself.
      
      Tested (using SPHINX_CONF, since this code is only executed with an
      alternative config file) on various Sphinx versions from 2.5 through 6.0.
      
      Reported-by: Martin Liška <mliska@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • efi: tpm: Avoid READ_ONCE() for accessing the event log · 3ed18307
      Ard Biesheuvel authored
      commit d3f45053 upstream.
      
      Nathan reports that recent kernels built with LTO will crash when doing
      EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
      misaligned load from the TPM event log, which is annotated with
      READ_ONCE(), and under LTO, this gets translated into an LDAR instruction
      which does not tolerate misaligned accesses.
      
      Interestingly, this does not happen when booting the same kernel
      straight from the UEFI shell, and so the fact that the event log may
      appear misaligned in memory may be caused by a bug in GRUB or SHIM.
      
      However, using READ_ONCE() to access firmware tables is slightly unusual
      in any case, and here, we only need to ensure that 'event' is not
      dereferenced again after it gets unmapped, but this is already taken
      care of by the implicit barrier() semantics of the early_memunmap()
      call.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1782
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: arm64: Fix S1PTW handling on RO memslots · 3ad31129
      Marc Zyngier authored
      commit 406504c7 upstream.
      
      A recent development on the EFI front has resulted in guests having
      their page tables baked in the firmware binary, and mapped into the
      IPA space as part of a read-only memslot. Not only is this legitimate,
      but it also results in added security, so thumbs up.
      
      It is possible to take an S1PTW translation fault if the S1 PTs are
      unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
      write to correctly handle hardware AF/DB updates to the S1 PTs.
      Furthermore, KVM injects an exception into the guest for S1PTW writes.
      In the aforementioned case this results in the guest taking an abort
      it won't recover from, as the S1 PTs mapping the vectors suffer from
      the same problem.
      
      So clearly our handling is... wrong.
      
      Instead, switch to a two-pronged approach:
      
      - On S1PTW translation fault, handle the fault as a read
      
      - On S1PTW permission fault, handle the fault as a write
      
      This is of no consequence to SW that *writes* to its PTs (the write
      will trigger a non-S1PTW fault), and SW that uses RO PTs will not
      use HW-assisted AF/DB anyway, as that'd be wrong.
      
      Only in the case described in c4ad98e4 ("KVM: arm64: Assume write
      fault on S1PTW permission fault on instruction fetch") do we end-up
      with two back-to-back faults (page being evicted and faulted back).
      I don't think this is a case worth optimising for.
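
      The two-pronged rule above can be written down as a small decision
      function. This is only a sketch of the rule as described in this
      message: the enum and function names are hypothetical and do not
      correspond to KVM's actual helpers.

```c
#include <stdbool.h>

/* Hypothetical classification of a stage-2 fault taken on an S1 page table walk. */
enum s1ptw_fault {
	S1PTW_TRANSLATION_FAULT,	/* S1 PTs unmapped at stage-2 */
	S1PTW_PERMISSION_FAULT,		/* S1 PTs mapped, access not permitted */
};

/* Returns true when the fault is handled as a write, false when as a read. */
static bool s1ptw_handled_as_write(enum s1ptw_fault fault)
{
	switch (fault) {
	case S1PTW_PERMISSION_FAULT:
		return true;	/* e.g. a hardware AF/DB update to the S1 PTs */
	case S1PTW_TRANSLATION_FAULT:
	default:
		return false;	/* the walk only needs to read the PTs */
	}
}
```

      With this split, a translation fault on page tables that live in a
      read-only memslot is handled as a read and no longer looks like a
      write to that memslot.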
      
      Fixes: c4ad98e4 ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
      Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: disallow noqueue for qdisc classes · 9b83ec63
      Frederick Lawler authored
      commit 96398560 upstream.
      
      While experimenting with applying noqueue to a classful queue discipline,
      we discovered a NULL pointer dereference in the __dev_queue_xmit()
      path that generates a kernel OOPS:
      
          # dev=enp0s5
          # tc qdisc replace dev $dev root handle 1: htb default 1
          # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit
          # tc qdisc add dev $dev parent 1:1 handle 10: noqueue
          # ping -I $dev -w 1 -c 1 1.1.1.1
      
      [    2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [    2.173217] #PF: supervisor instruction fetch in kernel mode
      ...
      [    2.178451] Call Trace:
      [    2.178577]  <TASK>
      [    2.178686]  htb_enqueue+0x1c8/0x370
      [    2.178880]  dev_qdisc_enqueue+0x15/0x90
      [    2.179093]  __dev_queue_xmit+0x798/0xd00
      [    2.179305]  ? _raw_write_lock_bh+0xe/0x30
      [    2.179522]  ? __local_bh_enable_ip+0x32/0x70
      [    2.179759]  ? ___neigh_create+0x610/0x840
      [    2.179968]  ? eth_header+0x21/0xc0
      [    2.180144]  ip_finish_output2+0x15e/0x4f0
      [    2.180348]  ? dst_output+0x30/0x30
      [    2.180525]  ip_push_pending_frames+0x9d/0xb0
      [    2.180739]  raw_sendmsg+0x601/0xcb0
      [    2.180916]  ? _raw_spin_trylock+0xe/0x50
      [    2.181112]  ? _raw_spin_unlock_irqrestore+0x16/0x30
      [    2.181354]  ? get_page_from_freelist+0xcd6/0xdf0
      [    2.181594]  ? sock_sendmsg+0x56/0x60
      [    2.181781]  sock_sendmsg+0x56/0x60
      [    2.181958]  __sys_sendto+0xf7/0x160
      [    2.182139]  ? handle_mm_fault+0x6e/0x1d0
      [    2.182366]  ? do_user_addr_fault+0x1e1/0x660
      [    2.182627]  __x64_sys_sendto+0x1b/0x30
      [    2.182881]  do_syscall_64+0x38/0x90
      [    2.183085]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ...
      [    2.187402]  </TASK>
      
      Previously in commit d66d6c31 ("net: sched: register noqueue
      qdisc"), NULL was set for the noqueue discipline on noqueue init
      so that __dev_queue_xmit() falls through for the noqueue case. This
      also sets a bypass of the enqueue NULL check in the
      register_qdisc() function for the struct noqueue_disc_ops.
      
      Classful queue disciplines make it past the NULL check in
      __dev_queue_xmit() because the discipline is set to htb (in this case),
      and then in the call to __dev_xmit_skb(), it calls into htb_enqueue()
      which grabs a leaf node for a class and then calls qdisc_enqueue() by
      passing in a queue discipline which assumes ->enqueue() is not set to NULL.
      
      Fix this by not allowing classes to be assigned to the noqueue
      discipline. Linux TC Notes states that classes cannot be set to
      the noqueue discipline. [1] Let's enforce that here.
      
      Links:
      1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt
      
      Fixes: d66d6c31 ("net: sched: register noqueue qdisc")
      Cc: stable@vger.kernel.org
      Signed-off-by: Frederick Lawler <fred@cloudflare.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • driver core: Fix bus_type.match() error handling in __driver_attach() · aa52acef
      Isaac J. Manjarres authored
      commit 27c0d217 upstream.
      
      When a driver registers with a bus, it will attempt to match with every
      device on the bus through the __driver_attach() function. Currently, if
      the bus_type.match() function encounters an error that is not
      -EPROBE_DEFER, __driver_attach() will return a negative error code, which
      causes the driver registration logic to stop trying to match with the
      remaining devices on the bus.
      
      This behavior is not correct; a failure while matching a driver to a
      device does not mean that the driver won't be able to match and bind
      with other devices on the bus. Update the logic in __driver_attach()
      to reflect this.
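
      The intended behaviour can be illustrated outside the kernel with a
      plain loop. The sketch below is not the driver core code: match_fn,
      attach_all() and demo_match() are hypothetical stand-ins, with a
      positive return meaning "match", zero meaning "no match" and a
      negative value meaning "error", mirroring the convention described
      above.

```c
#include <stddef.h>

/* Hypothetical stand-in for bus_type.match(): >0 match, 0 no match, <0 error. */
typedef int (*match_fn)(int dev);

/*
 * Try to match every device and return how many matched. A match error on
 * one device is skipped instead of aborting the walk over the others.
 */
static int attach_all(const int *devs, size_t n, match_fn match)
{
	int bound = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = match(devs[i]);

		if (ret <= 0)
			continue;	/* no match or error: keep going */
		bound++;
	}
	return bound;
}

/* Demo matcher: device 2 fails with an error, odd devices match. */
static int demo_match(int dev)
{
	if (dev == 2)
		return -22;
	return dev % 2;
}
```

      With devices 1, 2, 3 and 4, the error on device 2 does not prevent
      device 3 from being matched afterwards.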
      
      Fixes: 656b8035 ("ARM: 8524/1: driver cohandle -EPROBE_DEFER from bus_type.match()")
      Cc: stable@vger.kernel.org
      Cc: Saravana Kannan <saravanak@google.com>
      Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Link: https://lore.kernel.org/r/20220921001414.4046492-1-isaacmanjarres@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: set the BUILD variable to absolute path · 8d60a905
      Muhammad Usama Anjum authored
      commit 5ad51ab6 upstream.
      
      The build of kselftests fails if a relative path is specified through
      KBUILD_OUTPUT or the O=<path> method. The BUILD variable is used to determine
      the path of the output objects. When make is run from other directories
      with relative paths, the exact path of the build objects is ambiguous
      and the build fails.
      
      	make[1]: Entering directory '/home/usama/repos/kernel/linux_mainline2/tools/testing/selftests/alsa'
      	gcc     mixer-test.c -L/usr/lib/x86_64-linux-gnu -lasound  -o build/kselftest/alsa/mixer-test
      	/usr/bin/ld: cannot open output file build/kselftest/alsa/mixer-test
      
      Set the BUILD variable to the absolute path of the output directory.
      Make the logic readable and easy to follow. Use spaces instead of tabs
      for indentation, because make treats a tab-indented 'if' as a recipe line.
      
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: Fix kselftest O=objdir build from cluttering top level objdir · cad6d2bb
      Shuah Khan authored
      commit 29e911ef upstream.
      
      make kselftest-all O=objdir builds create generated objects in objdir.
      This clutters the top level directory with kselftest objects. Fix it
      to create a sub-directory under objdir for kselftest objects.
      
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Align parisc MADV_XXX constants with all other architectures · 320dbbd8
      Helge Deller authored
      commit 71bdea6f upstream.
      
      Adjust some MADV_XXX constants to be in sync with their values on
      all other platforms. There is currently no reason to have a separate
      numbering on parisc, but it requires workarounds in many userspace
      sources (e.g. glibc, qemu, ...), which are often forgotten and thus
      introduce bugs and different behaviour on parisc.
      
      A wrapper avoids an ABI breakage for existing userspace applications by
      translating any old values to the new ones, so this change allows us to
      move over all programs to the new ABI over time.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mbcache: Avoid nesting of cache->c_list_lock under bit locks · d868597b
      Jan Kara authored
      commit 5fc4cbd9 upstream.
      
      Commit 307af6c8 ("mbcache: automatically delete entries from cache
      on freeing") started nesting cache->c_list_lock under the bit locks
      protecting hash buckets of the mbcache hash table in
      mb_cache_entry_create(). This causes problems for real-time kernels
      because there spinlocks are sleeping locks while bitlocks stay atomic.
      Luckily the nesting is easy to avoid by holding entry reference until
      the entry is added to the LRU list. This makes sure we cannot race with
      entry deletion.
      
      Cc: stable@kernel.org
      Fixes: 307af6c8 ("mbcache: automatically delete entries from cache on freeing")
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220908091032.10513-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling · da23752d
      Linus Torvalds authored
      commit cb7a95af upstream.
      
      Commit 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check") fixed
      a build warning by turning a comment into a WARN_ON(), but it turns out
      that syzbot then complains because it can trigger said warning with a
      corrupted hfs image.
      
      The warning actually does warn about a bad situation, but we are much
      better off just handling it as the error it is.  So rather than warn
      about us doing bad things, stop doing the bad things and return -EIO.
      
      While at it, also fix a memory leak that was introduced by an earlier
      fix for a similar syzbot warning situation, and add a check for one case
      that historically wasn't handled at all (ie neither comment nor
      subsequent WARN_ON).
      
      Reported-by: <syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com>
      Fixes: 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check")
      Fixes: 8d824e69 ("hfs: fix OOB Read in __hfs_brec_find")
      Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/
      Tested-by: Michael Schmitz <schmitzmic@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Viacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hfs/hfsplus: use WARN_ON for sanity check · 781fa141
      Arnd Bergmann authored
      commit 55d1cbbb upstream.
      
      gcc warns about a couple of instances in which a sanity check exists but
      the author wasn't sure how to react to it failing, which makes it look
      like a possible bug:
      
        fs/hfsplus/inode.c: In function 'hfsplus_cat_read_inode':
        fs/hfsplus/inode.c:503:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          503 |                         /* panic? */;
              |                                     ^
        fs/hfsplus/inode.c:524:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          524 |                         /* panic? */;
              |                                     ^
        fs/hfsplus/inode.c: In function 'hfsplus_cat_write_inode':
        fs/hfsplus/inode.c:582:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          582 |                         /* panic? */;
              |                                     ^
        fs/hfsplus/inode.c:608:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          608 |                         /* panic? */;
              |                                     ^
        fs/hfs/inode.c: In function 'hfs_write_inode':
        fs/hfs/inode.c:464:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          464 |                         /* panic? */;
              |                                     ^
        fs/hfs/inode.c:485:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
          485 |                         /* panic? */;
              |                                     ^
      
      panic() is probably not the correct choice here, but a WARN_ON
      seems appropriate and avoids the compile-time warning.
      
      Link: https://lkml.kernel.org/r/20210927102149.1809384-1-arnd@kernel.org
      Link: https://lore.kernel.org/all/20210322223249.2632268-1-arnd@kernel.org/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: don't allow journal inode to have encrypt flag · b47c6901
      Eric Biggers authored
      commit 105c78e1 upstream.
      
      Mounting a filesystem whose journal inode has the encrypt flag causes a
      NULL dereference in fscrypt_limit_io_blocks() when the 'inlinecrypt'
      mount option is used.
      
      The problem is that when jbd2_journal_init_inode() calls bmap(), it
      eventually finds its way into ext4_iomap_begin(), which calls
      fscrypt_limit_io_blocks().  fscrypt_limit_io_blocks() requires that if
      the inode is encrypted, then its encryption key must already be set up.
      That's not the case here, since the journal inode is never "opened" like
      a normal file would be.  Hence the crash.
      
      A reproducer is:
      
          mkfs.ext4 -F /dev/vdb
          debugfs -w /dev/vdb -R "set_inode_field <8> flags 0x80808"
          mount /dev/vdb /mnt -o inlinecrypt
      
      To fix this, make ext4 consider journal inodes with the encrypt flag to
      be invalid.  (Note, maybe other flags should be rejected on the journal
      inode too.  For now, this is just the minimal fix for the above issue.)
      
      I've marked this as fixing the commit that introduced the call to
      fscrypt_limit_io_blocks(), since that's what made an actual crash start
      being possible.  But this fix could be applied to any version of ext4
      that supports the encrypt feature.
      
      Reported-by: <syzbot+ba9dac45bc76c490b7c3@syzkaller.appspotmail.com>
      Fixes: 38ea50da ("ext4: support direct I/O with fscrypt using blk-crypto")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221102053312.189962-1-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • riscv: uaccess: fix type of 0 variable on error in get_user() · 1464feb5
      Ben Dooks authored
      commit b9b916ae upstream.
      
      If the get_user(x, ptr) has x as a pointer, then the setting
      of (x) = 0 is going to produce the following sparse warning,
      so fix this by forcing the type of 'x' when access_ok() fails.
      
      fs/aio.c:2073:21: warning: Using plain integer as NULL pointer
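
      The idea of forcing the type of the zero can be shown with a small
      userspace macro. This is a sketch, not the riscv uaccess code: the
      macro name is hypothetical, it relies on the GNU __typeof__ extension,
      and it leaves out the sparse annotations that kernel code would need.

```c
#include <stddef.h>

/*
 * Hypothetical macro: clear (x) on an error path with a zero that has the
 * same type as x, so a pointer x receives a null pointer of its own type
 * rather than a plain integer 0.
 */
#define zero_on_error(x) ((x) = (__typeof__(x))0)

/* Example use with a pointer destination. */
static int *clear_ptr(int *p)
{
	zero_on_error(p);
	return p;
}

/* Example use with an integer destination. */
static long clear_long(long v)
{
	zero_on_error(v);
	return v;
}
```

      The same macro works for both integer and pointer destinations, which
      is the property get_user() needs since x may be either.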
      
      Signed-off-by: Ben Dooks <ben-linux@fluff.org>
      Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
      Link: https://lore.kernel.org/r/20221229170545.718264-1-ben-linux@fluff.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: fix handling of readdir in v4root vs. mount upcall timeout · da41069c
      Jeff Layton authored
      commit cad85337 upstream.
      
      If a v4 READDIR operation hits a mountpoint and gets back an error,
      then it will include that entry in the reply and set RDATTR_ERROR for it
      to the error.
      
      That's fine for "normal" exported filesystems, but on the v4root, we
      need to be more careful to only expose the existence of dentries that
      lead to exports.
      
      If the mountd upcall times out while checking to see whether a
      mountpoint on the v4root is exported, then we have no recourse other
      than to fail the whole operation.
      
      Cc: Steve Dickson <steved@redhat.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
      Reported-by: JianHong Yin <yin-jianhong@163.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Flush IBP in ib_prctl_set() · 8cbd7f26
      Rodrigo Branco authored
      commit a664ec91 upstream.
      
      We missed the window between the TIF flag update and the next reschedule.
      
      Signed-off-by: Rodrigo Branco <bsdaemon@google.com>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet · ba780bff
      Hans de Goede authored
      [ Upstream commit a1dec9d7 ]
      
      The Advantech MICA-071 tablet deviates from the defaults for
      a non CR Bay Trail based tablet in several ways:
      
      1. It uses an analog MIC on IN3 rather than using DMIC1
      2. It only has 1 speaker
      3. It needs the OVCD current threshold to be set to 1500uA instead of
         the default 2000uA to reliably differentiate between headphones and
         headsets
      
      Add a quirk with these settings for this tablet.
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Link: https://lore.kernel.org/r/20221213123246.11226-1-hdegoede@redhat.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • udf: Fix extension of the last extent in the file · e66ae100
      Jan Kara authored
      [ Upstream commit 83c7423d ]
      
      When extending the last extent in the file within the last block, we
      wrongly computed the length of the last extent. This is mostly a
      cosmetic problem, since the extent does not contain any data and the
      length will be fixed up by following operations, but still.
      
      Fixes: 1f3868f0 ("udf: Fix extending file within last block")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • caif: fix memory leak in cfctrl_linkup_request() · 33df9c5d
      Zhengchao Shao authored
      [ Upstream commit fe69230f ]
      
      When linktype is unknown or kzalloc failed in cfctrl_linkup_request(),
      pkt is not released. Add release process to error path.
      
      Fixes: b482cd20 ("net-caif: add CAIF core protocol stack")
      Fixes: 8d545c8f ("caif: Disconnect without waiting for response")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915: unpin on error in intel_vgpu_shadow_mm_pin() · 3cb1ee82
      Dan Carpenter authored
      [ Upstream commit 3792fc50 ]
      
      Call intel_vgpu_unpin_mm() on this error path.
      
      Fixes: 41874148 ("drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/Y3OQ5tgZIVxyQ/WV@kili
      Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • usb: rndis_host: Secure rndis_query check against int overflow · ebe6d2fc
      Szymon Heidrich authored
      [ Upstream commit c7dd1380 ]
      
      The variables off and len, typed as uint32 in the rndis_query function,
      are controlled by the incoming RNDIS response message, so their
      values may be manipulated. Setting off to an unexpectedly large
      value will cause the sum with len and 8 to overflow and pass
      the implemented validation step. Consequently the response
      pointer will refer to a location past the expected
      buffer boundaries, allowing information leakage, e.g. via the
      RNDIS_OID_802_3_PERMANENT_ADDRESS OID.
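      
      The wrap-around can be sketched in plain C as follows. This is an
      illustration of the general technique, not the driver's code: BUF_SIZE,
      range_ok_naive() and range_ok_safe() are hypothetical names, and the
      buffer size is an arbitrary value chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer size for illustration; not the driver's constant. */
#define BUF_SIZE 1025u

/*
 * Naive check: 8 + off + len is computed modulo 2^32, so a huge off wraps
 * the sum around and an out-of-bounds range is wrongly accepted.
 */
static bool range_ok_naive(uint32_t off, uint32_t len)
{
	return (uint32_t)(8u + off + len) <= BUF_SIZE;
}

/*
 * Overflow-safe check: compare each term against the space that remains,
 * so no intermediate sum can wrap.
 */
static bool range_ok_safe(uint32_t off, uint32_t len)
{
	if (off > BUF_SIZE - 8u)
		return false;
	return len <= BUF_SIZE - 8u - off;
}
```

      With off = 0xFFFFFFF0 and len = 0x20 the naive sum wraps to 0x18 and
      passes, while the safe variant rejects the range because off alone
      already exceeds the available space.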
      
      Fixes: ddda0862 ("USB: rndis_host, various cleanups")
      Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ebe6d2fc
    • Daniil Tatianin's avatar
      drivers/net/bonding/bond_3ad: return when there's no aggregator · a07b4895
      Daniil Tatianin authored
      [ Upstream commit 9c807965 ]
      
      Otherwise we would dereference a NULL aggregator pointer when calling
      __set_agg_ports_ready on the line below.
      
      Found by Linux Verification Center (linuxtesting.org) with the SVACE
      static analysis tool.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a07b4895
    • Miaoqian Lin's avatar
      perf tools: Fix resources leak in perf_data__open_dir() · 2f7a09c1
      Miaoqian Lin authored
      [ Upstream commit 0a6564eb ]
      
      In perf_data__open_dir(), opendir() opens the directory stream.  Add
      missing closedir() to release it after use.
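      
      A small userspace sketch of the opendir()/closedir() pairing, using
      only POSIX calls. count_entries() is a hypothetical helper written for
      this example and is not part of perf.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static int count_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *dent;
	int n = 0;

	if (!dir)
		return -1;

	while ((dent = readdir(dir)) != NULL) {
		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
			continue;
		n++;
	}

	/* Without this call the directory stream and its fd would leak. */
	closedir(dir);
	return n;
}
```

      Every successful opendir() is matched by exactly one closedir() on the
      way out, including when the loop finds nothing of interest.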
      
      Fixes: eb617670 ("perf data: Add perf_data__open_dir_data function")
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20221229090903.1402395-1-linmq006@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2f7a09c1
    • Jamal Hadi Salim's avatar
      net: sched: cbq: dont intepret cls results when asked to drop · 6b17b846
      Jamal Hadi Salim authored
      [ Upstream commit caa4b35b ]
      
      If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that
      res.class contains a valid pointer.
      
      Sample splat reported by Kyle Zeng:
      
      [    5.405624] 0: reclassify loop, rule prio 0, protocol 800
      [    5.406326] ==================================================================
      [    5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0
      [    5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299
      [    5.408731]
      [    5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15
      [    5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      [    5.410439] Call Trace:
      [    5.410764]  dump_stack+0x87/0xcd
      [    5.411153]  print_address_description+0x7a/0x6b0
      [    5.411687]  ? vprintk_func+0xb9/0xc0
      [    5.411905]  ? printk+0x76/0x96
      [    5.412110]  ? cbq_enqueue+0x54b/0xea0
      [    5.412323]  kasan_report+0x17d/0x220
      [    5.412591]  ? cbq_enqueue+0x54b/0xea0
      [    5.412803]  __asan_report_load1_noabort+0x10/0x20
      [    5.413119]  cbq_enqueue+0x54b/0xea0
      [    5.413400]  ? __kasan_check_write+0x10/0x20
      [    5.413679]  __dev_queue_xmit+0x9c0/0x1db0
      [    5.413922]  dev_queue_xmit+0xc/0x10
      [    5.414136]  ip_finish_output2+0x8bc/0xcd0
      [    5.414436]  __ip_finish_output+0x472/0x7a0
      [    5.414692]  ip_finish_output+0x5c/0x190
      [    5.414940]  ip_output+0x2d8/0x3c0
      [    5.415150]  ? ip_mc_finish_output+0x320/0x320
      [    5.415429]  __ip_queue_xmit+0x753/0x1760
      [    5.415664]  ip_queue_xmit+0x47/0x60
      [    5.415874]  __tcp_transmit_skb+0x1ef9/0x34c0
      [    5.416129]  tcp_connect+0x1f5e/0x4cb0
      [    5.416347]  tcp_v4_connect+0xc8d/0x18c0
      [    5.416577]  __inet_stream_connect+0x1ae/0xb40
      [    5.416836]  ? local_bh_enable+0x11/0x20
      [    5.417066]  ? lock_sock_nested+0x175/0x1d0
      [    5.417309]  inet_stream_connect+0x5d/0x90
      [    5.417548]  ? __inet_stream_connect+0xb40/0xb40
      [    5.417817]  __sys_connect+0x260/0x2b0
      [    5.418037]  __x64_sys_connect+0x76/0x80
      [    5.418267]  do_syscall_64+0x31/0x50
      [    5.418477]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
      [    5.418770] RIP: 0033:0x473bb7
      [    5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
      [    5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [    5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7
      [    5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007
      [    5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004
      [    5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      [    5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002
      [    5.422471]
      [    5.422562] Allocated by task 299:
      [    5.422782]  __kasan_kmalloc+0x12d/0x160
      [    5.423007]  kasan_kmalloc+0x5/0x10
      [    5.423208]  kmem_cache_alloc_trace+0x201/0x2e0
      [    5.423492]  tcf_proto_create+0x65/0x290
      [    5.423721]  tc_new_tfilter+0x137e/0x1830
      [    5.423957]  rtnetlink_rcv_msg+0x730/0x9f0
      [    5.424197]  netlink_rcv_skb+0x166/0x300
      [    5.424428]  rtnetlink_rcv+0x11/0x20
      [    5.424639]  netlink_unicast+0x673/0x860
      [    5.424870]  netlink_sendmsg+0x6af/0x9f0
      [    5.425100]  __sys_sendto+0x58d/0x5a0
      [    5.425315]  __x64_sys_sendto+0xda/0xf0
      [    5.425539]  do_syscall_64+0x31/0x50
      [    5.425764]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
      [    5.426065]
      [    5.426157] The buggy address belongs to the object at ffff88800e312200
      [    5.426157]  which belongs to the cache kmalloc-128 of size 128
      [    5.426955] The buggy address is located 42 bytes to the right of
      [    5.426955]  128-byte region [ffff88800e312200, ffff88800e312280)
      [    5.427688] The buggy address belongs to the page:
      [    5.427992] page:000000009875fabc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe312
      [    5.428562] flags: 0x100000000000200(slab)
      [    5.428812] raw: 0100000000000200 dead000000000100 dead000000000122 ffff888007843680
      [    5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff ffff88800e312401
      [    5.429875] page dumped because: kasan: bad access detected
      [    5.430214] page->mem_cgroup:ffff88800e312401
      [    5.430471]
      [    5.430564] Memory state around the buggy address:
      [    5.430846]  ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [    5.431267]  ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
      [    5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [    5.432123]                                   ^
      [    5.432391]  ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
      [    5.432810]  ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [    5.433229] ==================================================================
      [    5.433648] Disabling lock debugging due to kernel taint
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6b17b846
    • Jamal Hadi Salim's avatar
      net: sched: atm: dont intepret cls results when asked to drop · 63e469cb
      Jamal Hadi Salim authored
      [ Upstream commit a2965c7b ]
      
      If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume
      res.class contains a valid pointer.
      
      Fixes: b0188d4d ("[NET_SCHED]: sch_atm: Lindent")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      63e469cb
    • Maor Gottlieb's avatar
      RDMA/mlx5: Fix validation of max_rd_atomic caps for DC · d16e5fef
      Maor Gottlieb authored
      [ Upstream commit 8de8482f ]
      
      Currently, when modifying DC, we validate the max_rd_atomic user attribute
      against the RC cap; it should be validated against the DC cap instead. RC
      and DC QP types have different device limitations.
      
      This can cause userspace created DC QPs to malfunction.
      
      Fixes: c32a4f29 ("IB/mlx5: Add support for DC Initiator QP")
      Link: https://lore.kernel.org/r/0c5aee72cea188c3bb770f4207cce7abc9b6fc74.1672231736.git.leonro@nvidia.com
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d16e5fef
    • Leon Romanovsky's avatar
      RDMA/uverbs: Silence shiftTooManyBitsSigned warning · 564fdc2f
      Leon Romanovsky authored
      [ Upstream commit 9b8d8469 ]
      
      Fix reported by kbuild warning.
      
         drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
          BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
                                                       ^
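      
      The warning concerns a general C rule: with a 32-bit int, the result
      of `1 << 31` is not representable in a signed int, which makes the
      shift undefined. A sketch of the usual remedy, shifting an unsigned
      operand, is shown here; bit32() is a hypothetical helper for this
      example and is not claimed to be the form used in the actual patch.

```c
#include <stdint.h>

/*
 * Build a single-bit mask from an unsigned operand. Shifting 1u is well
 * defined for every n below the operand width, unlike shifting the signed
 * constant 1 into the sign bit.
 */
static uint32_t bit32(unsigned int n)
{
	return (uint32_t)1u << n;	/* caller must pass n < 32 */
}
```

      bit32(31) yields 0x80000000 without relying on undefined behaviour.
      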
      Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Stable-dep-of: 8de8482f ("RDMA/mlx5: Fix validation of max_rd_atomic caps for DC")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      564fdc2f
    • Miaoqian Lin's avatar
      net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe · 00616bd1
      Miaoqian Lin authored
      [ Upstream commit d0395358 ]
      
      of_phy_find_device() returns a device with its refcount incremented.
      Call put_device() to release it when it is no longer needed.
      
      Fixes: ab4e6ee5 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      00616bd1
    • Jiguang Xiao's avatar
      net: amd-xgbe: add missed tasklet_kill · 904ad95b
      Jiguang Xiao authored
      [ Upstream commit d530ece7 ]
      
      The driver does not call tasklet_kill in several places.
      Add the calls to fix it.
      
      Fixes: 85b85c85 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
      Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      904ad95b
    • Stefano Garzarella's avatar
      vhost: fix range used in translate_desc() · a15cbe9b
      Stefano Garzarella authored
      [ Upstream commit 98047313 ]
      
      vhost_iotlb_itree_first() requires `start` and `last` parameters
      to search for a mapping that overlaps the range.
      
      In translate_desc() we cyclically call vhost_iotlb_itree_first(),
      incrementing `addr` by the amount already translated, so rightly
      we move the `start` parameter passed to vhost_iotlb_itree_first(),
      but we should hold the `last` parameter constant.
      
      Let's fix it by saving the `last` parameter value before incrementing
      `addr` in the loop.
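      
      The idea of moving `start` while holding `last` fixed can be sketched
      in userspace C. This is a simplified model, not the vhost code: struct
      map, first_overlap() and count_chunks() are hypothetical, the linear
      scan stands in for the interval-tree lookup, and the mapping array is
      assumed to be sorted and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

struct map {
	uint64_t start, last;	/* inclusive range */
};

/* Return the first mapping overlapping [start, last], or NULL. */
static const struct map *first_overlap(const struct map *m, size_t n,
				       uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < n; i++)
		if (m[i].start <= last && start <= m[i].last)
			return &m[i];
	return NULL;
}

/*
 * Count the mappings needed to cover [addr, addr + len - 1].
 * Returns -1 if part of the range is unmapped. Assumes addr + len - 1
 * does not overflow.
 */
static int count_chunks(const struct map *m, size_t n,
			uint64_t addr, uint64_t len)
{
	uint64_t last = addr + len - 1;	/* computed once, before the loop */
	int chunks = 0;

	while (len > 0) {
		const struct map *hit = first_overlap(m, n, addr, last);
		uint64_t avail, size;

		if (!hit || hit->start > addr)
			return -1;
		avail = hit->last - addr;	/* bytes after addr in this map */
		size = (avail >= len - 1) ? len : avail + 1;
		addr += size;			/* only the search start moves */
		len -= size;
		chunks++;
	}
	return chunks;
}
```

      With mappings [0x1000, 0x1fff] and [0x2000, 0x2fff], a request for
      0x1000 bytes at 0x1800 is covered by two chunks, and every lookup in
      the loop searches up to the same `last` value of 0x27ff.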
      
      Fixes: a9709d68 ("vhost: convert pre sorted vhost memory array to interval tree")
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20221109102503.18816-3-sgarzare@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a15cbe9b
    • Miaoqian Lin's avatar
      nfc: Fix potential resource leaks · d8e41031
      Miaoqian Lin authored
      [ Upstream commit df49908f ]
      
      nfc_get_device() takes a reference on the device; add the missing
      nfc_put_device() to release it when it is no longer needed.
      Also fix the style warning by using the error code EOPNOTSUPP instead
      of ENOTSUPP.
      
      Fixes: 5ce3f32b ("NFC: netlink: SE API implementation")
      Fixes: 29e76924 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d8e41031
    • Daniil Tatianin's avatar
      qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure · 8f97eeb0
      Daniil Tatianin authored
      [ Upstream commit 13a7c896 ]
      
      adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
      case qlcnic_dcb_attach() would return an error, which always happens
      under OOM conditions. This would lead to use-after-free because both
      of the existing callers invoke qlcnic_dcb_get_info() on the obtained
      pointer, which is potentially freed at that point.
      
      Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
      pointer at callsite using qlcnic_dcb_free(). This also removes the now
      unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around
      kfree() also causing memory leaks for partially initialized dcb.
      
      Found by Linux Verification Center (linuxtesting.org) with the SVACE
      static analysis tool.
      
      Fixes: 3c44bba1 ("qlcnic: Disable DCB operations from SR-IOV VFs")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8f97eeb0
    • Hawkins Jiawei's avatar
      net: sched: fix memory leak in tcindex_set_parms · 55ac68b5
      Hawkins Jiawei authored
      [ Upstream commit 399ab7fe ]
      
      Syzkaller reports a memory leak as follows:
      ====================================
      BUG: memory leak
      unreferenced object 0xffff88810c287f00 (size 256):
        comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
          [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
          [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
          [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
          [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
          [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
          [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
          [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
          [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
          [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
          [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
          [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
          [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
          [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
          [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
          [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
          [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
          [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
          [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
          [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
          [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
          [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ====================================
      
      The kernel uses tcindex_change() to change the properties of an
      existing filter.
      
      Yet the problem is that, during the process of changing,
      if `old_r` is retrieved from `p->perfect`, then
      kernel uses tcindex_alloc_perfect_hash() to newly
      allocate filter results, uses tcindex_filter_result_init()
      to clear the old filter result, without destroying
      its tcf_exts structure, which triggers the above memory leak.
      
      To be more specific, there are only two sources for `old_r`,
      according to tcindex_lookup(): `old_r` is retrieved either from
      `p->perfect` or from `p->h`.
      
        * If `old_r` is retrieved from `p->perfect`, kernel uses
      tcindex_alloc_perfect_hash() to newly allocate the
      filter results. Then `r` is assigned with `cp->perfect + handle`,
      which is newly allocated. So condition `old_r && old_r != r` is
      true in this situation, and kernel uses tcindex_filter_result_init()
      to clear the old filter result, without destroying
      its tcf_exts structure
      
        * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
      according to the tcindex_lookup(). Considering that `cp->h`
      is directly copied from `p->h` and `p->perfect` is NULL,
      `r` is assigned with `tcindex_lookup(cp, handle)`, whose value
      should be the same as `old_r`, so condition `old_r && old_r != r`
      is false in this situation, kernel ignores using
      tcindex_filter_result_init() to clear the old filter result.
      
      So only when `old_r` is retrieved from `p->perfect` does kernel use
      tcindex_filter_result_init() to clear the old filter result, which
      triggers the above memory leak.
      
      Considering that there already exists a tc_filter_wq workqueue
      to destroy the old tcindex_data by tcindex_partial_destroy_work()
      at the end of tcindex_set_parms(), this patch solves
      this memory leak bug by removing this old filter result
      clearing part and delegating it to the tc_filter_wq workqueue.
      
      Note that this patch doesn't introduce any other issues. If
      `old_r` is retrieved from `p->perfect`, this patch just
      delegates old filter result clearing part to the
      tc_filter_wq workqueue; If `old_r` is retrieved from `p->h`,
      kernel doesn't reach the old filter result clearing part, so
      removing this part has no effect.
      
      [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
      and Dmitry Vyukov]
      
      Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
      Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
      Reported-by: <syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com>
      Tested-by: <syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      55ac68b5
    • Jie Wang's avatar
      net: hns3: add interrupts re-initialization while doing VF FLR · b6a0623f
      Jie Wang authored
      [ Upstream commit 09e6b30e ]
      
      Currently the keep-alive message between PF and VF may be lost, in which
      case the PF considers the VF not alive. The VF then does not reset during
      the PF FLR reset process. This invalidates the interrupt resources
      allocated to the VF, and the VF can no longer receive from or respond to
      the PF.
      
      So this patch adds VF interrupt re-initialization during VF FLR so that
      the VF can recover in the above cases.
      
      Fixes: 862d969a ("net: hns3: do VF's pci re-initialization while PF doing FLR")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Hao Lan <lanhao@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b6a0623f
    • Jeff Layton's avatar
      nfsd: shut down the NFSv4 state objects before the filecache · f9c551d8
      Jeff Layton authored
      [ Upstream commit 789e1e10 ]
      
      Currently, we shut down the filecache before trying to clean up the
      stateids that depend on it. This leads to the kernel trying to free an
      nfsd_file twice, and a refcount overput on the nf_mark.
      
      Change the shutdown procedure to tear down all of the stateids prior
      to shutting down the filecache.
      
      Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Fixes: 5e113224 ("nfsd: nfsd_file cache entries should be per net namespace")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f9c551d8
    • Jakub Kicinski's avatar
      bpf: pull before calling skb_postpull_rcsum() · 7eaaee52
      Jakub Kicinski authored
      [ Upstream commit 54c3f1a8 ]
      
      Anand hit a BUG() when pulling off headers on egress to a SW tunnel.
      We get to skb_checksum_help() with an invalid checksum offset
      (commit d7ea0d9d ("net: remove two BUG() from skb_checksum_help()")
      converted those BUGs to WARN_ONs()).
      He points out oddness in how skb_postpull_rcsum() gets used.
      Indeed looks like we should pull before "postpull", otherwise
      the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not
      be able to do its job:
      
      	if (skb->ip_summed == CHECKSUM_PARTIAL &&
      	    skb_checksum_start_offset(skb) < 0)
      		skb->ip_summed = CHECKSUM_NONE;
      
      Reported-by: Anand Parthasarathy <anpartha@meta.com>
      Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7eaaee52
    • minoura makoto's avatar
      SUNRPC: ensure the matching upcall is in-flight upon downcall · 1d449cd2
      minoura makoto authored
      [ Upstream commit b18cba09 ]
      
      Commit 9130b8db ("SUNRPC: allow for upcalls for the same uid
      but different gss service") introduced `auth` argument to
      __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
      since it (and auth->service) was not (yet) determined.
      
      When multiple upcalls with the same uid and different service are
      ongoing, it could happen that __gss_find_upcall(), which returns the
      first match found in the pipe->in_downcall list, could not find the
      correct gss_msg corresponding to the downcall we are looking for.
      Moreover, it might return a msg which is not sent to rpc.gssd yet.
      
      We could see mount.nfs processes hung in D state when multiple mount.nfs
      instances are executed in parallel.  The call trace below is from the
      CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64, but we observed the same
      hang with elrepo kernel-ml-6.0.7-1.el7.
      
      PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
       #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
       #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
       #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
       #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
       #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
       #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
       #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
       #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
       #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
       #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
      
      The scenario is like this. Let's say there are two upcalls for
      services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
      
      When rpc.gssd reads pipe to get the upcall msg corresponding to
      service B from pipe->pipe and then writes the response, in
      gss_pipe_downcall the msg corresponding to service A will be picked
      because only uid is used to find the msg and it is before the one for
      B in pipe->in_downcall.  And the process waiting for the msg
      corresponding to service A will be woken up.
      
      Actual scheduling of that process might be after rpc.gssd processes the
      next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
      The process is scheduled to see gss_msg->ctx == NULL and
      gss_msg->msg.errno == 0, therefore it cannot break the loop in
      gss_create_upcall and is never woken up after that.
      
      This patch adds a simple check to ensure that a msg which is not
      sent to rpc.gssd yet is not chosen as the matching upcall upon
      receiving a downcall.
      
      Signed-off-by: minoura makoto <minoura@valinux.co.jp>
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
      Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
      Cc: Trond Myklebust <trondmy@hammerspace.com>
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d449cd2
    • Jan Kara's avatar
      ext4: fix deadlock due to mbcache entry corruption · af530652
      Jan Kara authored
      [ Upstream commit a44e84a9 ]
      
      When manipulating xattr blocks, we can deadlock infinitely looping
      inside ext4_xattr_block_set() where we constantly keep finding xattr
      block for reuse in mbcache but we are unable to reuse it because its
      reference count is too big. This happens because cache entry for the
      xattr block is marked as reusable (e_reusable set) although its
      reference count is too big. When this inconsistency happens, this
      inconsistent state is kept indefinitely and so ext4_xattr_block_set()
      keeps retrying indefinitely.
      
      The inconsistent state is caused by non-atomic update of e_reusable bit.
      e_reusable is part of a bitfield and e_reusable update can race with
      update of e_referenced bit in the same bitfield resulting in loss of one
      of the updates. Fix the problem by using atomic bitops instead.
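      
      A userspace analogue of the remedy, using C11 atomics in place of the
      kernel's bitops. struct entry, the flag names and the helpers are
      hypothetical and only illustrate why separate atomic read-modify-write
      operations on one flags word cannot lose each other's updates, whereas
      plain bitfield stores can.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits sharing one word, as the two bitfields did. */
enum { F_REFERENCED = 1u << 0, F_REUSABLE = 1u << 1 };

struct entry {
	atomic_uint flags;
};

/* Atomically set one flag without disturbing the others. */
static void entry_set(struct entry *e, unsigned int bit)
{
	atomic_fetch_or(&e->flags, bit);
}

/* Atomically clear one flag without disturbing the others. */
static void entry_clear(struct entry *e, unsigned int bit)
{
	atomic_fetch_and(&e->flags, ~bit);
}

static bool entry_test(struct entry *e, unsigned int bit)
{
	return (atomic_load(&e->flags) & bit) != 0;
}
```

      Each helper performs a single atomic read-modify-write, so a
      concurrent entry_set(e, F_REFERENCED) and entry_clear(e, F_REUSABLE)
      both take effect regardless of ordering.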
      
      This bug has been around for many years, but it became *much* easier
      to hit after commit 65f8b800 ("ext4: fix race when reusing xattr
      blocks").
      
      Cc: stable@vger.kernel.org
      Fixes: 6048c64b ("mbcache: add reusable flag to cache entries")
      Fixes: 65f8b800 ("ext4: fix race when reusing xattr blocks")
      Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Reported-by: Thilo Fromm <t-lo@linux.microsoft.com>
      Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      af530652
    • Jan Kara's avatar
      mbcache: automatically delete entries from cache on freeing · 711ef736
      Jan Kara authored
      [ Upstream commit 307af6c8 ]
      
      Use the fact that entries with elevated refcount are not removed from
      the hash and just move removal of the entry from the hash to the entry
      freeing time. When doing this we also change the generic code to hold
      one reference to the cache entry, not two of them, which makes code
      somewhat more obvious.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: a44e84a9 ("ext4: fix deadlock due to mbcache entry corruption")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      711ef736