  1. Sep 30, 2023
    • mm: zswap: fix potential memory corruption on duplicate store · ca56489c
      Domenico Cerasuolo authored
      While stress-testing zswap, memory corruption occurred when writing
      back pages.  __frontswap_store used to check for duplicate entries before
      attempting to store a page in zswap, because if the store fails
      the old entry isn't removed from the tree.  This change removes duplicate
      entries in zswap_store before the actual attempt.
      
      [cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes]
        Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com
      Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com
      
      
      Fixes: 42c06a0e ("mm: kill frontswap")
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries · 6f1bace9
      Ryan Roberts authored
      When called with a swap entry that does not embed a PFN (e.g. 
      PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
      set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
      is enabled) or cause a dereference of an invalid address and subsequent
      panic.
      
      arm64's huge pte implementation supports multiple huge page sizes, some of
      which are implemented in the page table with multiple contiguous entries. 
      So set_huge_pte_at() needs to work out how big the logical pte is, so that
      it can also work out how many physical ptes (or pmds) need to be written. 
      It previously did this by grabbing the folio out of the pte and querying
      its size.
      
      However, there are cases when the pte being set is actually a swap entry. 
      But this also used to work fine, because for huge ptes, we only ever saw
      migration entries and hwpoison entries.  And both of these types of swap
      entries have a PFN embedded, so the code would grab that and everything
      still worked out.
      
      But over time, more calls to set_huge_pte_at() have been added that set
      swap entry types that do not embed a PFN.  And this causes the code to go
      bang.  The triggering case is for the uffd poison test, commit
      99aa7721 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
      causes a PTE_MARKER_POISONED swap entry to be set, courtesy of commit
      8a13897f ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
      added in v6.5-rc7.  Although review shows that there are other call sites
      that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
      on arm64 because arm64 doesn't support UFFD WP.
      
      Arguably, the root cause is really due to commit 18f39629 ("mm:
      hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
      interface to the core code by removing set_huge_swap_pte_at() (which took
      a page size parameter) and replacing it with calls to set_huge_pte_at()
      where the size was inferred from the folio, as described above.  While that
      commit didn't break anything at the time, it did break the interface
      because it couldn't handle swap entries without PFNs.  And since then new
      callers have come along which rely on this working.  But given the
      brokenness is only observable after commit 8a13897f ("mm: userfaultfd:
      support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
      
      Now that we have modified the set_huge_pte_at() interface to pass the huge
      page size in the previous patch, we can trivially fix this issue.
      
      Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
      
      
      Fixes: 8a13897f ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: <stable@vger.kernel.org>	[6.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb: add huge page size param to set_huge_pte_at() · 935d4f0c
      Ryan Roberts authored
      Patch series "Fix set_huge_pte_at() panic on arm64", v2.
      
      This series fixes a bug in arm64's implementation of set_huge_pte_at(),
      which can result in an unprivileged user causing a kernel panic.  The
      problem was triggered when running the new uffd poison mm selftest for
      HUGETLB memory.  This test (and the uffd poison feature) was merged for
      v6.5-rc7.
      
      Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
      (correctly this time) to get it backported to v6.5, where the issue first
      showed up.
      
      
      Description of Bug
      ==================
      
      arm64's huge pte implementation supports multiple huge page sizes, some of
      which are implemented in the page table with multiple contiguous entries. 
      So set_huge_pte_at() needs to work out how big the logical pte is, so that
      it can also work out how many physical ptes (or pmds) need to be written. 
      It previously did this by grabbing the folio out of the pte and querying
      its size.
      
      However, there are cases when the pte being set is actually a swap entry. 
      But this also used to work fine, because for huge ptes, we only ever saw
      migration entries and hwpoison entries.  And both of these types of swap
      entries have a PFN embedded, so the code would grab that and everything
      still worked out.
      
      But over time, more calls to set_huge_pte_at() have been added that set
      swap entry types that do not embed a PFN.  And this causes the code to go
      bang.  The triggering case is for the uffd poison test, commit
      99aa7721 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
      causes a PTE_MARKER_POISONED swap entry to be set, courtesy of commit
      8a13897f ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
      added in v6.5-rc7.  Although review shows that there are other call sites
      that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
      on arm64 because arm64 doesn't support UFFD WP.
      
      If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
      it will dereference a bad pointer in page_folio():
      
          static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
          {
              VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
      
              return page_folio(pfn_to_page(swp_offset_pfn(entry)));
          }
      
      
      Fix
      ===
      
      The simplest fix would have been to revert the dodgy cleanup commit
      18f39629 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
      things have moved on, this would have required an audit of all the new
      set_huge_pte_at() call sites to see if they should be converted to
      set_huge_swap_pte_at().  As per the original intent of the change, it
      would also leave us open to future bugs when people invariably get it
      wrong and call the wrong helper.
      
      So instead, I've added a huge page size parameter to set_huge_pte_at(). 
      This means that the arm64 code has the size in all cases.  It's a bigger
      change, due to needing to touch the arches that implement the function,
      but it is entirely mechanical, so in my view, low risk.
      
      I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
      s390, sparc (and additionally x86_64).  I've additionally booted and run
      mm selftests against arm64, where I observe the uffd poison test is fixed,
      and there are no other regressions.
      
      
      This patch (of 2):
      
      In order to fix a bug, arm64 needs to be told the size of the huge page
      for which the pte is being set in set_huge_pte_at().  Provide for this by
      adding an `unsigned long sz` parameter to the function.  This follows the
      same pattern as huge_pte_clear().
      
      This commit makes the required interface modifications to the core mm as
      well as all arches that implement this function (arm64, parisc, powerpc,
      riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
      commit.
      
      No behavioral changes intended.
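
To illustrate why an explicit size is sufficient, here is a minimal userspace sketch of mapping a huge page size to the number of page table entries to write. It is not the arm64 implementation: the `toy_` names are hypothetical, and the geometry (4K base pages, 16-entry contiguous runs, 2M PMDs, 1G PUDs) is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry for illustration only (4K granule). */
#define TOY_PAGE_SIZE  (1UL << 12)
#define TOY_CONT_PTES  16
#define TOY_PMD_SIZE   (1UL << 21)
#define TOY_CONT_PMDS  16
#define TOY_PUD_SIZE   (1UL << 30)

/*
 * Given the logical huge page size, return how many physical entries
 * must be written and store the size covered by each entry in *pgsize.
 * Returns 0 for a size this toy model does not support.
 * No folio or PFN is consulted, so a PFN-less swap entry is harmless.
 */
static int toy_num_entries(unsigned long sz, unsigned long *pgsize)
{
    assert(pgsize != NULL);

    if (sz == TOY_PAGE_SIZE * TOY_CONT_PTES) {
        *pgsize = TOY_PAGE_SIZE;
        return TOY_CONT_PTES;
    }
    if (sz == TOY_PMD_SIZE) {
        *pgsize = TOY_PMD_SIZE;
        return 1;
    }
    if (sz == TOY_PMD_SIZE * TOY_CONT_PMDS) {
        *pgsize = TOY_PMD_SIZE;
        return TOY_CONT_PMDS;
    }
    if (sz == TOY_PUD_SIZE) {
        *pgsize = TOY_PUD_SIZE;
        return 1;
    }
    return 0;
}
```

With the size passed in by the caller, the entry count depends only on `sz`, which is the property the new parameter is meant to provide.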
      
      Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
      
      
      Fixes: 8a13897f ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>	[powerpc 8xx]
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>	[vmalloc change]
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: <stable@vger.kernel.org>	[6.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states · a8091f03
      Liam R. Howlett authored
      When updating the maple tree iterator to avoid rewalks, an issue was
      introduced when shifting beyond the limits.  This can be seen by trying to
      go to the previous address of 0, which would set the maple node to
      MAS_NONE and keep the range as the last entry.
      
      Subsequent calls to mas_find() would then search upwards from mas->last
      and skip the value at mas->index/mas->last.  This showed up as a bug in
      mprotect which skips the actual VMA at the current range after attempting
      to go to the previous VMA from 0.
      
      Since MAS_NONE may already be set when searching for a value that isn't
      contained within a node, changing the handling of MAS_NONE in mas_find()
      would make the code more complicated and error prone.  Furthermore, there
      was no way to tell which limit was hit, and thus which action to take
      (next or the entry at the current range).
      
      The solution is to add two states to track what happened with the
      previous iterator action.  This allows for the expected behaviour of the
      next command to return the correct item (either the item at the range
      requested, or the next/previous).
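
The idea of remembering which limit was hit can be sketched with a toy iterator over a plain array. This is only an illustration under simplifying assumptions, not the maple tree code: the `toy_` names and the three-state enum are hypothetical, and the array is assumed to be non-empty.

```c
#include <assert.h>
#include <stddef.h>

/* Which limit, if any, the previous iterator action ran into. */
enum toy_state { TOY_ACTIVE, TOY_UNDERFLOW, TOY_OVERFLOW };

struct toy_iter {
    const int *vals;      /* entries, assumed non-empty */
    size_t n;             /* number of entries */
    size_t pos;           /* current position */
    enum toy_state state; /* result of the previous action */
};

/* Step backwards. At the lower limit, stay put and record the underflow. */
static const int *toy_prev(struct toy_iter *it)
{
    if (it->state == TOY_OVERFLOW) {
        /* We never moved past the last entry, so return it. */
        it->state = TOY_ACTIVE;
        return &it->vals[it->pos];
    }
    if (it->pos == 0) {
        it->state = TOY_UNDERFLOW;
        return NULL;
    }
    it->state = TOY_ACTIVE;
    it->pos--;
    return &it->vals[it->pos];
}

/* Step forwards. After an underflow, return the current entry first. */
static const int *toy_next(struct toy_iter *it)
{
    if (it->state == TOY_UNDERFLOW) {
        /* We never moved before the first entry, so do not skip it. */
        it->state = TOY_ACTIVE;
        return &it->vals[it->pos];
    }
    if (it->pos + 1 >= it->n) {
        it->state = TOY_OVERFLOW;
        return NULL;
    }
    it->state = TOY_ACTIVE;
    it->pos++;
    return &it->vals[it->pos];
}
```

Without the recorded state, a forward step after a failed backward step from position 0 would advance to position 1 and skip the first entry, which mirrors the skipped-VMA symptom described above.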
      
      Tests are also added and updated accordingly.
      
      Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
      Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
      Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
      
      
      Fixes: 39193685 ("maple_tree: try harder to keep active node with mas_prev()")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
      Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
      Closes: https://bugs.archlinux.org/task/79656
      
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • maple_tree: add mas_is_active() to detect in-tree walks · 5c590804
      Liam R. Howlett authored
      Patch series "maple_tree: Fix mas_prev() state regression".
      
      Pedro Falcato reported an mprotect regression [1] which was bisected back
      to the iterator changes for maple tree.  Root cause analysis showed that
      mas_prev() running off the end of the VMA space (previous from 0), followed
      by mas_find(), would skip the first value.
      
      This patchset introduces maple state underflow/overflow so the sequence of
      calls on the maple state will return what the user expects.
      
      Users who encounter this bug may see mprotect(), userfaultfd_register(),
      and mlock() fail on VMAs mapped with address 0.
      
      
      This patch (of 2):
      
      Instead of constantly checking each possibility of the maple state,
      create a fast path that will skip over checking unlikely states.
      
      Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
      Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
      
      
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Pedro Falcato <pedro.falcato@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() · 7ee29fac
      Pan Bian authored
      In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
      reference count of bh when the call to nilfs_dat_translate() fails.  If
      the reference count hits 0 and its owner page gets unlocked, bh may be
      freed.  However, bh->b_page is dereferenced to put the page after that,
      which may result in a use-after-free bug.  This patch moves the release
      operation after unlocking and putting the page.
      
      NOTE: The function in question is only called in GC, and in combination
      with current userland tools, address translation using DAT does not occur
      in that function, so the code path that causes this issue will not be
      executed.  However, it is possible to run that code path by intentionally
      modifying the userland GC library or by calling the GC ioctl directly.
      
      [konishi.ryusuke@gmail.com: NOTE added to the commit log]
      Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
      Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
      
      
      Fixes: a3d93f70 ("nilfs2: block cache for garbage collection")
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
      Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
      
      
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: abstract moving to the next PFN · ce60f27b
      Matthew Wilcox (Oracle) authored
      In order to fix the L1TF vulnerability, x86 can invert the PTE bits for
      PROT_NONE VMAs, which means we cannot move from one PTE to the next by
      adding 1 to the PFN field of the PTE.  This results in the BUG reported at
      [1].
      
      Abstract advancing the PTE to the next PFN through a pte_next_pfn()
      function/macro.
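
A toy model may help show why adding 1 to the PFN field goes wrong once the field is stored inverted. This sketch is not the x86 encoding: the `toy_` names, the bit layout, and the simplification that every non-present entry is stored inverted are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PFN_SHIFT 12
#define TOY_PFN_MASK  0x000ffffffffff000ull /* assumed PFN field, bits 12..51 */
#define TOY_PRESENT   0x1ull

typedef uint64_t toy_pte_t;

/* Simplification: a non-present entry stores its PFN bits inverted. */
static bool toy_inverted(toy_pte_t pte)
{
    return !(pte & TOY_PRESENT);
}

static toy_pte_t toy_mk_pte(uint64_t pfn, bool present)
{
    uint64_t field = (pfn << TOY_PFN_SHIFT) & TOY_PFN_MASK;

    if (!present)
        field = ~field & TOY_PFN_MASK;
    return field | (present ? TOY_PRESENT : 0);
}

static uint64_t toy_pte_pfn(toy_pte_t pte)
{
    uint64_t field = pte & TOY_PFN_MASK;

    if (toy_inverted(pte))
        field = ~field & TOY_PFN_MASK;
    return field >> TOY_PFN_SHIFT;
}

/* Naive step: bump the raw PFN field, ignoring any inversion. */
static toy_pte_t toy_naive_next(toy_pte_t pte)
{
    return pte + (1ull << TOY_PFN_SHIFT);
}

/* Abstracted step: decode the PFN, advance it, and re-encode. */
static toy_pte_t toy_pte_next_pfn(toy_pte_t pte)
{
    uint64_t flags = pte & ~TOY_PFN_MASK;

    return toy_mk_pte(toy_pte_pfn(pte) + 1, !toy_inverted(pte)) | flags;
}
```

In this model the naive step moves an inverted entry from PFN 5 to PFN 4, while the decode-and-re-encode helper moves it to PFN 6 for both encodings.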
      
      Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org
      
      
      Fixes: bcc6cc83 ("mm: add default definition of set_ptes()")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: <syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com>
      Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1]
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: report success more often from filemap_map_folio_range() · a501a070
      Matthew Wilcox (Oracle) authored
      Even though we had successfully mapped the relevant page, we would rarely
      return success from filemap_map_folio_range().  That leads to falling back
      from the VMA lock path to the mmap_lock path, which is a speed &
      scalability issue.  Found by inspection.
      
      Link: https://lkml.kernel.org/r/20230920035336.854212-1-willy@infradead.org
      
      
      Fixes: 617c28ec ("filemap: batch PTE mappings")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs: binfmt_elf_efpic: fix personality for ELF-FDPIC · 7c315158
      Greg Ungerer authored
      The elf-fdpic loader hard sets the process personality to either
      PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
      binaries (in this case they would be constant displacement compiled with
      -pie for example).  The problem with that is that it will lose any other
      bits that may be in the ELF header personality (such as the "bug
      emulation" bits).
      
      On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
      normal 32bit binary - as opposed to a legacy 26bit address binary.  This
      matters since start_thread() will set the ARM CPSR register as required
      based on this flag.  If the elf-fdpic loader loses this bit the process
      will be mis-configured and crash out pretty quickly.
      
      Modify elf-fdpic loader personality setting so that it preserves the upper
      three bytes by using the SET_PERSONALITY macro to set it.  This macro in
      the generic case sets PER_LINUX and preserves the upper bytes. 
      Architectures can override this for their specific use case, and ARM does
      exactly this.
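
The difference between hard-setting the personality and preserving the upper three bytes can be sketched in userspace as follows. The function names are hypothetical, and the constant values are assumed to match include/uapi/linux/personality.h; treat both as illustrative rather than as the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, named after the uapi personality header. */
#define PER_MASK         0x00ffu
#define PER_LINUX        0x0000u
#define ADDR_LIMIT_32BIT 0x0800000u

/* Old behaviour: overwrite the whole word, dropping every flag bit. */
static uint32_t toy_hard_set(uint32_t current, uint32_t per)
{
    (void)current; /* existing flags are ignored, which is the bug */
    return per;
}

/* Described behaviour: replace the low byte, keep the upper three bytes. */
static uint32_t toy_set_preserving(uint32_t current, uint32_t per)
{
    assert((per & ~PER_MASK) == 0); /* only a base personality is expected */
    return (per & PER_MASK) | (current & ~PER_MASK);
}
```

Starting from a personality word that carries ADDR_LIMIT_32BIT, the first helper returns plain PER_LINUX while the second keeps the flag set.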
      
      The problem shows up quite easily running under qemu using the ARM
      architecture, but not necessarily on all types of real ARM hardware.  If
      the underlying ARM processor does not support the legacy 26-bit addressing
      mode then everything will work as expected.
      
      Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
      
      
      Fixes: 1bde925d ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
      Signed-off-by: Greg Ungerer <gerg@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Ungerer <gerg@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Sep 20, 2023
  3. Sep 18, 2023
  4. Sep 17, 2023
    • x86/purgatory: Remove LTO flags · 75b2f7e4
      Song Liu authored
      
      
      -flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
      multiple .text sections for purgatory.ro:
      
        $ readelf -S purgatory.ro  | grep " .text"
          [ 1] .text             PROGBITS         0000000000000000  00000040
          [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
          [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
          [13] .text.sha256_upda PROGBITS         0000000000000000  000022f0
          [15] .text.sha224_upda PROGBITS         0000000000000000  00002be0
          [17] .text.sha256_fina PROGBITS         0000000000000000  00002bf0
          [19] .text.sha224_fina PROGBITS         0000000000000000  00002cc0
      
      This causes WARNING from kexec_purgatory_setup_sechdrs():
      
        WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
        kexec_load_purgatory+0x37f/0x390
      
      Fix this by disabling LTO for purgatory.
      
      [ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
      
      We could also fix this with an explicit linker script to rejoin .text.*
      sections back into .text. However, given the benefit of LTOing purgatory
      is small, simply disable the production of more .text.* sections for now.
      
      Fixes: b33fff07 ("x86, build: allow LTO to be selected")
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
    • x86/boot/compressed: Reserve more memory for page tables · f530ee95
      Kirill A. Shutemov authored
      
      
      The decompressor has a hard limit on the number of page tables it can
      allocate. This limit is defined at compile-time and will cause boot
      failure if it is reached.
      
      The kernel is very strict and calculates the limit precisely for the
      worst-case scenario based on the current configuration. However, it is
      easy to forget to adjust the limit when a new use-case arises. The
      worst-case scenario is rarely encountered during sanity checks.
      
      In the case of enabling 5-level paging, a use-case was overlooked. The
      limit needs to be increased by one to accommodate the additional level.
      This oversight went unnoticed until Aaron attempted to run the kernel
      via kexec with 5-level paging and unaccepted memory enabled.
      
      Update worst-case calculations to include 5-level paging.
      
      To address this issue, let's allocate some extra space for page tables.
      128K should be sufficient for any use-case. The logic can be simplified
      by using a single value for all kernel configurations.
      
      [ Also add a warning, should this memory run low - by Dave Hansen. ]
      
      Fixes: 34bbb000 ("x86/boot/compressed: Enable 5-level paging during decompression stage")
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
    • Merge tag 'kbuild-fixes-v6.6' of... · f0b0d403
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix kernel-devel RPM and linux-headers Deb package
      
       - Fix too long argument list error in 'make modules_install'
      
      * tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: avoid long argument lists in make modules_install
        kbuild: fix kernel-devel RPM package and linux-headers Deb package
    • vm: fix move_vma() memory accounting being off · 3cec5049
      Linus Torvalds authored
      
      
      Commit 408579cd ("mm: Update do_vmi_align_munmap() return
      semantics") seems to have updated one of the callers of do_vmi_munmap()
      incorrectly: it used to check for the error case (which didn't
      change: negative means error).
      
      That commit changed the check to the success case (which did change:
      before that commit, 0 was success, and 1 was "success and lock
      downgraded".  After the change, it's always 0 for success, and the lock
      will have been released if requested).
      
      This didn't change any actual VM behavior _except_ for memory accounting
      when 'VM_ACCOUNT' was set on the vma.  Which made the wrong return value
      test fairly subtle, since everything continues to work.
      
      Or rather - it continues to work but the "Committed memory" accounting
      goes all wonky (Committed_AS value in /proc/meminfo), and depending on
      settings that then causes problems much much later as the VM relies on
      bogus statistics for its heuristics.
      
      Revert that one line of the change back to the original logic.
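
The polarity mistake described above can be shown with a toy model. It does not reproduce move_vma() or its accounting: `toy_unmap` is a hypothetical stand-in that returns a negative value on error and 0 on success, and `compensations` simply counts how often the error-only fix-up runs.

```c
#include <assert.h>

/* Counts how many times the error-path accounting fix-up was applied. */
static int compensations;

/* Hypothetical stand-in: negative means error, 0 means success. */
static int toy_unmap(int fail)
{
    return fail ? -12 : 0; /* -12 is used here as an ENOMEM-like value */
}

/* Broken check: tests for success, so the fix-up runs when nothing failed. */
static void toy_after_unmap_buggy(int fail)
{
    if (!toy_unmap(fail))
        compensations++;
}

/* Original logic: the fix-up runs only when the unmap reported an error. */
static void toy_after_unmap_fixed(int fail)
{
    if (toy_unmap(fail) < 0)
        compensations++;
}
```

Because the fix-up only touches bookkeeping, the broken variant keeps working functionally while the counter drifts on every successful call, which is why the bug was easy to miss.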
      
      Fixes: 408579cd ("mm: Update do_vmi_align_munmap() return semantics")
      Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ad8a69f3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "16 small(ish) fixes all in drivers.
      
        The major fixes are in pm8001 (fixes MSI-X issue going back to its
        origin), the qla2xxx endianness fix, which fixes a bug on big endian
        and the lpfc ones which can cause an oops on module removal without
        them"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
        scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
        scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
        scsi: target: core: Fix target_cmd_counter leak
        scsi: pm8001: Setup IRQs on resume
        scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
        scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
        scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
        scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
        scsi: qedf: Add synchronization between I/O completions and abort
        scsi: target: Replace strlcpy() with strscpy()
        scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
        scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
        scsi: qla2xxx: Correct endianness for rqstlen and rsplen
        scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
        scsi: megaraid_sas: Fix deadlock on firmware crashdump
    • Merge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · cc3e5afc
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix link power management transitions to disallow unsupported states
         (Niklas)
      
       - A small string handling fix for the sata_mv driver (Christophe)
      
       - Clear port pending interrupts before reset, as per AHCI
         specifications (Szuying).
      
         Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
         ata_eh_reset() to allow EH to continue on with other actions recorded
         with error interrupts triggered before EH completes. And an
         additional fix to avoid thawing a port twice in EH (Niklas)
      
       - Small code style fixes in the pata_parport driver to silence the
         build bot as it keeps complaining about bad indentation (me)
      
       - A fix for the recent CDL code to avoid fetching sense data for
         successful commands when not necessary for correct operation (Niklas)
      
      * tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata-core: fetch sense data for successful commands iff CDL enabled
        ata: libata-eh: do not thaw the port twice in ata_eh_reset()
        ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
        ata: pata_parport: Fix code style issues
        ata: libahci: clear pending interrupt status
        ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
        ata: libata: disallow dev-initiated LPM transitions to unsupported states
    • Merge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cce67b6b
      Linus Torvalds authored
      Pull USB fix from Greg KH:
       "Here is a single USB fix for a much-reported regression for 6.6-rc1.
      
        It resolves a crash in the typec debugfs code for many systems. It's
        been in linux-next with no reported issues, and many people have
        reported it resolving their problem with 6.6-rc1"
      
      * tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Fix NULL pointer dereference
    • Merge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 205d0494
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here is a single driver core fix for a much-reported-by-syzbot issue
        that showed up in 6.6-rc1. It's been submitted by many people, all in
        the same way, so it obviously fixes things for them all.
      
        Also in here is a single documentation update adding riscv to the
        embargoed hardware document in case there are any future issues with
        that processor family.
      
        Both of these have been in linux-next with no reported problems"
      
      * tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
        driver core: return an error when dev_set_name() hasn't happened
    • Merge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fd455e77
      Linus Torvalds authored
      Pull char/misc fix from Greg KH:
       "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
        comedi subsystem that turned out to be incorrect and left drivers
        that were working for people impossible to select for building at
        all.
      
        To fix this, the Kconfig change needs to be reverted and a future set
        of fixes for the ioport dependencies will show up in 6.7-rc1 (there's
        no rush for them.)
      
        This has been in linux-next with no reported issues"
      
      * tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Revert "comedi: add HAS_IOPORT dependencies"