  1. Sep 27, 2022
      mm: bring back update_mmu_cache() to finish_fault() · 70427f6e
      Sergei Antonov authored
      Running this test program on ARMv4 a few times (sometimes just once)
      reproduces the bug.
      
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>

      #define SIZE 4096   /* SIZE is not defined in the original report; any page-sized value works */

      int main()
      {
              unsigned i;
              char paragon[SIZE];
              void* ptr;

              memset(paragon, 0xAA, SIZE);
              ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                         MAP_ANON | MAP_SHARED, -1, 0);
              if (ptr == MAP_FAILED) return 1;
              printf("ptr = %p\n", ptr);
              for (i = 0; i < 10000; i++) {
                      memset(ptr, 0xAA, SIZE);
                      if (memcmp(ptr, paragon, SIZE)) {
                              printf("Unexpected bytes on iteration %u!!!\n", i);
                              break;
                      }
              }
              munmap(ptr, SIZE);
              return 0;
      }
      
      Runs of zero bytes appear in the "ptr" buffer; they are aligned to 16
      bytes and their lengths are multiples of 16.
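      A scan for the corruption pattern described above (a hypothetical helper, not part of the report) could look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the first 16-byte-aligned run of zero bytes whose
 * length is a multiple of 16, or 0 if the buffer contains no such run. */
static size_t find_zero_run(const unsigned char *buf, size_t len)
{
        for (size_t i = 0; i + 16 <= len; i += 16) {
                size_t run = 0;
                /* extend the run in aligned 16-byte chunks */
                while (i + run + 16 <= len &&
                       !memcmp(buf + i + run, (const char[16]){0}, 16))
                        run += 16;
                if (run)
                        return run;
        }
        return 0;
}
```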
      
      Linux v5.11 does not have the bug; "git bisect" finds the first bad commit:
      f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      
      Before that commit, update_mmu_cache() was called from both
      filemap_map_pages() and finish_fault(). After the commit,
      finish_fault() lacks it.
      
      Bring back update_mmu_cache() in finish_fault() to fix the bug.
      Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE, to more
      closely match the alloc_set_pte() code that existed before the commit.
      
      On many platforms update_mmu_cache() is a no-op:
       x86, see arch/x86/include/asm/pgtable
       ARMv6+, see arch/arm/include/asm/tlbflush.h
      So it seems few users ran into this bug.
      
      Link: https://lkml.kernel.org/r/20220908204809.2012451-1-saproj@gmail.com
      Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      Signed-off-by: Sergei Antonov <saproj@gmail.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Christoph Hellwig's avatar
      frontswap: don't call ->init if no ops are registered · 37dcc673
      Christoph Hellwig authored
      If no frontswap module (i.e. zswap) was registered, frontswap_ops will be
      NULL. In that situation, swapon crashes with the following stack trace:
      
        Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
        Mem abort info:
          ESR = 0x0000000096000004
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
          FSC = 0x04: level 0 translation fault
        Data abort info:
          ISV = 0, ISS = 0x00000004
          CM = 0, WnR = 0
        user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000
        [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
        Internal error: Oops: 96000004 [#1] SMP
        Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse
        Unloaded tainted modules: cppc_cpufreq():1
        CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1
        Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022
        pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
        pc : frontswap_init+0x38/0x60
        lr : __do_sys_swapon+0x8a8/0x9f4
        sp : ffff80000969bcf0
        x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000
        x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064
        x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560
        x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014
        x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000
        x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001
        x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88
        x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000
        x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000
        x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000
        Call trace:
        frontswap_init+0x38/0x60
        __do_sys_swapon+0x8a8/0x9f4
        __arm64_sys_swapon+0x28/0x3c
        invoke_syscall+0x78/0x100
        el0_svc_common.constprop.0+0xd4/0xf4
        do_el0_svc+0x38/0x4c
        el0_svc+0x34/0x10c
        el0t_64_sync_handler+0x11c/0x150
        el0t_64_sync+0x190/0x194
        Code: d000e283 910003fd f9006c41 f946d461 (f9400021)
        ---[ end trace 0000000000000000 ]---
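      The shape of the fix is a NULL check before dereferencing the ops pointer. A self-contained userspace sketch of the pattern (the struct and names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for frontswap_ops: a registry of callbacks
 * that legitimately stays NULL when no backend ever registered. */
struct swap_ops {
        void (*init)(int type);
};

static struct swap_ops *registered_ops;   /* NULL until a backend registers */
static int init_calls;

static void demo_init(int type) { (void)type; init_calls++; }

/* Guarded dispatcher: bail out early instead of dereferencing NULL,
 * which is the missing check that caused the oops above. */
static void frontswap_init_guarded(int type)
{
        if (!registered_ops)
                return;
        registered_ops->init(type);
}
```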
      
      Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de
      Fixes: 1da0d94a ("frontswap: remove support for multiple ops")
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Naoya Horiguchi's avatar
      mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() · 2b7aa91b
      Naoya Horiguchi authored
      A NULL pointer dereference is triggered when a thp split is requested via
      debugfs on a system with offlined memory blocks. With debug options
      enabled, the following kernel messages are printed:
      
        page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
        flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
        raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
        raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
        page dumped because: unmovable page
        page:000000007d7ab72e is uninitialized and poisoned
        page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
        ------------[ cut here ]------------
        kernel BUG at include/linux/mm.h:1248!
        invalid opcode: 0000 [#1] PREEMPT SMP PTI
        CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
        ...
        RIP: 0010:split_huge_pages_write+0xcf4/0xe30
      
      This shows that page_to_nid() in page_zone() is unexpectedly called on an
      offlined memmap.
      
      Use pfn_to_online_page() to get the struct page in the PFN walker.
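      The pattern of the fix: in a PFN walk, use a lookup that returns NULL for offline ranges, and skip NULL results instead of touching the memmap. A userspace analogue (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

struct page_info { int nid; };

static struct page_info pages[NPAGES];
/* which "sections" are online; offline ones have no initialized memmap */
static const int page_online[NPAGES] = { 1, 1, 0, 0, 1, 1, 1, 0 };

/* Analogue of pfn_to_online_page(): NULL when the pfn is offline. */
static struct page_info *pfn_to_online_page_demo(size_t pfn)
{
        if (pfn >= NPAGES || !page_online[pfn])
                return NULL;
        return &pages[pfn];
}

/* Walk a pfn range, counting only pages that are actually online. */
static int walk_online(size_t start, size_t end)
{
        int n = 0;
        for (size_t pfn = start; pfn < end; pfn++) {
                struct page_info *p = pfn_to_online_page_demo(pfn);
                if (!p)
                        continue;   /* skip offline memmap instead of dereferencing it */
                n++;
        }
        return n;
}
```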
      
      Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Co-developed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>	[5.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Minchan Kim's avatar
      mm: fix madivse_pageout mishandling on non-LRU page · 58d426a7
      Minchan Kim authored
      MADV_PAGEOUT tries to isolate non-LRU pages and triggers the warning from
      isolate_lru_page() shown below.
      
      Fix it by checking PageLRU in advance.
      
      ------------[ cut here ]------------
      trying to isolate tail page
      WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
      Modules linked in:
      CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:isolate_lru_page+0x130/0x140
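      The fix checks PageLRU before attempting isolation. A minimal userspace analogue of the guard (the struct, flag, and function names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

struct demo_page {
        bool lru;               /* analogue of PageLRU() */
};

static int isolate_calls;

/* Stand-in for isolate_lru_page(): must only ever see LRU pages;
 * the WARN in mm/folio-compat.c corresponds to this assertion. */
static int isolate_demo(struct demo_page *p)
{
        assert(p->lru);
        isolate_calls++;
        return 0;
}

/* Pageout path with the fix: skip non-LRU pages before isolating. */
static int pageout_one(struct demo_page *p)
{
        if (!p->lru)            /* check PageLRU in advance */
                return -1;
        return isolate_demo(p);
}
```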
      
      Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
      Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
      Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: 韩天ç`• <hantianshuo@iie.ac.cn>
      Suggested-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Yang Shi's avatar
      powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush · bedf0341
      Yang Shi authored
      The IPI broadcast is used to serialize against fast-GUP, but fast-GUP is
      moving to RCU instead of disabling local interrupts. Broadcasting an IPI
      is the old-style way of serializing against fast-GUP, although it still
      works as expected for now.
      
      Fast-GUP now fixes the potential race with THP collapse by checking
      whether the PMD has changed. So the IPI broadcast in the radix pmd
      collapse flush is no longer necessary. It is still needed for hash TLB,
      though.
      
      Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Yang Shi's avatar
      mm: gup: fix the fast GUP race against THP collapse · 70cbc3cc
      Yang Shi authored
      Since general RCU GUP fast was introduced in commit 2667f50e ("mm:
      introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
      sufficient to handle concurrent GUP-fast in all cases; it only correctly
      handles traditional IPI-based GUP-fast. On architectures that send an
      IPI broadcast on TLB flush, it works as expected. But on architectures
      that do not use IPIs to broadcast TLB flushes, the following race is
      possible:
      
         CPU A                                          CPU B
      THP collapse                                     fast GUP
                                                    gup_pmd_range() <-- see valid pmd
                                                        gup_pte_range() <-- work on pte
      pmdp_collapse_flush() <-- clear pmd and flush
      __collapse_huge_page_isolate()
          check page pinned <-- before GUP bump refcount
                                                            pin the page
                                                            check PTE <-- no change
      __collapse_huge_page_copy()
          copy data to huge page
          ptep_clear()
      install huge pmd for the huge page
                                                            return the stale page
      discard the stale page
      
      The race can be fixed by checking whether the PMD has changed after
      taking the page pin in fast GUP, just as is done for the PTE. If the
      PMD has changed, a THP collapse may be running in parallel, so GUP
      should back off.
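      The back-off pattern above can be sketched as: snapshot the PMD, take the pin, re-read the PMD, and drop the pin on a mismatch. A single-threaded userspace sketch of the same idea (names and the `racer` hook are illustrative, not the GUP code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic long pmd_val;    /* stand-in for the pmd entry */
static int pins, backoffs;

static void collapse(void)      /* simulates pmdp_collapse_flush() on CPU A */
{
        atomic_fetch_add(&pmd_val, 1);
}

/* Fast-GUP side: snapshot the pmd, take the pin, then re-check the pmd.
 * If it changed underneath us, a parallel THP collapse may be running,
 * so drop the pin and back off.  `racer` injects the concurrent change
 * so the race is observable in a single-threaded demo. */
static int pin_fast(void (*racer)(void))
{
        long snap = atomic_load(&pmd_val);      /* gup_pmd_range(): see valid pmd */
        pins++;                                 /* pin the page */
        if (racer)
                racer();
        if (atomic_load(&pmd_val) != snap) {    /* check the pmd again */
                pins--;                         /* unpin and back off */
                backoffs++;
                return -1;
        }
        return 0;
}
```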
      
      Also update the stale comment about serializing against fast GUP in
      khugepaged.
      
      Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
      Fixes: 2667f50e ("mm: introduce a general RCU get_user_pages_fast()")
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>