  1. Jun 24, 2023
    • mm: pass nid to reserve_bootmem_region() · 61167ad5
      Yajun Deng authored
      
      
      early_pfn_to_nid() is called frequently in init_reserved_page() to
      return the node id of a PFN.  The PFNs of one memory region usually
      share the same node id, so it is not necessary to call
      early_pfn_to_nid() for each PFN.
      
      Pass nid to reserve_bootmem_region() and drop the call to
      early_pfn_to_nid() in init_reserved_page().  Also, set nid on all reserved
      pages before doing this, as some reserved memory regions may not have a
      nid set.
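
      The idea can be sketched in userspace C.  This is only an illustration
      under stated assumptions: struct region, struct page_stub,
      pfn_to_nid_slow() and both reserve_region_*() helpers are hypothetical
      stand-ins, not the kernel's data structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct region {
    unsigned long start_pfn;
    unsigned long end_pfn;   /* exclusive */
    int nid;
};

struct page_stub {
    int nid;
    int reserved;
};

static unsigned long lookup_calls;   /* counts per-PFN node lookups */

/* Slow path: search the region table to find the node of one PFN. */
static int pfn_to_nid_slow(const struct region *tbl, size_t n,
                           unsigned long pfn)
{
    lookup_calls++;
    for (size_t i = 0; i < n; i++)
        if (pfn >= tbl[i].start_pfn && pfn < tbl[i].end_pfn)
            return tbl[i].nid;
    return -1;
}

/* "Before" shape: one node lookup for every PFN in the region. */
static void reserve_region_lookup(struct page_stub *pages,
                                  const struct region *tbl, size_t n,
                                  const struct region *r)
{
    for (unsigned long pfn = r->start_pfn; pfn < r->end_pfn; pfn++) {
        pages[pfn].nid = pfn_to_nid_slow(tbl, n, pfn);
        pages[pfn].reserved = 1;
    }
}

/* "After" shape: the caller already knows the region's nid and passes it. */
static void reserve_region_nid(struct page_stub *pages,
                               const struct region *r, int nid)
{
    assert(r->start_pfn <= r->end_pfn);
    for (unsigned long pfn = r->start_pfn; pfn < r->end_pfn; pfn++) {
        pages[pfn].nid = nid;
        pages[pfn].reserved = 1;
    }
}
```

      Both helpers mark the same pages, but the second performs no per-PFN
      lookup at all, which is where the saving would come from.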
      
      The function that benefits most is memmap_init_reserved_pages() when
      CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
      
      The following data was tested on an x86 machine with 190GB of RAM.
      
      before:
      memmap_init_reserved_pages()  67ms
      
      after:
      memmap_init_reserved_pages()  20ms
      
      Link: https://lkml.kernel.org/r/20230619023406.424298-1-yajun.deng@linux.dev
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/gup: do not return 0 from pin_user_pages_fast() for bad args · 9883c7f8
      Jason Gunthorpe authored
      
      
      These routines are not intended to return zero; the callers cannot do
      anything sane with a 0 return.  They should either return an error, which
      means future calls to GUP will not succeed, or return some non-zero
      number of pinned pages, which means GUP should be called again.
      
      If start + nr_pages overflows it should return -EOVERFLOW to signal the
      arguments are invalid.
      
      Syzkaller keeps tripping on this when fuzzing GUP arguments.
      
      Link: https://lkml.kernel.org/r/0-v1-3d5ed1f20d50+104-gup_overflow_jgg@nvidia.com
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Reported-by: <syzbot+353c7be4964c6253f24a@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/all/000000000000094fdd05faa4d3a4@google.com
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix shmem THP counters on migration · 0b52c420
      Jan Glauber authored
      
      
      The per-node shmem_thp values in numa_stat are not updated when a THP
      is migrated:
      
        grep shmem /sys/fs/cgroup/machine.slice/.../memory.numa_stat:
      
          shmem N0=1092616192 N1=10485760
          shmem_thp N0=1092616192 N1=10485760
      
        migratepages 9181 0 1:
      
          shmem N0=0 N1=1103101952
          shmem_thp N0=1092616192 N1=10485760
      
      Fix that by updating the shmem_thp counters on page migration in the same
      way as the shmem counters.
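
      The accounting rule can be sketched in userspace C as follows.  All
      names here (struct sk_node_stats, struct sk_folio,
      sk_migrate_account()) are hypothetical stand-ins; the real change
      works on the kernel's per-lruvec statistics.

```c
#include <stdbool.h>

/* Hypothetical per-node counters, counted in pages. */
enum { SK_NR_SHMEM, SK_NR_SHMEM_THPS, SK_NR_STATS };

struct sk_node_stats {
    long stat[SK_NR_STATS];
};

/* Hypothetical folio description. */
struct sk_folio {
    long nr_pages;
    bool is_shmem;       /* shmem-backed folio */
    bool pmd_mappable;   /* large enough to be mapped by a PMD */
};

/* Move nr pages of one counter from the old node to the new node. */
static void sk_move_stat(struct sk_node_stats *from, struct sk_node_stats *to,
                         int idx, long nr)
{
    from->stat[idx] -= nr;
    to->stat[idx] += nr;
}

/* Transfer the shmem counters when a folio migrates between nodes. */
static void sk_migrate_account(struct sk_node_stats *from,
                               struct sk_node_stats *to,
                               const struct sk_folio *folio)
{
    if (!folio->is_shmem)
        return;

    sk_move_stat(from, to, SK_NR_SHMEM, folio->nr_pages);

    /* The step the fix adds: THP-sized folios also move the THP counter. */
    if (folio->pmd_mappable)
        sk_move_stat(from, to, SK_NR_SHMEM_THPS, folio->nr_pages);
}
```

      Without the second transfer, the shmem counter follows the folio to the
      new node while the shmem_thp counter stays behind, which matches the
      numa_stat output shown above.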
      
      [jglauber@digitalocean.com: use folio_test_pmd_mappable instead of folio_test_transhuge]
        Link: https://lkml.kernel.org/r/20230622094720.510540-1-jglauber@digitalocean.com
      Link: https://lkml.kernel.org/r/20230619103351.234837-1-jglauber@digitalocean.com
      Signed-off-by: Jan Glauber <jglauber@digitalocean.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: cgroup: fix unexpected failure on test_memcg_sock · 3360cd30
      Haifeng Xu authored
      
      
      Before the server gets a client connection, some memory has already been
      allocated in the test memcg, such as the user stack.  Do not count those
      allocations, which are unrelated to sockets, when checking socket memory
      accounting.
      
      Link: https://lkml.kernel.org/r/20230619124735.2124-1-haifeng.xu@shopee.com
      Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memcontrol: do not tweak node in mem_cgroup_init() · 91f0dcce
      Haifeng Xu authored
      mem_cgroup_init() requests allocations from each possible node, which
      used to be a problem because NODE_DATA is not allocated for offline
      nodes.  Things have changed since commit 09f49dca ("mm: handle
      uninitialized numa nodes gracefully"), so it is unnecessary to check for
      !node_online nodes here.
      
      How to test?
      
      qemu-system-x86_64 \
        -kernel vmlinux \
        -initrd full.rootfs.cpio.gz \
        -append "console=ttyS0,115200 root=/dev/ram0 nokaslr earlyprintk=serial oops=panic panic_on_warn" \
        -drive format=qcow2,file=vm_disk.qcow2,media=disk,if=ide \
        -enable-kvm \
        -cpu host \
        -m 8G,slots=2,maxmem=16G \
        -smp cores=4,threads=1,sockets=2  \
        -object memory-backend-ram,id=mem0,size=4G \
        -object memory-backend-ram,id=mem1,size=4G \
        -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
        -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
        -numa node,nodeid=2 \
        -net nic,model=virtio,macaddr=52:54:00:12:34:58 \
        -net user \
        -nographic \
        -rtc base=localtime \
        -gdb tcp::6000
      
      Guest state when booting:
      
      [    0.048881] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
      [    0.050489] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0x00000000-0x13fffffff]
      [    0.052173] NODE_DATA(0) allocated [mem 0x13fffc000-0x13fffffff]
      [    0.053164] NODE_DATA(1) allocated [mem 0x23fffa000-0x23fffdfff]
      [    0.054187] Zone ranges:
      [    0.054587]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
      [    0.055551]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
      [    0.056515]   Normal   [mem 0x0000000100000000-0x000000023fffffff]
      [    0.057484] Movable zone start for each node
      [    0.058149] Early memory node ranges
      [    0.058705]   node   0: [mem 0x0000000000001000-0x000000000009efff]
      [    0.059679]   node   0: [mem 0x0000000000100000-0x00000000bffdffff]
      [    0.060659]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
      [    0.061649]   node   1: [mem 0x0000000140000000-0x000000023fffffff]
      [    0.062638] Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
      [    0.063745] Initmem setup node 1 [mem 0x0000000140000000-0x000000023fffffff]
      [    0.064855]   DMA zone: 158 reserved pages exceeds freesize 0
      [    0.065746] Initializing node 2 as memoryless
      [    0.066437] Initmem setup node 2 as memoryless
      [    0.067132]   DMA zone: 158 reserved pages exceeds freesize 0
      [    0.068037] On node 0, zone DMA: 1 pages in unavailable ranges
      [    0.068265] On node 0, zone DMA: 97 pages in unavailable ranges
      [    0.124755] On node 0, zone Normal: 32 pages in unavailable ranges
      
      cat /sys/devices/system/node/online
      0-1
      cat /sys/devices/system/node/possible
      0-2
      
      Link: https://lkml.kernel.org/r/20230619130442.2487-1-haifeng.xu@shopee.com
      Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan · 726ccdba
      Tetsuo Handa authored
      
      
      syzbot is reporting a lockdep warning in __stack_depot_save().  The
      caller of __stack_depot_save() (i.e. __kasan_record_aux_stack() in this
      report) is responsible for masking the __GFP_KSWAPD_RECLAIM flag so as
      not to wake kswapd, which in turn wakes kcompactd.
      
      Since kasan/kmsan functions might be called with arbitrary locks held,
      mask __GFP_KSWAPD_RECLAIM flag from all GFP_NOWAIT/GFP_ATOMIC allocations
      in kasan/kmsan.
      
      Note that kmsan_save_stack_with_flags() is changed to mask both
      __GFP_DIRECT_RECLAIM flag and __GFP_KSWAPD_RECLAIM flag, for
      wakeup_kswapd() from wake_all_kswapds() from __alloc_pages_slowpath()
      calls wakeup_kcompactd() if __GFP_KSWAPD_RECLAIM flag is set and
      __GFP_DIRECT_RECLAIM flag is not set.
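
      The masking itself is plain bit arithmetic, sketched below in
      userspace C.  The SK_* bit values and the compositions of SK_GFP_NOWAIT
      and SK_GFP_ATOMIC are illustrative assumptions only, not the kernel's
      real definitions, and both helper names are hypothetical.

```c
/* Illustrative flag bits only; not the kernel's actual values. */
typedef unsigned int sketch_gfp_t;

#define SK_GFP_DIRECT_RECLAIM 0x1u
#define SK_GFP_KSWAPD_RECLAIM 0x2u
#define SK_GFP_HIGH           0x4u

/* Assumed simplified compositions of the two common non-sleeping masks. */
#define SK_GFP_NOWAIT (SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_ATOMIC (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)

/* Drop only the "wake kswapd" bit and keep everything else. */
static sketch_gfp_t sk_mask_kswapd(sketch_gfp_t flags)
{
    return flags & ~SK_GFP_KSWAPD_RECLAIM;
}

/* Drop both reclaim bits, as described for the kmsan stack-saving path. */
static sketch_gfp_t sk_mask_all_reclaim(sketch_gfp_t flags)
{
    return flags & ~(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM);
}
```

      Under these assumptions, masking SK_GFP_NOWAIT leaves no reclaim bit set
      at all, while masking SK_GFP_ATOMIC still preserves the high-priority
      bit.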
      
      Link: https://lkml.kernel.org/r/656cb4f5-998b-c8d7-3c61-c2d37aa90f9a@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+ece2915262061d6e0ac1@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: compaction: skip memory hole rapidly when isolating migratable pages · 9721fd82
      Baolin Wang authored
      
      
      On some machines, the normal zone can contain a large memory hole, as in
      the memory layout below, where the range from 0x100000000 to 0x1800000000
      is a hole.  When isolating migratable pages, the scanner can run into the
      hole and spend considerable time skipping it.  In my measurements, the
      isolation scanner takes 80us ~ 100us to skip the large hole
      [0x100000000 - 0x1800000000].

      So add a new helper that quickly finds the next online memory section.
      This skips the large hole and locates the next suitable pageblock
      efficiently.  With this patch, scanning the large hole takes < 1us.
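
      A rough userspace sketch of the "jump to the next online section" idea
      follows.  The section size, the online[] array and the helper name
      sk_next_online_section_pfn() are hypothetical; the kernel tracks
      sections through its own sparse memory model.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGES_PER_SECTION 8UL   /* tiny value, for illustration only */

/*
 * Return the first PFN of the next online section after the section
 * containing pfn, or 0 if no later section is online.  The loop does one
 * check per section instead of one check per pageblock or page, which is
 * why a large hole can be crossed quickly.
 */
static unsigned long sk_next_online_section_pfn(const bool *online,
                                                size_t nr_sections,
                                                unsigned long pfn)
{
    size_t sec = pfn / SK_PAGES_PER_SECTION + 1;

    for (; sec < nr_sections; sec++)
        if (online[sec])
            return sec * SK_PAGES_PER_SECTION;

    return 0;
}
```

      A scanner that lands in an offline section could call such a helper once
      and resume at the returned PFN, stopping if the result is 0.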
      
      [    0.000000] Zone ranges:
      [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
      [    0.000000]   DMA32    empty
      [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
      [    0.000000] Movable zone start for each node
      [    0.000000] Early memory node ranges
      [    0.000000]   node   0: [mem 0x0000000040000000-0x0000000fffffffff]
      [    0.000000]   node   0: [mem 0x0000001800000000-0x0000001fa3c7ffff]
      [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
      [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
      [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
      [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
      [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
      [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
      [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
      [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7ffffff]
      
      [baolin.wang@linux.alibaba.com: limit next_pfn to not exceed cc->free_pfn]
        Link: https://lkml.kernel.org/r/a1d859c28af0c7e85e91795e7473f553eb180a9d.1686813379.git.baolin.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/75b4c8ca36bf44ad8c42bf0685ac19d272e426ec.1686705221.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kasan, doc: note kasan.fault=panic_on_write behaviour for async modes · 8c293a63
      Marco Elver authored
      Note the behaviour of kasan.fault=panic_on_write for async modes, since
      all asynchronous faults will result in panic (even if they are reads).
      
      Link: https://lkml.kernel.org/r/ZJHfL6vavKUZ3Yd8@elver.google.com
      Fixes: 452c03fd ("kasan: add support for kasan.fault=panic_on_write")
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Aleksandr Nogikh <nogikh@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Taras Madan <tarasmadan@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mglru: make memcg_lru->lock irq safe · 814bc1de
      Yu Zhao authored
      lru_gen_rotate_memcg() can happen in softirq if memory.soft_limit_in_bytes
      is set.  This requires memcg_lru->lock to be irq safe.  Lockdep warns on
      this.
      
      This problem only affects memcg v1.
      
      Link: https://lkml.kernel.org/r/20230619193821.2710944-1-yuzhao@google.com
      Fixes: e4dde56c ("mm: multi-gen LRU: per-node lru_gen_folio lists")
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reported-by: <syzbot+87c490fd2be656269b6a@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=87c490fd2be656269b6a
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Jun 20, 2023