  1. Oct 27, 2018
    • mm: don't raise MEMCG_OOM event due to failed high-order allocation · 7a1adfdd
      Roman Gushchin authored
      
      
      It was reported that on some of our machines containers were restarted
      with OOM symptoms without an obvious reason.  Although there was almost
      no memory pressure and plenty of page cache, the MEMCG_OOM event was
      raised occasionally, causing the container management software to think
      that an OOM had happened.  However, no tasks had been killed.
      
      Further investigation showed that the problem is caused by a failing
      attempt to charge a high-order page.  In such a case, the OOM killer is
      never invoked.  As shown below, it can happen under conditions that are
      very far from a real OOM: e.g.  there is plenty of clean page cache and no
      memory pressure.
      
      There is no sense in raising an OOM event in this case, as it might
      confuse a user and lead to wrong and excessive actions (e.g.  restart the
      workload, as in my case).
      
      Let's look at the charging path in try_charge().  If the memory usage is
      close to memory.max, which is absolutely natural for most memory cgroups,
      we try to reclaim some pages.  Even if we were able to reclaim enough
      memory for the allocation, the following check can fail due to a race
      with another concurrent allocation:
      
          if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
              goto retry;
      
      For regular pages the following condition will save us from triggering
      the OOM:
      
         if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
             goto retry;
      
      But for high-order allocations this condition will intentionally fail.
      The reason is that we'll likely fall back to regular pages anyway, so
      it's ok and even preferred to return ENOMEM.
      
      In this case the idea of raising MEMCG_OOM looks dubious.
      
      Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after the
      allocation order check, so that the event won't be raised for high-order
      allocations.  This change doesn't affect regular page allocation and
      charging.
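
      The ordering described above can be modelled in user space.  This is
      only a sketch: struct memcg_model and model_mem_cgroup_oom() are
      hypothetical names, not the kernel's code, and the event counter merely
      stands in for the MEMCG_OOM event.

```c
#include <stdbool.h>

/* Same value the kernel uses for PAGE_ALLOC_COSTLY_ORDER. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical stand-in for a cgroup's MEMCG_OOM event counter. */
struct memcg_model {
	unsigned long oom_events;
};

/*
 * Model of the fixed ordering: the allocation order is checked first, and
 * the event is counted only for orders that can actually reach the OOM
 * killer.  Returns true when the OOM path proceeds.
 */
static bool model_mem_cgroup_oom(struct memcg_model *memcg, unsigned int order)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER)
		return false;	/* skipped: the charge fails, no event is raised */

	memcg->oom_events++;	/* raised only past the order check */
	return true;
}
```

      With this ordering, a failed order-4 charge leaves oom_events untouched,
      while a failed order-0 charge increments it.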
      
      Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7a1adfdd
    • mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock · 64081362
      Dave Chinner authored
      
      
      We've recently seen a workload on XFS filesystems with a repeatable
      deadlock between background writeback and a multi-process application
      doing concurrent writes and fsyncs to a small range of a file.
      
      range_cyclic
      writeback		Process 1		Process 2
      
      xfs_vm_writepages
        write_cache_pages
          writeback_index = 2
          cycled = 0
          ....
          find page 2 dirty
          lock Page 2
          ->writepage
            page 2 writeback
            page 2 clean
            page 2 added to bio
          no more pages
      			write()
      			locks page 1
      			dirties page 1
      			locks page 2
      			dirties page 2
      			fsync()
      			....
      			xfs_vm_writepages
      			write_cache_pages
      			  start index 0
      			  find page 1 towrite
      			  lock Page 1
      			  ->writepage
      			    page 1 writeback
      			    page 1 clean
      			    page 1 added to bio
      			  find page 2 towrite
      			  lock Page 2
      			  page 2 is writeback
      			  <blocks>
      						write()
      						locks page 1
      						dirties page 1
      						fsync()
      						....
      						xfs_vm_writepages
      						write_cache_pages
      						  start index 0
      
          !done && !cycled
            sets index to 0, restarts lookup
          find page 1 dirty
      						  find page 1 towrite
      						  lock Page 1
      						  page 1 is writeback
      						  <blocks>
      
          lock Page 1
          <blocks>
      
      DEADLOCK because:
      
      	- process 1 needs page 2 writeback to complete to make
      	  enough progress to issue IO pending for page 1
      	- writeback needs page 1 writeback to complete so process 2
      	  can progress and unlock the page it is blocked on, then it
      	  can issue the IO pending for page 2
      	- process 2 can't make progress until process 1 issues IO
      	  for page 1
      
      The underlying cause of the problem here is that range_cyclic writeback is
      processing pages in descending index order as we hold higher index pages
      in a structure controlled from above write_cache_pages().  The
      write_cache_pages() caller needs to be able to submit these pages for IO
      before write_cache_pages restarts writeback at mapping index 0 to avoid
      wcp inverting the page lock/writeback wait order.
      
      generic_writepages() is not susceptible to this bug as it has no private
      context held across write_cache_pages() - filesystems using this
      infrastructure always submit pages in ->writepage immediately and so there
      is no problem with range_cyclic going back to mapping index 0.
      
      However:
      	mpage_writepages() has a private bio context,
      	exofs_writepages() has page_collect
      	fuse_writepages() has fuse_fill_wb_data
      	nfs_writepages() has nfs_pageio_descriptor
      	xfs_vm_writepages() has xfs_writepage_ctx
      
      All of these ->writepages implementations can hold pages under writeback
      in their private structures until write_cache_pages() returns, and hence
      they are all susceptible to this deadlock.
      
      Also worth noting is that ext4 has its own bastardised version of
      write_cache_pages() and so it /may/ have an equivalent deadlock.  I looked
      at the code long enough to understand that it has a similar retry loop for
      range_cyclic writeback reaching the end of the file and then promptly ran
      away before my eyes bled too much.  I'll leave it for the ext4 developers
      to determine if their code actually has this deadlock and how to fix it
      if it has.
      
      There are a few ways I can see to avoid this deadlock.  There are
      probably more, but these are the first I've thought of:
      
      1. get rid of range_cyclic altogether
      
      2. range_cyclic always stops at EOF, and we start again from
      writeback index 0 on the next call into write_cache_pages()
      
      2a. wcp also returns EAGAIN to ->writepages implementations to
      indicate range cyclic has hit EOF. writepages implementations can
      then flush the current context and call wcp again to continue. i.e.
      lift the retry into the ->writepages implementation
      
      3. range_cyclic uses trylock_page() rather than lock_page(), and it
      skips pages it can't lock without blocking. It will already do this
      for pages under writeback, so this seems like a no-brainer
      
      3a. all non-WB_SYNC_ALL writeback uses trylock_page() to avoid
      blocking as per pages under writeback.
      
      I don't think #1 is an option - range_cyclic prevents frequently
      dirtied lower file offset from starving background writeback of
      rarely touched higher file offsets.
      
      #2 is simple, and I don't think it will have any impact on
      performance as going back to the start of the file implies an
      immediate seek. We'll have exactly the same number of seeks if we
      switch writeback to another inode, and then come back to this one
      later and restart from index 0.
      
      #2a is pretty much "status quo without the deadlock". Moving the
      retry loop up into the wcp caller means we can issue IO on the
      pending pages before calling wcp again, and so avoid locking or
      waiting on pages in the wrong order. I'm not convinced we need to do
      this given that we get the same thing from #2 on the next writeback
      call from the writeback infrastructure.
      
      #3 is really just a band-aid - it doesn't fix the access/wait
      inversion problem, just prevents it from becoming a deadlock
      situation. I'd prefer we fix the inversion, not sweep it under the
      carpet like this.
      
      #3a is really an optimisation that just so happens to include the
      band-aid fix of #3.
      
      So it seems that the simplest way to fix this issue is to implement
      solution #2.
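
      Solution #2 can be sketched in user space as follows.  This is only a
      model: struct mapping_model and cyclic_pass() are hypothetical names, a
      per-page dirty flag stands in for the page cache, and nr_to_write is a
      simplified write budget.  It is not the write_cache_pages() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mapping: one dirty flag per page index. */
struct mapping_model {
	bool *dirty;
	size_t nr_pages;
	size_t writeback_index;	/* where the next cyclic pass starts */
};

/*
 * One range_cyclic pass.  It walks from writeback_index towards EOF and
 * stops there; it never wraps back to index 0 within the same call.  If
 * EOF was reached, the next call starts again from index 0.
 * Returns the number of pages "written".
 */
static size_t cyclic_pass(struct mapping_model *m, size_t nr_to_write)
{
	size_t written = 0;
	size_t i = m->writeback_index;

	for (; i < m->nr_pages && written < nr_to_write; i++) {
		if (m->dirty[i]) {
			m->dirty[i] = false;
			written++;
		}
	}

	/* Hit EOF: restart from 0 on the next call instead of looping now. */
	m->writeback_index = (i >= m->nr_pages) ? 0 : i;
	return written;
}
```

      Because the pass returns at EOF, the caller gets a chance to submit
      whatever it is holding before any lower-index page is locked again.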
      
      Link: http://lkml.kernel.org/r/20181005054526.21507-1-david@fromorbit.com
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64081362
    • mm: move mirrored memory specific code outside of memmap_init_zone · a9a9e77f
      Pavel Tatashin authored
      
      
      memmap_init_zone is getting complex because it is called from different
      contexts (hotplug and boot), and also because it must handle some
      architecture quirks.  One of them is mirrored memory.
      
      Move the code that decides whether to skip mirrored memory outside of
      memmap_init_zone, into a separate function.
      
      [pasha.tatashin@oracle.com: uninline overlap_memmap_init()]
        Link: http://lkml.kernel.org/r/20180726193509.3326-4-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180724235520.10200-4-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a9e77f
    • mm: calculate deferred pages after skipping mirrored memory · d3035be4
      Pavel Tatashin authored
      
      
      update_defer_init() should be called only when a struct page is about to
      be initialized, because it counts the number of initialized struct pages,
      and we may skip struct pages if there is some mirrored memory.

      So move update_defer_init() after the check for mirrored memory.

      Also, rename update_defer_init() to defer_init() and reverse the return
      boolean to emphasize that this is a boolean function that tells whether
      the rest of memmap initialization should be deferred.

      Make this function self-contained: use static counters instead of
      passing in the number of already initialized pages in this zone.
      
      I found this bug by reading the code.  The effect is that fewer than
      expected struct pages are initialized early in boot, and it is possible
      that in some corner cases we may fail to boot when mirrored pages are
      used.  The deferred on demand code should somewhat mitigate this.  But
      this still brings some inconsistencies compared to when booting without
      mirrored pages, so it is better to fix.
      
      [pasha.tatashin@oracle.com: add comment about defer_init's lack of locking]
        Link: http://lkml.kernel.org/r/20180726193509.3326-3-pasha.tatashin@oracle.com
      [akpm@linux-foundation.org: make defer_init non-inline, __meminit]
      Link: http://lkml.kernel.org/r/20180724235520.10200-3-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3035be4
    • mm: make memmap_init a proper function · dfb3ccd0
      Pavel Tatashin authored
      
      
      memmap_init is sometimes a macro and sometimes a function, depending on
      __HAVE_ARCH_MEMMAP_INIT.  It is only a function on ia64.  Make memmap_init
      a weak function instead, and let ia64 redefine it.
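
      The weak-function pattern can be sketched in user space.  The name
      memmap_init_model() and its return value are hypothetical; the kernel
      spells the attribute __weak, and the weak attribute itself is a
      GCC/Clang extension.

```c
/*
 * Generic default.  Because it is weak, another object file may provide a
 * strong definition of the same symbol and the linker will pick that one,
 * which is how an architecture can override the generic version.
 * This model just reports every page in the range as initialised.
 */
__attribute__((weak)) unsigned long memmap_init_model(unsigned long size)
{
	return size;
}
```

      Callers always call memmap_init_model(); no #ifdef is needed at the
      call site to choose between a macro and a function.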
      
      Link: http://lkml.kernel.org/r/20180724235520.10200-2-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dfb3ccd0
    • mm/memcontrol.c: convert mem_cgroup_id::ref to refcount_t type · 1c2d479a
      Kirill Tkhai authored
      
      
      This allows the generic refcount_t interfaces to be used to check for
      counter overflow instead of the existing VM_BUG_ON().  The only
      difference after the patch is that VM_BUG_ON() may cause BUG(), while
      refcount_t fires with WARN().  But this seems not to be significant here,
      since such problems are usually caught by syzbot with panic-on-warn
      enabled.
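
      The warn-instead-of-crash behaviour can be illustrated with a simplified
      user-space model.  struct refcount_model and refcount_model_inc() are
      hypothetical; this is not the kernel's refcount_t implementation, only a
      sketch of a counter that refuses to wrap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, non-atomic stand-in for a reference counter. */
struct refcount_model {
	unsigned int refs;
};

/*
 * Increment the counter.  If it is already at UINT_MAX, keep it pinned
 * there and print a warning rather than wrapping around to 0.
 * Returns false when the counter was saturated.
 */
static bool refcount_model_inc(struct refcount_model *r)
{
	if (r->refs == UINT_MAX) {
		fprintf(stderr, "refcount saturated, not incrementing\n");
		return false;
	}
	r->refs++;
	return true;
}
```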
      
      Link: http://lkml.kernel.org/r/153910718919.7006.13400779039257185427.stgit@localhost.localdomain
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c2d479a
    • mm/page_alloc.c: initialize num_movable in move_freepages() · 4a222127
      David Rientjes authored
      
      
      If move_freepages_block() returns 0 because !zone_spans_pfn(),
      *num_movable can hold the value from the stack because it does not get
      initialized in move_freepages().
      
      Move the initialization to move_freepages_block() to guarantee the value
      actually makes sense.
      
      This currently doesn't affect its only caller where num_movable != NULL,
      so no bug fix, but just more robust.
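
      The pattern is to define the out-parameter before any early return.  A
      minimal user-space sketch, in which move_block_model() and its
      placeholder counts are hypothetical and do not reflect the real page
      allocator code:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a block-level helper with an optional
 * out-parameter.  The out-parameter is set to 0 up front, so the caller
 * never reads stack garbage when the function bails out early.
 */
static int move_block_model(bool zone_spans_block, int *num_movable)
{
	if (num_movable)
		*num_movable = 0;

	if (!zone_spans_block)
		return 0;	/* early exit: *num_movable is already defined */

	if (num_movable)
		*num_movable = 3;	/* placeholder for the real count */
	return 8;			/* placeholder for pages moved */
}
```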
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1810051355490.212229@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a222127
    • mm/zsmalloc.c: fix fall-through annotation · 61855f02
      Gustavo A. R. Silva authored
      
      
      Replace "fallthru" with a proper "fall through" annotation.
      
      This fix is part of the ongoing effort to enable
      -Wimplicit-fallthrough.
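
      For illustration, an annotated fall-through looks like this.
      count_steps() is a hypothetical function, not the zsmalloc code; the
      comment form shown is the one named in this commit.

```c
/*
 * Hypothetical example: level 2 performs its own step and then
 * deliberately continues into the level 1 step.
 */
static int count_steps(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```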
      
      Link: http://lkml.kernel.org/r/20181003105114.GA24423@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61855f02
    • userfaultfd: selftest: recycle lock threads first · 7eaa8c96
      Peter Xu authored
      
      
      Currently we recycle the uffd servicing threads before the lock threads.
      A lock thread may still be blocked on a pthread mutex lock when the
      servicing thread for that cpu has already quit, in which case the lock
      thread blocks forever and hangs the test program.  To fix this possible
      race, recycle the lock threads first.

      This never happens with the current missing-only tests, but when I start
      to run the write-protection tests (the feature is not yet posted
      upstream) it happens on every run, possibly because in that new test
      we'll need to service two page faults for each lock operation.
      
      Link: http://lkml.kernel.org/r/20180930074259.18229-4-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7eaa8c96
    • userfaultfd: selftest: generalize read and poll · 04d87731
      Peter Xu authored
      
      
      We do very similar things in read and poll modes, but we're copying the
      code around.  Share the code for reading the message and handling the
      page fault to make it cleaner.  This also resolves a previous behavior
      mismatch between the two modes, in that the old code:
      
      - did not check EAGAIN case in read() mode
      - ignored BOUNCE_VERIFY check in read() mode
      
      Link: http://lkml.kernel.org/r/20180930074259.18229-3-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04d87731
    • userfaultfd: selftest: cleanup help messages · 439de0d7
      Peter Xu authored
      
      
      First, the help text in the comment block is obsolete: we now support
      three parameters.  While at it, update it and move it into the help
      message of the program.

      Also, the help messages dumped here and there are obsolete too.  Use a
      single usage() helper.
      
      Link: http://lkml.kernel.org/r/20180930074259.18229-2-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      439de0d7
    • mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size · f0ecf25a
      Jann Horn authored
      
      
      Having two gigantic arrays that must manually be kept in sync, including
      ifdefs, isn't exactly robust.  To make it easier to catch such issues in
      the future, add a BUILD_BUG_ON().
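
      The same kind of compile-time check can be sketched in user space with
      C11 _Static_assert, used here in place of the kernel's BUILD_BUG_ON().
      The enum and the string table are hypothetical stand-ins for the real
      vmstat arrays.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical counter list; NR_STAT_ITEMS must stay last. */
enum stat_item_model {
	STAT_FREE,
	STAT_ACTIVE,
	STAT_INACTIVE,
	NR_STAT_ITEMS
};

/* Names that must be kept in the same order as the enum above. */
static const char *const stat_text_model[] = {
	"free",
	"active",
	"inactive",
};

/* The build fails if an entry is added to one table but not the other. */
_Static_assert(ARRAY_SIZE(stat_text_model) == NR_STAT_ITEMS,
	       "stat_text_model is out of sync with enum stat_item_model");
```

      The check only compares sizes, so it catches a missing or extra entry
      but not two entries listed in the wrong order.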
      
      Link: http://lkml.kernel.org/r/20181001143138.95119-3-jannh@google.com
      Signed-off-by: Jann Horn <jannh@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Kemi Wang <kemi.wang@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0ecf25a
    • mm/memory.c: recheck page table entry with page table lock held · ff09d7ec
      Aneesh Kumar K.V authored
      
      
      We clear the pte temporarily during read/modify/write update of the pte.
      If we take a page fault while the pte is cleared, the application can get
      SIGBUS.  One such case is with remap_pfn_range without a backing
      vm_ops->fault callback.  do_fault will return SIGBUS in that case.
      
      cpu 0		 				cpu1
      mprotect()
      ptep_modify_prot_start()/pte cleared.
      .
      .						page fault.
      .
      .
      ptep_modify_prot_commit()
      
      Fix this by taking the page table lock and rechecking for pte_none.
      
      [aneesh.kumar@linux.ibm.com: fix crash observed with syzkaller run]
        Link: http://lkml.kernel.org/r/87va6bwlfg.fsf@linux.ibm.com
      Link: http://lkml.kernel.org/r/20180926031858.9692-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff09d7ec
    • mm: dax: add comment for PFN_SPECIAL · cc4b8c79
      Yang Shi authored
      
      
      The comment for PFN_SPECIAL is missing from pfn_t.h.  Add a comment to
      be consistent with the other pfn flags.
      
      Link: http://lkml.kernel.org/r/1538086549-100536-1-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc4b8c79
    • mm: brk: downgrade mmap_sem to read when shrinking · 9bc8039e
      Yang Shi authored
      
      
      Besides munmap(), brk might also be used to shrink a memory mapping.
      So, it may hold mmap_sem for write for a long time when shrinking a
      large mapping, as described in commit ("mm: mmap: zap pages with read
      mmap_sem in munmap").

      brk() will not manipulate vmas anymore after the __do_munmap() call for
      the mapping shrink use case.  But, it may set mm->brk after
      __do_munmap(), which requires holding mmap_sem for write.
      
      However, a simple trick can work around this by setting mm->brk before
      __do_munmap().  Then restore the original value if __do_munmap() fails.
      With this trick, it is safe to downgrade to read mmap_sem.
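
      The ordering of the trick can be sketched in user space.  struct
      mm_model and shrink_brk_model() are hypothetical, locking is left out
      entirely, and unmap_result simply simulates the return value of the
      unmap step.

```c
/* Hypothetical stand-in for the part of mm_struct that matters here. */
struct mm_model {
	unsigned long brk;	/* current program break */
};

/*
 * Publish the new break before unmapping, so nothing needs to be written
 * afterwards on the success path.  If the (simulated) unmap fails, restore
 * the original break and pass the error on.
 */
static int shrink_brk_model(struct mm_model *mm, unsigned long newbrk,
			    int unmap_result)
{
	unsigned long origbrk = mm->brk;

	mm->brk = newbrk;
	if (unmap_result < 0) {
		mm->brk = origbrk;	/* roll back on failure */
		return unmap_result;
	}
	return 0;
}
```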
      
      So, the same optimization, which downgrades mmap_sem to read for zapping
      pages, is also feasible and reasonable for this case.

      The period of holding exclusive mmap_sem for shrinking a large mapping
      would be reduced significantly with this optimization.
      
      [akpm@linux-foundation.org: tweak comment]
      [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue]
        Link: http://lkml.kernel.org/r/1538687672-17795-1-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1538067582-60038-2-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9bc8039e
    • mm: mremap: downgrade mmap_sem to read when shrinking · 85a06835
      Yang Shi authored
      
      
      Besides munmap, mremap might also be used to shrink a memory mapping.
      So, it may hold mmap_sem for write for a long time when shrinking a
      large mapping, as described in commit ("mm: mmap: zap pages with read
      mmap_sem in munmap").

      mremap() will not manipulate vmas anymore after the __do_munmap() call
      for the mapping shrink use case, so it is safe to downgrade to read
      mmap_sem.

      So, the same optimization, which downgrades mmap_sem to read for zapping
      pages, is also feasible and reasonable for this case.

      The period of holding exclusive mmap_sem for shrinking a large mapping
      would be reduced significantly with this optimization.

      MREMAP_FIXED and MREMAP_MAYMOVE are more complicated to adapt to this
      optimization since they need to manipulate vmas after do_munmap(), so
      downgrading mmap_sem may create a race window.

      Simple mapping shrink is the low hanging fruit, and together with munmap
      it may cover most unmap cases.
      
      [akpm@linux-foundation.org: tweak comment]
      [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue]
        Link: http://lkml.kernel.org/r/1538687672-17795-2-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1538067582-60038-1-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      85a06835
    • mm/filemap.c: use vmf_error() · 3c051324
      Souptick Joarder authored
      
      
      This code can be replaced with the new inline vmf_error().
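
      For context, the helper maps an errno value to a fault code.  The
      following is a user-space model of that mapping: the _model names are
      hypothetical and the constant values are illustrative only.

```c
#include <errno.h>

/* Hypothetical stand-ins for vm_fault_t and two of its codes. */
typedef unsigned int vm_fault_model_t;
#define VM_FAULT_OOM_MODEL	0x0001u
#define VM_FAULT_SIGBUS_MODEL	0x0002u

/*
 * Out of memory is reported as an OOM fault; every other error is
 * reported as SIGBUS.
 */
static inline vm_fault_model_t vmf_error_model(int err)
{
	if (err == -ENOMEM)
		return VM_FAULT_OOM_MODEL;
	return VM_FAULT_SIGBUS_MODEL;
}
```

      An open-coded "if -ENOMEM then OOM, else SIGBUS" sequence in a fault
      handler collapses into a single call to such a helper.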
      
      Link: http://lkml.kernel.org/r/20180927171411.GA23331@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c051324
    • hugetlb: introduce generic version of huge_ptep_get · 544db759
      Alexandre Ghiti authored
      
      
      ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the same
      version of huge_ptep_get, so move this generic implementation into
      asm-generic/hugetlb.h.
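
      The shape of such a generic fallback can be sketched as follows.
      pte_model_t and the HAVE_ARCH_HUGE_PTEP_GET_MODEL guard are hypothetical
      names chosen for this user-space model, not the kernel's identifiers.

```c
/* Hypothetical stand-in for a page table entry. */
typedef struct {
	unsigned long val;
} pte_model_t;

/*
 * Generic version: an architecture that needs something special defines
 * the guard macro and supplies its own function; everyone else gets this
 * plain read of the entry.
 */
#ifndef HAVE_ARCH_HUGE_PTEP_GET_MODEL
static inline pte_model_t huge_ptep_get_model(pte_model_t *ptep)
{
	return *ptep;
}
#endif
```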
      
      [arnd@arndb.de: fix ARM 3level page tables]
        Link: http://lkml.kernel.org/r/20181005161722.904274-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20180920060358.16606-12-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      544db759
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_ptep_set_access_flags() · facf6d5b
      Alexandre Ghiti authored
      
      
      arm, ia64, sh, x86 architectures use the same version
      of huge_ptep_set_access_flags, so move this generic implementation
      into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-11-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      facf6d5b
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_ptep_set_wrprotect() · 8e581d43
      Alexandre Ghiti authored
      
      
      arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
      huge_ptep_set_wrprotect, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-10-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e581d43
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of prepare_hugepage_range · 78d6e4e8
      Alexandre Ghiti authored
      
      
      arm, arm64, powerpc, sparc, x86 architectures use the same version of
      prepare_hugepage_range, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-9-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78d6e4e8
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_pte_wrprotect · c4916a00
      Alexandre Ghiti authored
      
      
      arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
      the same version of huge_pte_wrprotect, so move this generic
      implementation into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-8-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4916a00
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_pte_none() · cae72abc
      Alexandre Ghiti authored
      
      
      arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
      the same version of huge_pte_none, so move this generic implementation
      into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-7-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cae72abc
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_ptep_clear_flush · fe632225
      Alexandre Ghiti authored
      
      
      arm, x86 architectures use the same version of huge_ptep_clear_flush, so
      move this generic implementation into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-6-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe632225
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of huge_ptep_get_and_clear() · a4d83853
      Alexandre Ghiti authored
      
      
      arm, ia64, sh, x86 architectures use the same version of
      huge_ptep_get_and_clear, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-5-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4d83853
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of set_huge_pte_at() · cea685d5
      Alexandre Ghiti authored
      
      
      arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
      set_huge_pte_at, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-4-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cea685d5
    • Alexandre Ghiti's avatar
      hugetlb: introduce generic version of hugetlb_free_pgd_range · 1e5f50fc
      Alexandre Ghiti authored
      
      
      arm, arm64, mips, parisc, sh, x86 architectures use the same version of
      hugetlb_free_pgd_range, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-3-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1e5f50fc
    • Alexandre Ghiti's avatar
      hugetlb: harmonize hugetlb.h arch specific defines with pgtable.h · d018498c
      Alexandre Ghiti authored
      
      
      In order to reduce copy/paste of functions across architectures and then
      make riscv hugetlb port (and future ports) simpler and smaller, this
      patchset intends to factorize the numerous hugetlb primitives that are
      defined across all the architectures.
      
      Except for prepare_hugepage_range, this patchset moves the versions that
      are just pass-through to standard pte primitives into
      asm-generic/hugetlb.h by using the same #ifdef semantic that can be found
      in asm-generic/pgtable.h, i.e.  __HAVE_ARCH_***.
      
      The s390 architecture has not been tackled in this series since it does
      not use asm-generic/hugetlb.h at all.
      
      This patchset has been compiled on all addressed architectures with
      success (except for parisc, but the problem does not come from this
      series).
      
      This patch (of 11):
      
      asm-generic/hugetlb.h proposes generic implementations of hugetlb related
      functions: use __HAVE_ARCH_HUGE* defines in order to make arch specific
      implementations of hugetlb functions consistent with pgtable.h scheme.
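
      The __HAVE_ARCH_* guard scheme described above can be sketched as
      follows.  This is a userspace illustration only: pte_t is a stand-in
      type, and huge_ptep_get() is used as the example because its generic
      version is a plain pass-through.  An architecture that needs its own
      variant would define the guard macro and provide the function before
      the generic header is included; otherwise the fallback below is used.

```c
/* Stand-in for the kernel's page table entry type (assumption). */
typedef struct { unsigned long val; } pte_t;

/*
 * An architecture with special needs would do, in its own header:
 *
 *   #define __HAVE_ARCH_HUGE_PTEP_GET
 *   static inline pte_t huge_ptep_get(pte_t *ptep) { ... }
 *
 * and the generic fallback below would then be compiled out.
 */

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
/* Generic fallback: simply read the entry. */
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	return *ptep;
}
#endif
```

      The benefit of the guard is that a new port gets a working default
      without copying the function, and overrides only what differs.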
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-2-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>		[x86]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d018498c
    • Wei Yang's avatar
      mm: remove unnecessary local variable addr in __get_user_pages_fast() · d4faa402
      Wei Yang authored
      
      
      The local variable `addr' in __get_user_pages_fast() is just a shadow of
      `start'.  Since `start' never changes after it is assigned to `addr', it
      is fine to use `start' directly in place of `addr'.
      
      Also the meaning of [start, end] is more obvious than [addr, end] when
      passed to gup_pgd_range().
      
      Link: http://lkml.kernel.org/r/20180925021448.20265-1-richard.weiyang@gmail.com
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4faa402
    • Alexander Duyck's avatar
      mm: defer ZONE_DEVICE page initialization to the point where we init pgmap · 966cf44f
      Alexander Duyck authored
      
      
      The ZONE_DEVICE pages were being initialized in two locations.  One was
      with the memory_hotplug lock held and another was outside of that lock.
      The problem with this is that it was nearly doubling the memory
      initialization time.  Instead of doing this twice, once while holding a
      global lock and once without, I am opting to defer the initialization to
      the one outside of the lock.  This allows us to avoid serializing the
      overhead for memory init and we can instead focus on per-node init times.
      
      One issue I encountered is that devm_memremap_pages and
      hmm_devmmem_pages_create were initializing only the pgmap field the same
      way.  One wasn't initializing hmm_data, and the other was initializing it
      to a poison value.  Since this is something that is exposed to the driver
      in the case of hmm I am opting for a third option and just initializing
      hmm_data to 0 since this is going to be exposed to unknown third party
      drivers.
      
      [alexander.h.duyck@linux.intel.com: fix reference count for pgmap in devm_memremap_pages]
        Link: http://lkml.kernel.org/r/20181008233404.1909.37302.stgit@localhost.localdomain
      Link: http://lkml.kernel.org/r/20180925202053.3576.66039.stgit@localhost.localdomain
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      966cf44f
    • Alexander Duyck's avatar
      mm: create non-atomic version of SetPageReserved for init use · d483da5b
      Alexander Duyck authored
      
      
      It doesn't make much sense to use the atomic SetPageReserved at init time
      when we are using memset to clear the memory and manipulating the page
      flags via simple "&=" and "|=" operations in __init_single_page.
      
      This patch adds a non-atomic version __SetPageReserved that can be used
      during page init and shows about a 10% improvement in initialization times
      on the systems I have available for testing.  On those systems I saw
      initialization times drop from around 35 seconds to around 32 seconds to
      initialize a 3TB block of persistent memory.  I believe the main advantage
      of this is that it allows for more compiler optimization as the __set_bit
      operation can be reordered whereas the atomic version cannot.
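
      As a rough userspace analogy (not the kernel implementation), the
      difference between the two flavours can be sketched with C11 atomics.
      The demo_* names and the bit index are hypothetical and chosen only for
      this example.  The atomic variant is a read-modify-write that other
      threads observe as a single step; the non-atomic variant is an ordinary
      "|=" that the compiler is free to merge with neighbouring stores, which
      is acceptable only while nothing else can touch the flags word, as is
      assumed during early page init.

```c
#include <stdatomic.h>

/* Hypothetical bit position; the real PG_reserved index is not assumed. */
#define DEMO_PG_RESERVED 3

/* Atomic flavour: safe against concurrent updates of the same word. */
static inline void demo_set_reserved_atomic(atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << DEMO_PG_RESERVED);
}

/* Non-atomic flavour: plain load, OR, store; caller guarantees exclusivity. */
static inline void demo_set_reserved_nonatomic(unsigned long *flags)
{
	*flags |= 1UL << DEMO_PG_RESERVED;
}
```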
      
      I tried adding a bit of documentation based on f1dd2cd1 ("mm,
      memory_hotplug: do not associate hotadded memory to zones until online").
      
      Ideally the reserved flag should be set earlier since there is a brief
      window where the page is initialized via __init_single_page and we have
      not yet set the PG_reserved flag.  I'm leaving that for a future patch
      set as that will require a more significant refactor.
      
      Link: http://lkml.kernel.org/r/20180925202018.3576.11607.stgit@localhost.localdomain
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d483da5b
    • Alexander Duyck's avatar
      mm: provide kernel parameter to allow disabling page init poisoning · f682a97a
      Alexander Duyck authored
      
      
      Patch series "Address issues slowing persistent memory initialization", v5.
      
      The main thing this patch set achieves is that it allows us to initialize
      each node worth of persistent memory independently.  As a result we reduce
      page init time by about 2 minutes because instead of taking 30 to 40
      seconds per node and going through each node one at a time, we process all
      4 nodes in parallel in the case of a 12TB persistent memory setup spread
      evenly over 4 nodes.
      
      This patch (of 3):
      
      On systems with a large amount of memory it can take a significant amount
      of time to initialize all of the page structs with the PAGE_POISON_PATTERN
      value.  I have seen it take over 2 minutes to initialize a system with
      over 12TB of RAM.
      
      In order to work around the issue I had to disable CONFIG_DEBUG_VM and
      then the boot time returned to something much more reasonable as the
      arch_add_memory call completed in milliseconds versus seconds.  However in
      doing that I had to disable all of the other VM debugging on the system.
      
      In order to work around a kernel that might have CONFIG_DEBUG_VM enabled
      on a system that has a large amount of memory I have added a new kernel
      parameter named "vm_debug" that can be set to "-" in order to disable it.
      
      Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f682a97a
    • Shakeel Butt's avatar
      memcg: remove memcg_kmem_skip_account · 85cfb245
      Shakeel Butt authored
      
      
      The flag memcg_kmem_skip_account was added during the era of opt-out kmem
      accounting.  There is no need for such flag in the opt-in world as there
      aren't any __GFP_ACCOUNT allocations within memcg_create_cache_enqueue().
      
      Link: http://lkml.kernel.org/r/20180919004501.178023-1-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      85cfb245
    • Oscar Salvador's avatar
      mm/memory_hotplug.c: clean up node_states_check_changes_offline() · 86b27bea
      Oscar Salvador authored
      
      
      This patch, as the previous one, gets rid of the wrong if statements.
      While at it, I realized that the comments are sometimes very confusing,
      to say the least, and wrong.
      For example:
      
      ___
      zone_last = ZONE_MOVABLE;
      
      /*
       * check whether node_states[N_HIGH_MEMORY] will be changed
       * If we try to offline the last present @nr_pages from the node,
       * we can determind we will need to clear the node from
       * node_states[N_HIGH_MEMORY].
       */
      
      for (; zt <= zone_last; zt++)
              present_pages += pgdat->node_zones[zt].present_pages;
      if (nr_pages >= present_pages)
              arg->status_change_nid = zone_to_nid(zone);
      else
              arg->status_change_nid = -1;
      ___
      
      In case the node gets empty, it must be removed from N_MEMORY.  We already
      check N_HIGH_MEMORY a bit above within the CONFIG_HIGHMEM ifdef code.  Not
      to mention that status_change_nid is for N_MEMORY, and not for
      N_HIGH_MEMORY.
      
      So I re-wrote some of the comments to what I think is better.
      
      [osalvador@suse.de: address feedback from Pavel]
        Link: http://lkml.kernel.org/r/20180921132634.10103-5-osalvador@techadventures.net
      Link: http://lkml.kernel.org/r/20180919100819.25518-6-osalvador@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86b27bea
    • Oscar Salvador's avatar
      mm/memory_hotplug.c: simplify node_states_check_changes_online · 8efe33f4
      Oscar Salvador authored
      
      
      While looking at node_states_check_changes_online, I stumbled upon some
      confusing things.
      
      Right after entering the function, we find this:
      
      if (N_MEMORY == N_NORMAL_MEMORY)
              zone_last = ZONE_MOVABLE;
      
      This is wrong.
      N_MEMORY cannot really be equal to N_NORMAL_MEMORY.
      My guess is that this wanted to be something like:
      
      if (N_NORMAL_MEMORY == N_HIGH_MEMORY)
      
      to check if we have CONFIG_HIGHMEM.
      
      Later on, in the CONFIG_HIGHMEM block, we have:
      
      if (N_MEMORY == N_HIGH_MEMORY)
              zone_last = ZONE_MOVABLE;
      
      Again, this is wrong, and will never be evaluated to true.
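
      Both claims can be checked against a cut-down model of enum node_states.
      The sketch below is modeled on the layout in include/linux/nodemask.h
      around the time of this series; treat the exact member list as an
      assumption rather than a verbatim copy.  N_MEMORY always gets its own
      value, while N_HIGH_MEMORY aliases N_NORMAL_MEMORY only when
      CONFIG_HIGHMEM is not set.

```c
/* Cut-down model of enum node_states (assumed layout, see above). */
enum node_states {
	N_POSSIBLE,		/* node could become online at some point */
	N_ONLINE,		/* node is online */
	N_NORMAL_MEMORY,	/* node has regular memory */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
	N_MEMORY,		/* node has memory (regular, high, movable) */
	N_CPU,			/* node has one or more cpus */
	NR_NODE_STATES
};

/* Compile-time checks: the two removed comparisons are constant. */
_Static_assert(N_MEMORY != N_NORMAL_MEMORY,
	       "N_MEMORY == N_NORMAL_MEMORY can never be true");
_Static_assert(N_MEMORY != N_HIGH_MEMORY,
	       "N_MEMORY == N_HIGH_MEMORY can never be true");
```

      Building this with and without -DCONFIG_HIGHMEM passes both static
      assertions, which matches the reasoning for dropping the checks.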
      
      Besides removing these wrong if statements, I simplified the function a
      bit.
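
      As an illustration (not part of the patch), the claim that N_MEMORY can
      never equal the other two states follows from how the states are
      enumerated.  The sketch below is a simplified user-space model of
      enum node_states; the exact member list is an assumption based on
      include/linux/nodemask.h of that period, and highmem_enabled() is a
      hypothetical helper that exists only for this example.

```c
/* Simplified user-space model of enum node_states; not kernel code.
 * Compile with -DCONFIG_HIGHMEM to model a highmem configuration. */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,	/* alias when there is no highmem */
#endif
	N_MEMORY,
	N_CPU,
	NR_NODE_STATES
};

/* N_MEMORY gets its own value in both configurations, so comparing it
 * for equality with the other two states is always false. */
_Static_assert(N_MEMORY != N_NORMAL_MEMORY, "N_MEMORY is distinct");
_Static_assert(N_MEMORY != N_HIGH_MEMORY, "N_MEMORY is distinct");

/* Hypothetical helper: the comparison that does depend on CONFIG_HIGHMEM. */
static int highmem_enabled(void)
{
	return N_NORMAL_MEMORY != N_HIGH_MEMORY;
}
```

      Built without -DCONFIG_HIGHMEM, highmem_enabled() returns 0 because the
      two states alias; the static assertions hold either way.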
      
      [osalvador@suse.de: address feedback from Pavel]
        Link: http://lkml.kernel.org/r/20180921132634.10103-4-osalvador@techadventures.net
      Link: http://lkml.kernel.org/r/20180919100819.25518-5-osalvador@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <yasu.isimatu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8efe33f4
    • Oscar Salvador's avatar
      mm/memory_hotplug.c: tidy up node_states_clear_node() · cf01f6f5
      Oscar Salvador authored
      
      
      node_states_clear_node() has the following if statements:
      
      if ((N_MEMORY != N_NORMAL_MEMORY) &&
          (arg->status_change_nid_high >= 0))
              ...
      
      if ((N_MEMORY != N_HIGH_MEMORY) &&
          (arg->status_change_nid >= 0))
              ...
      
      N_MEMORY can never be equal to either N_NORMAL_MEMORY or
      N_HIGH_MEMORY.

      A similar problem was found in [1].
      Since these checks are wrong, let us get rid of them.
      
      [1] https://patchwork.kernel.org/patch/10579155/
      
      Link: http://lkml.kernel.org/r/20180919100819.25518-4-osalvador@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <yasu.isimatu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf01f6f5
    • Oscar Salvador's avatar
      mm/memory_hotplug.c: spare unnecessary calls to node_set_state · 83d83612
      Oscar Salvador authored
      
      
      In node_states_check_changes_online, we check if the node will have to be
      set for any of the N_*_MEMORY states after the pages have been onlined.
      
      Later on, we perform the activation in node_states_set_node.  Currently,
      in node_states_set_node we set the node to N_MEMORY unconditionally.
      
      This means that we call node_set_state for N_MEMORY every time pages go
      online, but we only need to do it if the node has not yet been set for
      N_MEMORY.
      
      Fix this by checking status_change_nid.
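
      As an illustration (not the patch itself), the effect of the check can
      be modelled in user space.  In the sketch below, struct memory_notify is
      reduced to the three status fields, node_set_state() is a stand-in that
      only counts calls, and the ">= 0" test is an assumption that a negative
      nid means "no state change".

```c
/* User-space model; names mirror the kernel but this is not kernel code. */
enum node_states { N_NORMAL_MEMORY, N_HIGH_MEMORY, N_MEMORY, NR_NODE_STATES };

/* Reduced struct memory_notify: a negative nid is assumed to mean
 * "this state does not change". */
struct memory_notify {
	int status_change_nid_normal;
	int status_change_nid_high;
	int status_change_nid;
};

static int set_state_calls[NR_NODE_STATES];

/* Stand-in for the kernel's node_set_state(): only counts the calls. */
static void node_set_state(int node, enum node_states state)
{
	(void)node;
	set_state_calls[state]++;
}

static void node_states_set_node(int node, const struct memory_notify *arg)
{
	if (arg->status_change_nid_normal >= 0)
		node_set_state(node, N_NORMAL_MEMORY);

	if (arg->status_change_nid_high >= 0)
		node_set_state(node, N_HIGH_MEMORY);

	/* Formerly unconditional; now skipped when no N_MEMORY change
	 * was recorded for this online operation. */
	if (arg->status_change_nid >= 0)
		node_set_state(node, N_MEMORY);
}
```

      Onlining more pages on a node that already has memory would pass -1 in
      status_change_nid, so the N_MEMORY counter stays unchanged on the second
      and later calls.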
      
      Link: http://lkml.kernel.org/r/20180919100819.25518-2-osalvador@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83d83612
    • haiqing.shq's avatar
      mm/filemap.c: Use existing variable · 3cb7b121
      haiqing.shq authored
      
      
      Use the variable write_len instead of iov_iter_count(from).
      
      Link: http://lkml.kernel.org/r/1537375855-2088-1-git-send-email-leviathan0992@gmail.com
      Signed-off-by: haiqing.shq <leviathan0992@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3cb7b121
    • Yang Shi's avatar
      mm: unmap VM_PFNMAP mappings with optimized path · cb492249
      Yang Shi authored
      
      
      When unmapping VM_PFNMAP mappings, vm flags need to be updated.  Since the
      vmas have already been detached, it should be safe to update vm flags
      while holding mmap_sem for read.
      
      Link: http://lkml.kernel.org/r/1537376621-51150-4-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb492249
    • Yang Shi's avatar
      mm: unmap VM_HUGETLB mappings with optimized path · b4cefb36
      Yang Shi authored
      
      
      When unmapping VM_HUGETLB mappings, vm flags need to be updated.  Since
      the vmas have already been detached, it should be safe to update vm flags
      while holding mmap_sem for read.
      
      Link: http://lkml.kernel.org/r/1537376621-51150-3-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4cefb36