  1. May 15, 2019
    • mm/z3fold.c: support page migration · 1f862989
      Vitaly Wool authored
      Now that we are not using page address in handles directly, we can make
      z3fold pages movable to decrease the memory fragmentation z3fold may
      create over time.
      
      This patch starts advertising non-headless z3fold pages as movable and
      uses the existing kernel infrastructure to implement moving of such pages
      per memory management subsystem's request.  It thus implements 3 required
      callbacks for page migration:
      
      * isolation callback: z3fold_page_isolate(): try to isolate the page by
        removing it from all lists.  Pages scheduled for some activity and
        mapped pages will not be isolated.  Return true if isolation was
        successful or false otherwise
      
      * migration callback: z3fold_page_migrate(): re-check critical
        conditions and migrate page contents to the new page provided by the
        memory subsystem.  Returns 0 on success or negative error code otherwise
      
      * putback callback: z3fold_page_putback(): put back the page if
        z3fold_page_migrate() for it failed permanently (i.e. not with
        -EAGAIN code).
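      The contract between the three callbacks can be sketched as a small
      userspace model.  This is only an illustration of the rules listed
      above, not the z3fold code: struct cb_page, its flags and the cb_*
      function names are hypothetical, and the choice of -EBUSY for a mapped
      page versus -EAGAIN for a page that is temporarily busy is an assumption
      of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-page state the callbacks look at. */
struct cb_page {
    bool on_list;   /* linked into some pool list */
    bool busy;      /* scheduled for some activity */
    bool mapped;    /* currently mapped by a user of the pool */
    bool isolated;  /* taken off all lists for migration */
};

/* Isolation: refuse busy or mapped pages, otherwise unlink the page. */
static bool cb_isolate(struct cb_page *p)
{
    if (p->busy || p->mapped || p->isolated)
        return false;
    p->on_list = false;
    p->isolated = true;
    return true;
}

/* Migration: re-check the critical conditions, then move the state over. */
static int cb_migrate(struct cb_page *dst, struct cb_page *src)
{
    assert(src->isolated);
    if (src->busy)
        return -EAGAIN;   /* transient: the caller may retry */
    if (src->mapped)
        return -EBUSY;    /* permanent: the caller will put the page back */
    *dst = *src;
    dst->isolated = false;
    dst->on_list = true;
    return 0;
}

/* Putback: undo the isolation after a permanent migration failure. */
static void cb_putback(struct cb_page *p)
{
    p->isolated = false;
    p->on_list = true;
}
```

      In this model a page that became mapped between isolation and migration
      fails with a permanent error and is returned to the lists by
      cb_putback(), while a transient -EAGAIN leaves it isolated for a retry.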
      
      [lkp@intel.com: z3fold_page_isolate() can be static]
        Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
      Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com
      
      
      Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
      Signed-off-by: kbuild test robot <lkp@intel.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold.c: add structure for buddy handles · 7c2b8baa
      Vitaly Wool authored
      For z3fold to be able to move its pages per request of the memory
      subsystem, it should not use direct object addresses in handles.  Instead,
      it will create abstract handles (3 per page) which will contain pointers
      to z3fold objects.  Thus, it will be possible to change these pointers
      when a z3fold page is moved.
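      The indirection can be modelled in userspace as follows.  This is a
      minimal sketch under stated assumptions: struct hnd_slots, hnd_encode(),
      hnd_map() and hnd_migrate() are hypothetical names, and the real z3fold
      structures and locking are not reproduced.  The point it shows is that a
      handle is the address of a slot, so only the slot contents change when
      the page moves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HND_PAGE_SIZE 4096
#define HND_BUDDIES   3   /* 3 handles per page, as in the description */

/* Hypothetical slot table; it lives outside the page it describes. */
struct hnd_slots {
    uintptr_t addr[HND_BUDDIES];   /* current object addresses, 0 = unused */
};

typedef uintptr_t hnd_t;           /* a handle is the address of one slot */

/* Record the object address in a slot and hand out the slot as handle. */
static hnd_t hnd_encode(struct hnd_slots *s, int buddy, void *obj)
{
    assert(buddy >= 0 && buddy < HND_BUDDIES);
    s->addr[buddy] = (uintptr_t)obj;
    return (hnd_t)&s->addr[buddy];
}

/* Resolve a handle to whatever address the slot holds right now. */
static void *hnd_map(hnd_t h)
{
    return (void *)*(uintptr_t *)h;
}

/* Copy the page and re-point every used slot; handles stay unchanged. */
static void hnd_migrate(struct hnd_slots *s, unsigned char *old_page,
                        unsigned char *new_page)
{
    memcpy(new_page, old_page, HND_PAGE_SIZE);
    for (int i = 0; i < HND_BUDDIES; i++) {
        if (s->addr[i]) {
            uintptr_t off = s->addr[i] - (uintptr_t)old_page;

            assert(off < HND_PAGE_SIZE);
            s->addr[i] = (uintptr_t)new_page + off;
        }
    }
}
```

      A handle obtained before hnd_migrate() resolves to the object's new
      location afterwards, which is what an address encoded directly in the
      handle could not do.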
      
      Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
      
      
      Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold.c: improve compression by extending search · 351618b2
      Vitaly Wool authored
      The current z3fold implementation only searches this CPU's page lists for
      a fitting page to put a new object into.  This patch adds a quick search for
      very well fitting pages (i.e. those having exactly the required amount
      of free space) on other CPUs too, before allocating a new page for that
      object.
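      The search order described here can be sketched as below.  It is a
      simplified userspace model, not the kernel code: the per-CPU lists are
      reduced to counters indexed by the number of free chunks, and FIT_NCPUS,
      FIT_NCHUNKS, struct fit_result and fit_find() are hypothetical names
      chosen for the example.

```c
#include <assert.h>

#define FIT_NCPUS   4
#define FIT_NCHUNKS 64

/* cpu == -1 means nothing suitable was found: allocate a new page. */
struct fit_result {
    int cpu;
    int chunks;
};

/*
 * lists[cpu][n] counts the pages on that CPU's lists with n free chunks.
 * Step 1 accepts any page on this CPU that is large enough; step 2 looks
 * at the other CPUs but only accepts an exact fit.
 */
static struct fit_result fit_find(unsigned lists[FIT_NCPUS][FIT_NCHUNKS],
                                  int this_cpu, int need)
{
    struct fit_result r = { -1, -1 };

    assert(this_cpu >= 0 && this_cpu < FIT_NCPUS);
    if (need < 0 || need >= FIT_NCHUNKS)
        return r;

    for (int n = need; n < FIT_NCHUNKS; n++) {
        if (lists[this_cpu][n]) {
            r.cpu = this_cpu;
            r.chunks = n;
            return r;
        }
    }

    for (int cpu = 0; cpu < FIT_NCPUS; cpu++) {
        if (cpu != this_cpu && lists[cpu][need]) {
            r.cpu = cpu;
            r.chunks = need;
            return r;
        }
    }

    return r;
}
```

      Restricting the remote search to exact fits keeps it cheap, since only
      one list per remote CPU has to be looked at.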
      
      Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com
      
      
      Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold.c: introduce helper functions · 9050cce1
      Vitaly Wool authored
      Patch series "z3fold: support page migration", v2.
      
      This patchset implements page migration support and slightly better buddy
      search.  To implement page migration support, z3fold has to move away from
      the current scheme of handle encoding, i.e. stop encoding the page address
      in handles.  Instead, a small per-page structure is created which will
      contain actual addresses for z3fold objects, while pointers to fields of
      that structure will be used as handles.
      
      Thus, it will be possible to change the underlying addresses to reflect
      page migration.
      
      To support migration itself, 3 callbacks will be implemented:
      
      1: isolation callback: z3fold_page_isolate(): try to isolate the page
         by removing it from all lists.  Pages scheduled for some activity and
         mapped pages will not be isolated.  Return true if isolation was
         successful or false otherwise
      
      2: migration callback: z3fold_page_migrate(): re-check critical
         conditions and migrate page contents to the new page provided by the
         system.  Returns 0 on success or negative error code otherwise
      
      3: putback callback: z3fold_page_putback(): put back the page if
         z3fold_page_migrate() for it failed permanently (i.e. not with
         -EAGAIN code).
      
      To make sure an isolated page doesn't get freed, its kref is incremented
      in z3fold_page_isolate() and decremented during post-migration compaction,
      if migration was successful, or by z3fold_page_putback() in the other
      case.
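      The reference counting rule from the paragraph above can be modelled
      like this.  It is a sketch only: a plain integer stands in for the kref,
      and struct ref_page and the ref_* helpers are hypothetical names, not
      z3fold functions.

```c
#include <assert.h>
#include <stdbool.h>

struct ref_page {
    int refs;     /* stand-in for the page's kref counter */
    bool freed;   /* set once the last reference is dropped */
};

static void ref_get(struct ref_page *p)
{
    assert(!p->freed);
    p->refs++;
}

/* Drop one reference; returns true if this released the page. */
static bool ref_put(struct ref_page *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0)
        p->freed = true;
    return p->freed;
}

/* Isolation pins the page so a concurrent free cannot release it. */
static void ref_isolate(struct ref_page *p)
{
    ref_get(p);
}

/* Successful migration: the pin is dropped during later compaction. */
static bool ref_compact_after_migration(struct ref_page *p)
{
    return ref_put(p);
}

/* Failed migration: the pin is dropped when the page is put back. */
static bool ref_putback(struct ref_page *p)
{
    return ref_put(p);
}
```

      Either path drops exactly the one reference taken at isolation time, so
      the count is balanced whether or not the migration succeeded.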
      
      Since the new handle encoding scheme implies a slight increase in memory
      consumption, a better buddy search (which decreases memory consumption) is
      included in this patchset.
      
      This patch (of 4):
      
      Introduce a separate helper function for object allocation, as well as 2
      smaller helpers to add a buddy to the list and to get a pointer to the
      pool from the z3fold header.  No functional changes here.
      
      Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com
      
      
      Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist · 1c52e6d0
      Yafang Shao authored
      Because rmqueue_pcplist() is only called when order is 0, we don't need to
      use order as a parameter.
      
      Link: http://lkml.kernel.org/r/1555591709-11744-1-git-send-email-laoar.shao@gmail.com
      
      
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig · 2c8fc3dc
      Jérôme Glisse authored
      Add 2 new Kconfig variables that are not yet used by anyone.  I checked that
      various make ARCH=somearch allmodconfig builds work and do not complain.  This
      new Kconfig needs to be added first so that device drivers that depend on
      HMM can be updated.
      
      Once drivers are updated then I can update the HMM Kconfig to depend on
      this new Kconfig in a followup patch.
      
      This is about solving Kconfig for HMM: since device drivers go
      through their own trees, we want to avoid changing them from the mm
      tree.  So the plan is:
      
      1 - Kernel release N: add the new Kconfig to mm/Kconfig (this patch)
      2 - Kernel release N+1: update drivers to depend on the new Kconfig, i.e.
          stop using ARCH_HAS_HMM and start using ARCH_HAS_HMM_MIRROR
          and ARCH_HAS_HMM_DEVICE (one or the other or both depending
          on the driver)
      3 - Kernel release N+2: remove ARCH_HAS_HMM and do the final Kconfig
          update in mm/Kconfig
      
      Link: http://lkml.kernel.org/r/20190417211141.17580-1-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan.c: simplify shrink_inactive_list() · f46b7912
      Kirill Tkhai authored
      This merges together duplicated patterns of code.  Also, replace
      count_memcg_events() with its irq-careless namesake, because it is
      already called with interrupts disabled.
      
      Link: http://lkml.kernel.org/r/2ece1df4-2989-bc9b-6172-61e9fdde5bfd@virtuozzo.com
      
      
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback · c553ea4f
      Amir Goldstein authored
      23d01270 ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
      writeback") claims that sync_file_range(2) syscall was "created for
      userspace to be able to issue background writeout and so waiting for
      in-flight IO is undesirable there" and changes the writeback (back) to
      WB_SYNC_NONE.
      
      This claim is only partially true.  It is true for users that use the flag
      SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was the
      reason for changing to WB_SYNC_NONE writeback.
      
      However, that claim is not true for users that use the flag combination
      SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|WAIT_AFTER}.  Those users explicitly
      requested to wait for in-flight IO as well as to write back dirty pages.
      
      Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT and use
      WB_SYNC_ALL writeback to perform the full range sync request.
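      From userspace, the two usage patterns discussed above look roughly like
      this.  The sketch assumes a Linux system with glibc; the helper names
      range_writeout_async() and range_writeout_and_wait() are hypothetical.
      Because older userspace headers may not provide
      SYNC_FILE_RANGE_WRITE_AND_WAIT, the example falls back to defining it as
      the OR of the three flags it stands for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Fallback for headers that predate the combined flag name. */
#ifndef SYNC_FILE_RANGE_WRITE_AND_WAIT
#define SYNC_FILE_RANGE_WRITE_AND_WAIT \
    (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | \
     SYNC_FILE_RANGE_WAIT_AFTER)
#endif

static_assert(SYNC_FILE_RANGE_WRITE_AND_WAIT ==
              (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
               SYNC_FILE_RANGE_WAIT_AFTER),
              "combined flag is the OR of the three individual flags");

/* Background writeout: start writing dirty pages, do not wait for IO. */
static int range_writeout_async(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Full range sync: wait for in-flight IO, write dirty pages, wait again. */
static int range_writeout_and_wait(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WRITE_AND_WAIT);
}
```

      Both helpers return 0 on success and -1 with errno set on failure, as
      sync_file_range(2) does.  Neither flushes file metadata, so they are not
      a replacement for fsync(2) or fdatasync(2).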
      
      Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
      Link: http://lkml.kernel.org/r/20190419072938.31320-1-amir73il@gmail.com
      Fixes: 23d01270 ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Jan Kara <jack@suse.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xen/privcmd-buf.c: convert to use vm_map_pages_zero() · 53269057
      Souptick Joarder authored
      Convert to use vm_map_pages_zero() to map range of kernel memory to user
      vma.
      
      This driver has ignored vm_pgoff.  We could later "fix" these drivers to
      behave according to the normal vm_pgoff offsetting simply by removing the
      _zero suffix on the function name and if that causes regressions, it gives
      us an easy way to revert.
      
      Link: http://lkml.kernel.org/r/acf678e81d554d01a9b590716ac0ccbdcdf71c25.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xen/gntdev.c: convert to use vm_map_pages() · df9bde01
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      map->count is passed to vm_map_pages(), and the internal API verifies map->count
      against count (count = vma_pages(vma)) to catch page array boundary
      overruns.
      
      Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • videobuf2/videobuf2-dma-sg.c: convert to use vm_map_pages() · a17ae147
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      By design, the V4L2 API treats vm_pgoff as a 'cookie' to select a buffer,
      not as an in-buffer offset, and it always wants to mmap a whole buffer from
      its beginning.
      
      Link: http://lkml.kernel.org/r/a953fe6b3056de1cc6eab654effdd4a22f125375.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • iommu/dma-iommu.c: convert to use vm_map_pages() · b0d0084f
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      Link: http://lkml.kernel.org/r/80c3d220fc6ada73a88ce43ca049afb55a889258.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/xen/xen_drm_front_gem.c: convert to use vm_map_pages() · e60b72b1
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      Link: http://lkml.kernel.org/r/ff8e10ba778d79419c66ee8215bccf01560540fd.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/rockchip/rockchip_drm_gem.c: convert to use vm_map_pages() · 2f69b3c8
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      Tested on Rockchip hardware and display is working, including talking to
      Lima via prime.
      
      Link: http://lkml.kernel.org/r/7ba359eb1aceac388d05983c1f29b915bdf291f9.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Tested-by: Heiko Stuebner <heiko@sntech.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/firewire/core-iso.c: convert to use vm_map_pages_zero() · 22660db8
      Souptick Joarder authored
      Convert to use vm_map_pages_zero() to map range of kernel memory to user
      vma.
      
      This driver has ignored vm_pgoff and mapped the entire pages.  We could
      later "fix" these drivers to behave according to the normal vm_pgoff
      offsetting simply by removing the _zero suffix on the function name and if
      that causes regressions, it gives us an easy way to revert.
      
      Link: http://lkml.kernel.org/r/88645f5ea8202784a8baaf389e592aeb8c505e8e.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm: mm: dma-mapping: convert to use vm_map_pages() · 6248461d
      Souptick Joarder authored
      Convert to use vm_map_pages() to map range of kernel memory to user vma.
      
      Link: http://lkml.kernel.org/r/936e5e107c746a7310e3a3c471188ca3ac8f9754.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce new vm_map_pages() and vm_map_pages_zero() API · a667d745
      Souptick Joarder authored
      Patch series "mm: Use vm_map_pages() and vm_map_pages_zero() API", v5.
      
      This patch (of 5):
      
      Previously, drivers had their own way of mapping a range of kernel
      pages/memory into a user vma, and this was done by invoking vm_insert_page()
      within a loop.
      
      As this pattern is common across different drivers, it can be generalized
      by creating new functions and using them across the drivers.
      
      vm_map_pages() is the API which can be used to map kernel memory/pages in
      drivers which have considered vm_pgoff.
      
      vm_map_pages_zero() is the API which can be used to map a range of kernel
      memory/pages in drivers which have not considered vm_pgoff.  vm_pgoff is
      passed as default 0 for those drivers.
      
      We _could_ then at a later point "fix" these drivers which are using
      vm_map_pages_zero() to behave according to the normal vm_pgoff offsetting
      simply by removing the _zero suffix on the function name and if that
      causes regressions, it gives us an easy way to revert.
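      The difference between the two entry points can be illustrated with a
      userspace model.  This is a sketch under stated assumptions, not the
      kernel implementation: struct vmm_vma, the vmm_* functions and the
      fixed-size slot array are hypothetical stand-ins for the vma and for
      vm_insert_page(), and the use of -ENXIO for out-of-range requests is an
      assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define VMM_MAX_PAGES 16

/* Hypothetical stand-in for a user vma: a size, an offset and its slots. */
struct vmm_vma {
    unsigned long npages;              /* what vma_pages() would return */
    unsigned long pgoff;               /* what vm_pgoff would hold */
    const void *slot[VMM_MAX_PAGES];   /* pages "inserted" into the vma */
};

/* Common helper: map vma->npages entries of pages[] starting at offset. */
static int vmm_map_range(struct vmm_vma *vma, const void **pages,
                         unsigned long num, unsigned long offset)
{
    unsigned long count = vma->npages;

    assert(count <= VMM_MAX_PAGES);

    /* The requested offset lies beyond the end of the page array. */
    if (offset >= num)
        return -ENXIO;

    /* The vma is larger than what remains of the array after offset. */
    if (count > num - offset)
        return -ENXIO;

    /* This loop replaces the per-driver vm_insert_page() loops. */
    for (unsigned long i = 0; i < count; i++)
        vma->slot[i] = pages[offset + i];

    return 0;
}

/* For drivers that honour vm_pgoff. */
static int vmm_map_pages(struct vmm_vma *vma, const void **pages,
                         unsigned long num)
{
    return vmm_map_range(vma, pages, num, vma->pgoff);
}

/* For drivers that ignore vm_pgoff: always start at the first page. */
static int vmm_map_pages_zero(struct vmm_vma *vma, const void **pages,
                              unsigned long num)
{
    return vmm_map_range(vma, pages, num, 0);
}
```

      With this shape, converting a driver from the _zero variant to normal
      offsetting is a change of one function name, which is what makes the
      later "fix" easy to apply and to revert.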
      
      Tested on Rockchip hardware and display is working, including talking to
      Lima via prime.
      
      Link: http://lkml.kernel.org/r/751cb8a0f4c3e67e95c58a3b072937617f338eea.1552921225.git.jrdr.linux@gmail.com
      
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Suggested-by: Russell King <linux@armlinux.org.uk>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Heiko Stuebner <heiko@sntech.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Sandy Huang <hjc@rock-chips.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Pawel Osciak <pawel@osciak.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove redundant 'default n' from Kconfig-s · 62afcd1c
      Bartlomiej Zolnierkiewicz authored
      'default n' is the default value for any bool or tristate Kconfig
      setting so there is no need to write it explicitly.
      
      Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO
      is not set' for visible symbols") the Kconfig behavior is the same
      regardless of 'default n' being present or not:
      
          ...
          One side effect of (and the main motivation for) this change is making
          the following two definitions behave exactly the same:
      
              config FOO
                      bool
      
              config FOO
                      bool
                      default n
      
          With this change, neither of these will generate a
          '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
          That might make it clearer to people that a bare 'default n' is
          redundant.
          ...
      
      Link: http://lkml.kernel.org/r/c3385916-e4d4-37d3-b330-e6b7dff83a52@samsung.com
      
      
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62afcd1c
    • Johannes Weiner's avatar
      mm: fix false-positive OVERCOMMIT_GUESS failures · 8c7829b0
      Johannes Weiner authored
      With the default overcommit==guess we occasionally run into mmap
      rejections despite plenty of memory that would get dropped under
      pressure but just isn't accounted reclaimable. One example of this is
      dying cgroups pinned by some page cache. A previous case was auxiliary
      path name memory associated with dentries; we have since annotated
      those allocations to avoid overcommit failures (see d79f7aa4 ("mm:
      treat indirectly reclaimable memory as free in overcommit logic")).
      
      But trying to classify all allocated memory reliably as reclaimable
      and unreclaimable is a bit of a fool's errand. There could be a myriad
      of dependencies that constantly change with kernel versions.
      
      It becomes even more questionable of an effort when considering how
      this estimate of available memory is used: it's not compared to the
      system-wide allocated virtual memory in any way. It's not even
      compared to the allocating process's address space. It's compared to
      the single allocation request at hand!
      
      So we have an elaborate left-hand side of the equation that tries to
      assess the exact breathing room the system has available down to a
      page - and then compare it to an isolated allocation request with no
      additional context. We could fail an allocation of N bytes, but for
      two allocations of N/2 bytes we'd do this elaborate dance twice in a
      row and then still let N bytes of virtual memory through. This doesn't
      make a whole lot of sense.
      
      Let's take a step back and look at the actual goal of the
      heuristic. From the documentation:
      
         Heuristic overcommit handling. Obvious overcommits of address
         space are refused. Used for a typical system. It ensures a
         seriously wild allocation fails while allowing overcommit to
         reduce swap usage.  root is allowed to allocate slightly more
         memory in this mode. This is the default.
      
      If all we want to do is catch clearly bogus allocation requests
      irrespective of the general virtual memory situation, the physical
      memory counter-part doesn't need to be that complicated, either.
      
      When in GUESS mode, catch wild allocations by comparing their request
      size to total amount of ram and swap in the system.
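
The check described above can be sketched in a few lines of C. This is a minimal userspace model, not the kernel code: `struct mem_totals` and `guess_refuses()` are hypothetical names, and the counters are assumed to be expressed in pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's RAM and swap counters, in pages. */
struct mem_totals {
	unsigned long ram_pages;
	unsigned long swap_pages;
};

/*
 * Return true if a request of `pages` pages should be refused in
 * heuristic mode. Only a request larger than all RAM plus all swap
 * counts as a "wild" allocation; everything else is let through.
 */
static bool guess_refuses(const struct mem_totals *t, unsigned long pages)
{
	assert(t != NULL);
	return pages > t->ram_pages + t->swap_pages;
}
```

With this shape, a request is judged only against the size of the machine, so two requests of N/2 pages and one request of N pages are treated consistently.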
      
      Link: http://lkml.kernel.org/r/20190412191418.26333-1-hannes@cmpxchg.org
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c7829b0
    • David Hildenbrand's avatar
      mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail · ac5c9426
      David Hildenbrand authored
      All callers of arch_remove_memory() ignore errors.  And we should really
      try to remove any errors from the memory removal path.  No more errors are
      reported from __remove_pages().  BUG() in s390x code in case
      arch_remove_memory() is triggered.  We may implement that properly later.
      WARN in case powerpc code failed to remove the section mapping, which is
      better than ignoring the error completely right now.
      
      Link: http://lkml.kernel.org/r/20190409100148.24703-5-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac5c9426
    • David Hildenbrand's avatar
      mm/memory_hotplug: make __remove_section() never fail · 9d1d887d
      David Hildenbrand authored
      Let's just warn in case a section is not valid instead of failing to
      remove somewhere in the middle of the process, returning an error that
      will be mostly ignored by callers.
      
      Link: http://lkml.kernel.org/r/20190409100148.24703-4-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9d1d887d
    • David Hildenbrand's avatar
      mm/memory_hotplug: make unregister_memory_section() never fail · cb7b3a36
      David Hildenbrand authored
      Failing while removing memory is mostly ignored and cannot really be
      handled.  Let's treat errors in unregister_memory_section() in a nice way,
      warning, but continuing.
      
      Link: http://lkml.kernel.org/r/20190409100148.24703-3-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb7b3a36
    • David Hildenbrand's avatar
      mm/memory_hotplug: release memory resource after arch_remove_memory() · d9eb1417
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: Better error handling when removing
      memory", v1.
      
      Error handling when removing memory is somewhat messed up right now.  Some
      errors result in warnings, others are completely ignored.  Memory unplug
      code can essentially not deal with errors properly as of now.
      remove_memory() will never fail.
      
      We have basically two choices:
      1. Allow arch_remove_memory() and friends to fail, propagating errors via
         remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
         pieces added/removed separately).
      2. Don't allow the functions to fail, handling errors in a nicer way.
      
      It seems like most errors that can theoretically happen are really corner
      cases and mostly theoretical (e.g.  "section not valid").  However e.g.
      aborting removal of sections while all callers simply continue in case of
      errors is not nice.
      
      If we can guarantee that removal of memory always works (and WARN/skip in
      case of theoretical errors so we can figure out what is going on), we can
      go ahead and implement better error handling when adding memory.
      
      E.g. via add_memory():
      
      arch_add_memory()
      ret = do_stuff()
      if (ret) {
      	arch_remove_memory();
      	goto error;
      }
      
      Handling here that arch_remove_memory() might fail is basically
      impossible.  So I suggest, let's avoid reporting errors while removing
      memory, warning on theoretical errors instead and continuing instead of
      aborting.
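
The pattern argued for above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the section array, `model_add_sections()` and `model_remove_sections()` are hypothetical names invented for illustration, and a message on stderr stands in for the kernel's WARN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_SECTIONS 4

/* Hypothetical model: one "present" flag per memory section. */
static bool section_present[NR_SECTIONS];
static int nr_warnings;

/* Removal cannot fail: an invalid section is reported and skipped. */
static void model_remove_sections(int start, int nr)
{
	for (int i = start; i < start + nr; i++) {
		if (i < 0 || i >= NR_SECTIONS || !section_present[i]) {
			fprintf(stderr, "warning: section %d not valid, skipping\n", i);
			nr_warnings++;
			continue;
		}
		section_present[i] = false;
	}
}

/* Adding may fail; the rollback path needs no error handling of its own. */
static int model_add_sections(int start, int nr, bool later_step_fails)
{
	assert(start >= 0 && start + nr <= NR_SECTIONS);
	for (int i = start; i < start + nr; i++)
		section_present[i] = true;
	if (later_step_fails) {
		model_remove_sections(start, nr);
		return -1;
	}
	return 0;
}
```

Because the removal function returns void, the error path of the add function stays a straight line, which is the property the series is after.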
      
      This patch (of 4):
      
      __add_pages() doesn't add the memory resource, so __remove_pages()
      shouldn't remove it.  Let's factor it out.  Especially as it is a special
      case for memory used as system memory, added via add_memory() and friends.
      
      We now remove the resource after removing the sections instead of doing it
      the other way around.  I don't think this change is problematic.
      
      add_memory()
      	register memory resource
      	arch_add_memory()
      
      remove_memory
      	arch_remove_memory()
      	release memory resource
      
      While at it, explain why we ignore errors and that it only happens if
      we remove memory in a different granularity than we added it.
      
      [david@redhat.com: fix printk warning]
        Link: http://lkml.kernel.org/r/20190417120204.6997-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20190409100148.24703-2-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9eb1417
    • Laurent Dufour's avatar
    • Michal Hocko's avatar
      mm, memory_hotplug: provide a more generic restrictions for memory hotplug · 940519f0
      Michal Hocko authored
      arch_add_memory and __add_pages take a want_memblock argument which
      controls whether the newly added memory should get the sysfs memblock user
      API (e.g. ZONE_DEVICE users do not want/need this interface).  Some callers
      even want to control where we allocate the memmap from by configuring
      altmap.
      
      Add a more generic hotplug context for arch_add_memory and __add_pages.
      struct mhp_restrictions contains flags, which specify additional features
      to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently), and
      altmap for an alternative memmap allocator.
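
A rough picture of such a context structure, as a self-contained sketch. The field names follow the description above; the bit value of `MHP_MEMBLOCK_API` and the helper `wants_memblock()` are assumptions made for illustration, and `struct vmem_altmap` is left opaque.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for the kernel's struct vmem_altmap. */
struct vmem_altmap;

/* Flag name taken from the commit message; the bit value is an assumption. */
#define MHP_MEMBLOCK_API (1UL << 0)

struct mhp_restrictions {
	unsigned long flags;         /* features requested by the caller */
	struct vmem_altmap *altmap;  /* alternative memmap allocator, or NULL */
};

/* Hypothetical helper: does the caller want the sysfs memblock API? */
static bool wants_memblock(const struct mhp_restrictions *r)
{
	assert(r != NULL);
	return (r->flags & MHP_MEMBLOCK_API) != 0;
}
```

Bundling the options into one structure means a new restriction can be added as a flag bit without touching the signature of every architecture's hook.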
      
      This patch shouldn't introduce any functional change.
      
      [akpm@linux-foundation.org: build fix]
      Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
      
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      940519f0
    • Michal Hocko's avatar
      mm, memory_hotplug: cleanup memory offline path · 5557c766
      Michal Hocko authored
      check_pages_isolated_cb currently accounts the whole pfn range as being
      offlined if test_pages_isolated succeeds on the range.  This is based on
      the assumption that all pages in the range are freed which is currently
      the case in most cases but it won't be with later changes, as pages marked
      as vmemmap won't be isolated.
      
      Move the offlined pages counting to offline_isolated_pages_cb and rely on
      __offline_isolated_pages to return the correct value.
      check_pages_isolated_cb will still do its primary job and check the pfn
      range.
      
      While we are at it, remove check_pages_isolated and offline_isolated_pages
      and call walk_system_ram_range directly, as online_pages does.
      
      Link: http://lkml.kernel.org/r/20190408082633.2864-2-osalvador@suse.de
      
      
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5557c766
    • Alexander Duyck's avatar
      mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections · 0e56acae
      Alexander Duyck authored
      Add yet another iterator, for_each_free_mem_range_in_zone_from, and then
      use it to support initializing and freeing pages in groups no larger than
      MAX_ORDER_NR_PAGES.  By doing this we can greatly improve the cache
      locality of the pages while we do several loops over them in the init and
      freeing process.
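
The chunking idea can be sketched as follows. This is a simplified userspace model, not the kernel implementation: `CHUNK_PAGES` is an assumed stand-in for MAX_ORDER_NR_PAGES (whose real value depends on the configuration), and `init_chunk()`, `free_chunk()` and `walk_in_chunks()` are hypothetical names that only count pages.

```c
#include <assert.h>

/* Assumed chunk size for illustration; must be a power of two. */
#define CHUNK_PAGES 1024UL

static unsigned long pages_initialized, pages_freed;

/* Stand-ins for the two passes made over every chunk. */
static void init_chunk(unsigned long start, unsigned long end)
{
	pages_initialized += end - start;
}

static void free_chunk(unsigned long start, unsigned long end)
{
	pages_freed += end - start;
}

/* Next chunk boundary strictly above pfn. */
static unsigned long chunk_end(unsigned long pfn)
{
	return (pfn + CHUNK_PAGES) & ~(CHUNK_PAGES - 1);
}

/*
 * Walk [start, end) one chunk at a time, running both passes on a
 * chunk before moving on, so the second pass touches data that the
 * first pass has just brought into the cache.
 * Returns the number of chunks visited.
 */
static unsigned long walk_in_chunks(unsigned long start, unsigned long end)
{
	unsigned long chunks = 0;

	assert(start <= end);
	while (start < end) {
		unsigned long stop = chunk_end(start);

		if (stop > end)
			stop = end;
		init_chunk(start, stop);
		free_chunk(start, stop);
		start = stop;
		chunks++;
	}
	return chunks;
}
```

The alternative, initializing a whole section and only then freeing it, makes the second pass start on pages that have likely been evicted from the cache already.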
      
      We are able to tighten the loops further as a result of the "from"
      iterator as we can perform the initial checks for first_init_pfn in our
      first call to the iterator, and continue without the need for those checks
      via the "from" iterator.  I have added this functionality in the function
      called deferred_init_mem_pfn_range_in_zone that primes the iterator and
      causes us to exit if we encounter any failure.
      
      On my x86_64 test system with 384GB of memory per node I saw a reduction
      in initialization time from 1.85s to 1.38s as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190405221231.12227.85836.stgit@localhost.localdomain
      
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <yi.z.zhang@linux.intel.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e56acae
    • Alexander Duyck's avatar
      mm: implement new zone specific memblock iterator · 837566e7
      Alexander Duyck authored
      Introduce a new iterator for_each_free_mem_pfn_range_in_zone.
      
      This iterator will take care of making sure a given memory range provided
      is in fact contained within a zone.  It takes care of all the bounds
      checking we were doing in deferred_grow_zone, and deferred_init_memmap.
      In addition it should help to speed up the search a bit by iterating until
      the end of a range is greater than the start of the zone pfn range, and
      will exit completely if the start is beyond the end of the zone.
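
The behaviour described above can be modelled in plain C. This is a sketch only: it assumes the free ranges are sorted by start pfn and half-open, and `struct pfn_range` and `next_range_in_zone()` are hypothetical names, not the kernel iterator itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open pfn range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Advance *idx over the sorted free ranges and return the next piece
 * that overlaps the zone, clamped to the zone bounds.  Ranges that end
 * before the zone starts are skipped; the walk stops for good once a
 * range starts at or beyond the end of the zone.
 */
static bool next_range_in_zone(const struct pfn_range *ranges, size_t nr,
			       size_t *idx, const struct pfn_range *zone,
			       struct pfn_range *out)
{
	assert(zone->start <= zone->end);
	for (; *idx < nr; (*idx)++) {
		const struct pfn_range *r = &ranges[*idx];

		if (r->start >= zone->end)
			break;		/* sorted input: nothing later can match */
		if (r->end <= zone->start)
			continue;	/* entirely below the zone */

		out->start = r->start > zone->start ? r->start : zone->start;
		out->end = r->end < zone->end ? r->end : zone->end;
		(*idx)++;
		return true;
	}
	*idx = nr;
	return false;
}
```

Callers then receive only ranges that are already inside the zone, so they no longer need their own bounds checks.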
      
      Link: http://lkml.kernel.org/r/20190405221225.12227.22573.stgit@localhost.localdomain
      
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      837566e7
    • Alexander Duyck's avatar
      mm: drop meminit_pfn_in_nid as it is redundant · 56ec43d8
      Alexander Duyck authored
      As best as I can tell the meminit_pfn_in_nid call is completely redundant.
      The deferred memory initialization is already making use of
      for_each_free_mem_range which in turn will call into __next_mem_range
      which will only return a memory range if it matches the node ID provided
      assuming it is not NUMA_NO_NODE.
      
      I am operating on the assumption that there are no zones or pgdata_t
      structures that have a NUMA node of NUMA_NO_NODE associated with them.  If
      that is the case then __next_mem_range will never return a memory range
      that doesn't match the zone's node ID and as such the check is redundant.
      
      So one piece I would like to verify on this is if this works for ia64.
      Technically it was using a different approach to get the node ID, but it
      seems to have the node ID also encoded into the memblock.  So I am
      assuming this is okay, but would like to get confirmation on that.
      
      On my x86_64 test system with 384GB of memory per node I saw a reduction
      in initialization time from 2.80s to 1.85s as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190405221219.12227.93957.stgit@localhost.localdomain
      
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56ec43d8
    • Alexander Duyck's avatar
      mm: use mm_zero_struct_page from SPARC on all 64b architectures · 5470dea4
      Alexander Duyck authored
      Patch series "Deferred page init improvements", v7.
      
      This patchset is essentially a refactor of the page initialization logic
      that is meant to provide for better code reuse while providing a
      significant improvement in deferred page initialization performance.
      
      In my testing on an x86_64 system with 384GB of RAM I have seen the
      following.  In the case of regular memory initialization the deferred init
      time was decreased from 3.75s to 1.38s on average.  This amounts to a 172%
      improvement for the deferred memory initialization performance.
      
      I have called out the improvement observed with each patch.
      
      This patch (of 4):
      
      Use the same approach that was already in use on Sparc on all the
      architectures that support a 64b long.
      
      This is mostly motivated by the fact that 7 to 10 store/move instructions
      are likely always going to be faster than having to call into a function
      that is not specialized for handling page init.
      
      An added advantage to doing it this way is that the compiler can get away
      with combining writes in the __init_single_page call.  As a result the
      memset call will be reduced to only about 4 write operations, or at least
      that is what I am seeing with GCC 6.2 as the flags, LRU pointers, and
      count/mapcount seem to be cancelling out at least 4 of the 8 assignments
      on my system.
      
      One change I had to make to the function was to reduce the minimum struct
      page size to 56 bytes to support some powerpc64 configurations.
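
The store-based zeroing can be sketched like this. It is a userspace model under stated assumptions: `struct fake_page` is a mock descriptor, assumed here to be 80 bytes so that every case label is in bounds, and `zero_fake_page()` is a hypothetical name. The switch on a compile-time size lets the compiler keep only the stores that apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock descriptor standing in for struct page; 80 bytes by assumption. */
struct fake_page {
	uint64_t words[10];
};

/* The technique only covers sizes of 56, 64, 72 or 80 bytes. */
_Static_assert(sizeof(struct fake_page) % 8 == 0, "size must be a multiple of 8");
_Static_assert(sizeof(struct fake_page) >= 56, "size must be at least 56 bytes");
_Static_assert(sizeof(struct fake_page) <= 80, "size must be at most 80 bytes");

/* Zero the descriptor with 7 to 10 plain 64-bit stores, no function call. */
static inline void zero_fake_page(struct fake_page *page)
{
	uint64_t *pp = page->words;

	assert(page != NULL);
	switch (sizeof(struct fake_page)) {
	case 80:
		pp[9] = 0;	/* fall through */
	case 72:
		pp[8] = 0;	/* fall through */
	case 64:
		pp[7] = 0;	/* fall through */
	case 56:
		pp[6] = 0;
		pp[5] = 0;
		pp[4] = 0;
		pp[3] = 0;
		pp[2] = 0;
		pp[1] = 0;
		pp[0] = 0;
	}
}
```

Since the stores are visible to the compiler, it is free to merge them with later assignments to the same fields, which a call to a generic memset would not allow.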
      
      This change should introduce no change on SPARC since it already had this
      code.  In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
      initializing 384GB of RAM per node.  Pavel Tatashin tested on a system
      with Broadcom's Stingray CPU and 48GB of RAM and found that
      __init_single_page() takes 19.30ns / 64-byte struct page before this patch
      and with this patch it takes 17.33ns / 64-byte struct page.  Mike Rapoport
      ran a similar test on an OpenPower (S812LC 8348-21C) with Power8 processor
      and 128GB of RAM.  His results per 64-byte struct page were 4.68ns before,
      and 4.59ns after this patch.
      
      Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomain
      
      
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5470dea4
    • Huang Shijie's avatar
      mm/rmap.c: use the pra.mapcount to do the check · 059d8442
      Huang Shijie authored
      We have the pra.mapcount already, so there is no need to call
      page_mapped(), which may do some complicated computation for compound pages.
      
      Link: http://lkml.kernel.org/r/20190404054828.2731-1-sjhuang@iluvatar.ai
      
      
      Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      059d8442
    • Josef Bacik's avatar
      mm/filemap.c: enable error injection at add_to_page_cache() · cfcbfb13
      Josef Bacik authored
      Recently I messed up the error handling in filemap_fault() because of an
      unexpected ENOMEM (related to cgroup memory limits) in add_to_page_cache.
      Enable error injection at this point so I can add a testcase to xfstests
      to verify I don't mess this up again.
      
      [akpm@linux-foundation.org: include linux/error-injection.h]
      Link: http://lkml.kernel.org/r/20190403152604.14008-1-josef@toxicpanda.com
      
      
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cfcbfb13
    • Jérôme Glisse's avatar
      mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper · c6d23413
      Jérôme Glisse authored
      Helper to test if a range is updated to read only (it is still valid to
      read from the range).  This is useful for device drivers or anyone who
      wishes to optimize out an update when they know that they already have the
      range mapped read only.
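
One way such a test could look, written as a self-contained model rather than kernel code. It assumes the range carries a pointer to the affected vma and an event code saying why the notifier fired; all type names, the `NOTIFY_*` values and the `VM_READ` bit value below are stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_READ 0x1UL	/* illustrative bit value */

/* Hypothetical event codes: why the notifier is being called. */
enum notify_event {
	NOTIFY_UNMAP,
	NOTIFY_PROTECTION_VMA,
};

struct vma_model {
	unsigned long vm_flags;
};

struct range_model {
	const struct vma_model *vma;	/* may be NULL when no vma is known */
	enum notify_event event;
};

/*
 * True only when the range is changing protection and the vma still
 * permits reads, i.e. a read-only device mapping can be kept as is.
 */
static bool range_update_to_read_only(const struct range_model *range)
{
	assert(range != NULL);
	if (!range->vma || range->event != NOTIFY_PROTECTION_VMA)
		return false;
	return (range->vma->vm_flags & VM_READ) != 0;
}
```

A driver that already maps the range read only could call such a helper at the top of its invalidation callback and return early when it reports true.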
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-9-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening · bf198b2b
      Jérôme Glisse authored
      CPU page table updates can happen for many reasons, not only as a result
      of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
      a result of kernel activities (memory compression, reclaim, migration,
      ...).
      
      Users of the mmu notifier API track changes to the CPU page table and take
      specific actions for them.  However, the current API only provides the
      range of virtual addresses affected by the change, not why the change is
      happening.
      
      This patch just passes down the new information by adding it to the
      mmu_notifier_range structure.
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-8-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: use correct mmu_notifier events for each invalidation · 7269f999
      Jérôme Glisse authored
      This updates each existing invalidation to use the correct mmu notifier
      event, the one that represents what is happening to the CPU page table.
      See the patch which introduced the events for the rationale behind this.
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: contextual information for event triggering invalidation · 6f4f13e8
      Jérôme Glisse authored
      CPU page table updates can happen for many reasons, not only as a result
      of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
      a result of kernel activities (memory compression, reclaim, migration,
      ...).
      
      Users of the mmu notifier API track changes to the CPU page table and take
      specific actions for them.  However, the current API only provides the
      range of virtual addresses affected by the change, not why the change is
      happening.
      
      This patchset does the initial mechanical conversion of all the places
      that call mmu_notifier_range_init() to also provide the default
      MMU_NOTIFY_UNMAP event, as well as the vma if it is known (most
      invalidations happen against a given vma).  Passing down the vma allows
      the users of mmu notifier to inspect the new vma page protection.
      
      MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
      should assume that everything for the range is going away when that event
      happens.  A later patch converts the mm call paths to use a more
      appropriate event for each call.
      
      This is done as 2 patches so that no call site is forgotten, especially
      as it uses the following coccinelle patch:
      
      %<----------------------------------------------------------------------
      @@
      identifier I1, I2, I3, I4;
      @@
      static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
      +enum mmu_notifier_event event,
      +unsigned flags,
      +struct vm_area_struct *vma,
      struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
      
      @@
      @@
      -#define mmu_notifier_range_init(range, mm, start, end)
      +#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
      
      @@
      expression E1, E3, E4;
      identifier I1;
      @@
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, I1,
      I1->vm_mm, E3, E4)
      ...>
      
      @@
      expression E1, E2, E3, E4;
      identifier FN, VMA;
      @@
      FN(..., struct vm_area_struct *VMA, ...) {
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, VMA,
      E2, E3, E4)
      ...> }
      
      @@
      expression E1, E2, E3, E4;
      identifier FN, VMA;
      @@
      FN(...) {
      struct vm_area_struct *VMA;
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, VMA,
      E2, E3, E4)
      ...> }
      
      @@
      expression E1, E2, E3, E4;
      identifier FN;
      @@
      FN(...) {
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, NULL,
      E2, E3, E4)
      ...> }
      ---------------------------------------------------------------------->%
      
      Applied with:
      spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
      spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
      spatch --sp-file mmu-notifier.spatch --dir mm --in-place
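
      To make the effect of the semantic patch concrete, here is a userspace
      model of the widened initializer and one converted call site.  The
      parameter order follows the coccinelle rules above; the struct layouts
      and the example_unmap_range() caller are hypothetical, not kernel code.

```c
#include <stddef.h>

/* Only the default event is modelled here. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
};

/* Stand-ins for the kernel structures; the fields are illustrative. */
struct mm_struct {
	int dummy;
};

struct vm_area_struct {
	struct mm_struct *vm_mm;
};

struct mmu_notifier_range {
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	unsigned long start;
	unsigned long end;
	unsigned flags;
	enum mmu_notifier_event event;
};

/* Initializer with the three new leading arguments: event, flags, vma. */
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
					   enum mmu_notifier_event event,
					   unsigned flags,
					   struct vm_area_struct *vma,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
{
	range->vma = vma;
	range->event = event;
	range->mm = mm;
	range->start = start;
	range->end = end;
	range->flags = flags;
}

/*
 * Hypothetical call site after conversion: a function that has a vma in
 * scope passes it along with the safe default event and no flags.
 */
static inline void example_unmap_range(struct mmu_notifier_range *range,
				       struct vm_area_struct *vma,
				       unsigned long start, unsigned long end)
{
	mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0, vma,
				vma->vm_mm, start, end);
}
```

      Call sites with no vma in scope would pass NULL in the vma position, as
      the last rule of the semantic patch does.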
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: contextual information for event enums · d87f055b
      Jérôme Glisse authored
      CPU page table updates can happen for many reasons, not only as a result
      of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
      a result of kernel activities (memory compression, reclaim, migration,
      ...).
      
      This patch introduces a set of enums that can be associated with each of
      the events triggering a mmu notifier.  Later patches take advantage of
      those enum values.
      
          - UNMAP: munmap() or mremap()
          - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
          - PROTECTION_VMA: change in access protections for the range
          - PROTECTION_PAGE: change in access protections for page in the range
          - SOFT_DIRTY: soft dirtiness tracking
      
      Being able to distinguish munmap() and mremap() from other reasons why the
      page table is cleared is important to allow users of mmu notifier to update
      their own internal tracking structures accordingly (on munmap or mremap it
      is no longer necessary to track the range of virtual addresses, as it
      becomes invalid).
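
      As an illustration, the list above could map onto an enum like the one
      below.  Only MMU_NOTIFY_UNMAP is spelled out in this series' messages;
      the MMU_NOTIFY_ prefix on the other values and the driver-side policy
      function are assumptions made for this sketch.

```c
#include <stdbool.h>

/* One value per reason listed above; names other than UNMAP are assumed. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,		/* munmap() or mremap() */
	MMU_NOTIFY_CLEAR,		/* migration, compaction, reclaim, ... */
	MMU_NOTIFY_PROTECTION_VMA,	/* protection change for the range */
	MMU_NOTIFY_PROTECTION_PAGE,	/* protection change for a page */
	MMU_NOTIFY_SOFT_DIRTY,		/* soft dirtiness tracking */
};

/*
 * Hypothetical driver policy: the tracking structure for a range is only
 * thrown away when the virtual addresses themselves become invalid.  For
 * every other event the driver invalidates its mapping but keeps tracking.
 */
static inline bool driver_should_drop_tracking(enum mmu_notifier_event event)
{
	return event == MMU_NOTIFY_UNMAP;
}
```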
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-5-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: convert mmu_notifier_range->blockable to a flags · 27560ee9
      Jérôme Glisse authored
      Use an unsigned field for flags other than blockable and convert the
      blockable field to be one of those flags.
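
      A sketch of the representation change, modelled in userspace.  The flag
      name MMU_NOTIFIER_RANGE_BLOCKABLE and its bit position are assumptions
      for illustration; the point is that callers going through the helper do
      not notice that a bool became one bit of an unsigned field.

```c
#include <stdbool.h>

/* Assumed flag name and bit; further flags could use the remaining bits. */
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1u << 0)

/* Simplified range: the former bool is now one bit in flags. */
struct mmu_notifier_range {
	unsigned long start;
	unsigned long end;
	unsigned flags;
};

/* Same question as before, now answered from the flags word. */
static inline bool
mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
{
	return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
}
```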
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-4-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: convert user range->blockable to helper function · dfcd6660
      Jérôme Glisse authored
      Use the mmu_notifier_range_blockable() helper function instead of directly
      dereferencing the range->blockable field.  This is done to make it easier
      to change the mmu_notifier range field.
      
      This patch is the outcome of the following coccinelle patch:
      
      %<-------------------------------------------------------------------
      @@
      identifier I1, FN;
      @@
      FN(..., struct mmu_notifier_range *I1, ...) {
      <...
      -I1->blockable
      +mmu_notifier_range_blockable(I1)
      ...>
      }
      ------------------------------------------------------------------->%
      
      spatch --in-place --sp-file blockable.spatch --dir .
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Ro...
    • mm/mmu_notifier: helper to test if a range invalidation is blockable · 4a83bfe9
      Jérôme Glisse authored
      Patch series "mmu notifier provide context informations", v6.
      
      Here I am not posting users of this; they have already been posted to the
      appropriate mailing list [6] and will be merged through the appropriate
      tree once this patchset is upstream.
      
      Note that this series does not change any behavior for any existing code.
      It just passes down more information to mmu notifier listeners.
      
      The rationale for this patchset:
      
      CPU page table updates can happen for many reasons, not only as a result
      of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
      a result of kernel activities (memory compression, reclaim, migration,
      ...).
      
      This patchset introduces a set of enums that can be associated with each
      of the events triggering a mmu notifier:
      
          - UNMAP: munmap() or mremap()
          - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
          - PROTECTION_VMA: change in access protections for the range
          - PROTECTION_PAGE: change in access protections for page in the range
          - SOFT_DIRTY: soft dirtiness tracking
      
      Being able to distinguish munmap() and mremap() from other reasons why the
      page table is cleared is important to allow users of mmu notifier to update
      their own internal tracking structures accordingly (on munmap or mremap it
      is no longer necessary to track the range of virtual addresses, as it
      becomes invalid).  Without this series, drivers are forced to assume that
      every notification is an munmap, which triggers useless thrashing within
      drivers that associate structures with ranges of virtual addresses.  Each
      driver is forced to free up its tracking structure and then restore it on
      the next device page fault.  With this series we can also optimize device
      page table updates.  Patches to use this are at:
      
      https://lkml.org/lkml/2019/1/23/833
      https://lkml.org/lkml/2019/1/23/834
      https://lkml.org/lkml/2019/1/23/832
      https://lkml.org/lkml/2019/1/23/831
      
      Moreover this can also be used to optimize out some page table updates
      such as for KVM where we can update the secondary MMU directly from the
      callback instead of clearing it.
      
      ACKS AMD/RADEON https://lkml.org/lkml/2019/2/1/395
      ACKS RDMA https://lkml.org/lkml/2018/12/6/1473
      
      This patch (of 8):
      
      Add a simple helper to test whether a range invalidation is blockable.
      Later patches use coccinelle to convert all direct dereferences of
      range->blockable to use this function instead, so that we can convert the
      blockable field to an unsigned for more flags.
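
      A userspace sketch of the helper at this stage of the series, where the
      range still carries a plain bool.  The struct layout and the
      example_invalidate_range_start() listener are hypothetical; returning
      -EAGAIN when the listener would have to sleep but may not is assumed
      here as the convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified range, before the field is turned into a flags word. */
struct mmu_notifier_range {
	unsigned long start;
	unsigned long end;
	bool blockable;
};

/* Accessor that hides how "blockable" is stored. */
static inline bool
mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
{
	return range->blockable;
}

/*
 * Hypothetical listener: if the work needs to sleep (must_sleep) and the
 * invalidation is not allowed to block, back off instead of sleeping.
 */
static inline int
example_invalidate_range_start(const struct mmu_notifier_range *range,
			       bool must_sleep)
{
	if (must_sleep && !mmu_notifier_range_blockable(range))
		return -EAGAIN;
	return 0;
}
```

      Because the listener never touches the field directly, the storage can
      later change without editing it again.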
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-2-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>