  1. Jun 28, 2021
    • Linux 5.13 · 62fb9874
      Linus Torvalds authored
      v5.13
    • Revert "signal: Allow tasks to cache one sigqueue struct" · b4b27b9e
      Linus Torvalds authored
      This reverts commits 4bad58eb (and
      399f8dd9, which tried to fix it).
      
      I do not believe these are correct, and I'm about to release 5.13, so am
      reverting them out of an abundance of caution.
      
      The locking is odd, and appears broken.
      
      On the allocation side (in __sigqueue_alloc()), the locking is somewhat
      straightforward: it depends on sighand->siglock.  Since one caller
      doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid
      the case with no locks held.
      
      On the freeing side (in sigqueue_cache_or_free()), there is no locking
      at all, and the logic instead depends on 'current' being a single
      thread, and not able to race with itself.
      
      To make things more exciting, there's also the data race between freeing
      a signal and allocating one, which is handled by using WRITE_ONCE() and
      READ_ONCE(), and being mutually exclusive wrt the initial state (ie
      freeing will only free if the old state was NULL, while allocating will
      obviously only use the value if it was non-NULL, so only one or the
      other will actually act on the value).
      
      However, while the free->alloc paths do seem mutually exclusive thanks
      to just the data value dependency, it's not clear what the memory
      ordering constraints are on it.  Could writes from the previous
      allocation possibly be delayed and seen by the new allocation later,
      causing logical inconsistencies?
      
      So it's all very exciting and unusual.
      
      And in particular, it seems that the freeing side is incorrect in
      depending on "current" being single-threaded.  Yes, 'current' is a
      single thread, but in the presense of asynchronous events even a single
      thread can have data races.
      
      And such asynchronous events can and do happen, with interrupts causing
      signals to be flushed and thus free'd (for example - sending a
      SIGCONT/SIGSTOP can happen from interrupt context, and can flush
      previously queued process control signals).
      
      So regardless of all the other questions about the memory ordering and
      locking for this new cached allocation, the sigqueue_cache_or_free()
      assumptions seem to be fundamentally incorrect.
      
      It may be that people will show me the errors of my ways, and tell me
      why this is all safe after all.  We can reinstate it if so.  But my
      current belief is that the WRITE_ONCE() that sets the cached entry needs
      to be a smp_store_release(), and the READ_ONCE() that finds a cached
      entry needs to be a smp_load_acquire() to handle memory ordering
      correctly.
      
      And the sequence in sigqueue_cache_or_free() would need to either use a
      lock or at least be interrupt-safe some way (perhaps by using something
      like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the
      percpu operations it needs to be interrupt-safe).
      
      Fixes: 399f8dd9 ("signal: Prevent sigqueue caching after task got released")
      Fixes: 4bad58eb ("signal: Allow tasks to cache one sigqueue struct")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Jun 27, 2021
    • Merge tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 625acffd
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix a couple of late findings in pt_regs flags handling from the
         conversion to generic entry.
      
       - Fix potential register clobbering in stack switch helper.
      
       - Fix thread/group masks for offline cpus.
      
       - Fix cleanup of mdev resources when remove callback is invoked in
         vfio-ap code.
      
      * tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/stack: fix possible register corruption with stack switch helper
        s390/topology: clear thread/group maps for offline cpus
        s390/vfio-ap: clean up mdev resources when remove callback invoked
        s390: clear pt_regs::flags on irq entry
        s390: fix system call restart with multiple signals
  3. Jun 26, 2021
    • Merge tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b7050b24
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Two last-minute fixes:
      
         - Put an fwnode in the errorpath in the SGPIO driver
      
         - Fix the number of GPIO lines per bank in the STM32 driver"
      
      * tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: stm32: fix the reported number of GPIO lines per bank
        pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e2f527b5
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small fixes, both in upper layer drivers (scsi disk and cdrom).
      
        The sd one is fixing a commit changing revalidation that came from the
        block tree a while ago (5.10) and the sr one adds handling of a
        condition we didn't previously handle for manually removed media"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)
        scsi: sr: Return appropriate error code when disk is ejected
    • Merge branch 'akpm' (patches from Andrew) · 7ce32ac6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "24 patches, based on 4a09d388.
      
        Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb,
        memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and
        mailmap"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
        mailmap: add Marek's other e-mail address and identity without diacritics
        MAINTAINERS: fix Marek's identity again
        mm/page_alloc: do bulk array bounds check after checking populated elements
        mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
        mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
        mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
        mm/memory-failure: use a mutex to avoid memory_failure() races
        mm, futex: fix shared futex pgoff on shmem huge page
        kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
        kthread_worker: split code for canceling the delayed work timer
        mm/vmalloc: unbreak kasan vmalloc support
        KVM: s390: prepare for hugepage vmalloc
        mm/vmalloc: add vmalloc_no_huge
        nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
        mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
        mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
        mm: page_vma_mapped_walk(): get vma_address_end() earlier
        mm: page_vma_mapped_walk(): use goto instead of while (1)
        mm: page_vma_mapped_walk(): add a level of indentation
        mm: page_vma_mapped_walk(): crossing page table boundary
        ...
    • userfaultfd: uapi: fix UFFDIO_CONTINUE ioctl request definition · 808e9df4
      Gleb Fotengauer-Malinovskiy authored
      This ioctl request reads from uffdio_continue structure written by
      userspace which justifies _IOC_WRITE flag.  It also writes back to that
      structure which justifies _IOC_READ flag.
      
      See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
      
      Fixes: f6191471 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
      Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 55fcd449
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Three more driver bugfixes and an annotation fix for the core"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: robotfuzz-osif: fix control-request directions
        i2c: dev: Add __user annotation
        i2c: cp2615: check for allocation failure in cp2615_i2c_recv()
        i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
    • Merge tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7764c62f
      Linus Torvalds authored
      Pull device properties framework fix from Rafael Wysocki:
       "Fix a NULL pointer dereference introduced by a recent commit and
        occurring when device_remove_software_node() is used with a device
        that has never been registered (Heikki Krogerus)"
      
      * tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        software node: Handle software node injection to an existing device properly
    • Merge tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · b960e014
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "A fix for a regression introduced in 5.12: when migrating an irq
        related to a Xen user event to another cpu, a race might result
        in a WARN() triggering"
      
      * tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: reset active flag for lateeoi events later
    • Merge tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 616a99dd
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "A selftests fix for ARM, and the fix for page reference count
        underflow. This is a very small fix that was provided by Nick Piggin
        and tested by myself"
      
      * tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: do not allow mapping valid but non-reference-counted pages
        KVM: selftests: Fix mapping length truncation in m{,un}map()
    • Merge tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 94ca94bb
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "Two more urgent FPU fixes:
      
         - prevent unprivileged userspace from reinitializing supervisor
           states
      
         - prepare init_fpstate, which is the buffer used when initializing
           FPU state, properly in case the skip-writing-state-components
           XSAVE* variants are used"
      
      * tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Make init_fpstate correct with optimized XSAVE
        x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
    • Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client · edf54d9d
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two regression fixes from the merge window: one in the auth code
        affecting old clusters and one in the filesystem for proper
        propagation of MDS request errors.
      
        Also included a locking fix for async creates, marked for stable"
      
      * tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client:
        libceph: set global_id as soon as we get an auth ticket
        libceph: don't pass result into ac->ops->handle_reply()
        ceph: fix error handling in ceph_atomic_open and ceph_lookup
        ceph: must hold snap_rwsem when filling inode for async create
    • Merge tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 9e736cf7
      Linus Torvalds authored
      Pull netfs fixes from David Howells:
       "This contains patches to fix netfs_write_begin() and afs_write_end()
        in the following ways:
      
        (1) In netfs_write_begin(), extract the decision about whether to skip
            a page out to its own helper and have that clear around the region
            to be written, but not clear that region. This requires the
            filesystem to patch it up afterwards if the hole doesn't get
            completely filled.
      
        (2) Use offset_in_thp() in (1) rather than manually calculating the
            offset into the page.
      
        (3) Due to (1), afs_write_end() now needs to handle short data write
            into the page by generic_perform_write(). I've adopted an
            analogous approach to ceph of just returning 0 in this case and
            letting the caller go round again.
      
        It also adds a note that (in the future) the len parameter may extend
        beyond the page allocated. This is because the page allocation is
        deferred to write_begin() and that gets to decide what size of THP to
        allocate."
      
      Jeff Layton points out:
       "The netfs fix in particular fixes a data corruption bug in cephfs"
      
      * tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        netfs: fix test for whether we can skip read when writing beyond EOF
        afs: Fix afs_write_end() to handle short writes
    • Merge tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · c13e3021
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix wake-up interrupt support on gpio-mxc
      
       - zero the padding bytes in a structure passed to user-space in the
         GPIO character device
      
       - require HAS_IOPORT_MAP in two drivers that need it to fix a Kbuild
         issue
      
      * tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP
        gpiolib: cdev: zero padding during conversion to gpioline_info_changed
        gpio: mxc: Fix disabled interrupt wake-up support
    • Merge tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e41fc7c8
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Two small changes have been cherry-picked as a last material for 5.13:
        a coverage after UMN revert action and a stale MAINTAINERS entry fix"
      
      * tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        MAINTAINERS: remove Timur Tabi from Freescale SOC sound drivers
        ASoC: rt5645: Avoid upgrading static warnings to errors
  4. Jun 25, 2021
    • gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP · c6414e1a
      Johannes Berg authored
      
      
      Both of these drivers use ioport_map(), so they need to
      depend on HAS_IOPORT_MAP. Otherwise, they cannot be built
      even with COMPILE_TEST on architectures without an ioport
      implementation, such as ARCH=um.
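The fix amounts to a Kconfig dependency of roughly this shape (illustrative only; the exact option blocks live in drivers/gpio/Kconfig and may carry further dependencies):

```kconfig
config GPIO_AMD8111
	tristate "AMD 8111 GPIO driver"
	depends on X86 || COMPILE_TEST
	depends on HAS_IOPORT_MAP
```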
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
    • mailmap: add Marek's other e-mail address and identity without diacritics · 72a461ad
      Marek Behún authored
      
      
      Some of my commits were sent with identities
        Marek Behun <marek.behun@nic.cz>
        Marek Behún <marek.behun@nic.cz>
      while the correct one is
        Marek Behún <kabel@kernel.org>
      
      Put this into mailmap so that git shortlog prints all my commits under
      one identity.
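In .mailmap syntax, entries of this shape map both old identities to the canonical one (an illustrative rendering of the change described above, not the literal patch hunk):

```
Marek Behún <kabel@kernel.org> <marek.behun@nic.cz>
Marek Behún <kabel@kernel.org> Marek Behun <marek.behun@nic.cz>
```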
      
      Link: https://lkml.kernel.org/r/20210616113624.19351-2-kabel@kernel.org
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: fix Marek's identity again · ee924d3d
      Marek Behún authored
      
      
      Fix my name to use diacritics, since MAINTAINERS supports it.
      
      Fix my e-mail address in MAINTAINERS' marvell10g PHY driver description,
      I accidentally put my other e-mail address here.
      
      Link: https://lkml.kernel.org/r/20210616113624.19351-1-kabel@kernel.org
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: do bulk array bounds check after checking populated elements · b3b64ebd
      Mel Gorman authored
      Dan Carpenter reported the following
      
        The patch 0f87d9d3: "mm/page_alloc: add an array-based interface
        to the bulk page allocator" from Apr 29, 2021, leads to the following
        static checker warning:
      
              mm/page_alloc.c:5338 __alloc_pages_bulk()
              warn: potentially one past the end of array 'page_array[nr_populated]'
      
      The problem can occur if an array is passed in that is fully populated.
      That potentially ends up allocating a single page and storing it past
      the end of the array.  This patch returns 0 if the array is fully
      populated.
      
      Link: https://lkml.kernel.org/r/20210618125102.GU30378@techsingularity.net
      Fixes: 0f87d9d3 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array · b08e50dd
      Rasmus Villemoes authored
      In the event that somebody would call this with an already fully
      populated page_array, the last loop iteration would do an access beyond
      the end of page_array.
      
      It's of course extremely unlikely that would ever be done, but this
      triggers my internal static analyzer.  Also, if it really is not
      supposed to be invoked this way (i.e., with no NULL entries in
      page_array), the nr_populated<nr_pages check could simply be removed
      instead.
      
      Link: https://lkml.kernel.org/r/20210507064504.1712559-1-linux@rasmusvillemoes.dk
      Fixes: 0f87d9d3 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hwpoison: do not lock page again when me_huge_page() successfully recovers · ea6d0630
      Naoya Horiguchi authored
      Currently me_huge_page() temporarily unlocks the page to perform some actions,
      then locks it again later.  My testcase (which calls hard-offline on
      some tail page in a hugetlb, then accesses the address of the hugetlb
      range) showed that page allocation code detects this page lock on buddy
      page and printed out "BUG: Bad page state" message.
      
      check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
      page, so this flag works as kind of filter, but this filtering doesn't
      work in this case because the "bad page" is not the actual hwpoisoned
      page.  So stop locking the page again.  Actions to be taken depend on the
      page type of the error, so page unlocking should be done in the ->action()
      callbacks.  Make that assumption explicit and change all existing callbacks
      accordingly.
      
      Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com
      Fixes: 78bb9203 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned · 47af12ba
      Aili Yao authored
      
      
      When memory_failure() is called with MF_ACTION_REQUIRED on the page that
      has already been hwpoisoned, memory_failure() could fail to send SIGBUS
      to the affected process, which results in infinite loop of MCEs.
      
      Currently memory_failure() returns 0 if it's called for an already
      hwpoisoned page, and the caller, kill_me_maybe(), can then return without
      sending SIGBUS to the current process.  An action-required MCE is raised
      when the current process accesses the broken memory, so no SIGBUS means
      that the current process keeps running, accesses the error page again
      soon, and falls into an MCE loop.
      
      This issue can arise for example in the following scenarios:
      
       - Two or more threads access the poisoned page concurrently. If
         local MCE is enabled, MCE handler independently handles the MCE
         events. So there's a race among MCE events, and the second or latter
         threads fall into the situation in question.
      
       - If there was a precedent memory error event and memory_failure() for
         the event failed to unmap the error page for some reason, the
         subsequent memory access to the error page triggers the MCE loop
         situation.
      
      To fix the issue, make memory_failure() return an error code when the
      error page has already been hwpoisoned.  This allows memory error
      handler to control how it sends signals to userspace.  And make sure
      that any process touching a hwpoisoned page should get a SIGBUS even in
      "already hwpoisoned" path of memory_failure() as is done in page fault
      path.
      
      Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
      Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jue Wang <juew@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory-failure: use a mutex to avoid memory_failure() races · 171936dd
      Tony Luck authored
      
      
      Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.
      
      I wrote this patchset to materialize what I think is the current
      allowable solution mentioned by the previous discussion [1].  I simply
      borrowed Tony's mutex patch and Aili's return code patch, then I queued
      another one to find error virtual address in the best effort manner.  I
      know that this is not a perfect solution, but should work for some
      typical case.
      
      [1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
      
      This patch (of 2):
      
      There can be races when multiple CPUs consume poison from the same page.
      The first into memory_failure() atomically sets the HWPoison page flag
      and begins hunting for tasks that map this page.  Eventually it
      invalidates those mappings and may send a SIGBUS to the affected tasks.
      
      But while all that work is going on, other CPUs see a "success" return
      code from memory_failure() and so they believe the error has been
      handled and continue executing.
      
      Fix by wrapping most of the internal parts of memory_failure() in a
      mutex.
      
      [akpm@linux-foundation.org: make mf_mutex local to memory_failure()]
      
      Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com
      Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jue Wang <juew@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, futex: fix shared futex pgoff on shmem huge page · fe19bd3d
      Hugh Dickins authored
      If more than one futex is placed on a shmem huge page, it can happen
      that waking the second wakes the first instead, and leaves the second
      waiting: the key's shared.pgoff is wrong.
      
      When 3.11 commit 13d60f4b ("futex: Take hugepages into account when
      generating futex_key") was applied, the only shared huge pages came from
      hugetlbfs, and the code added to deal with its exceptional page->index was
      put into hugetlb source.  Then that was missed when 4.8 added shmem huge pages.
      
      page_to_pgoff() is what others use for this nowadays: except that, as
      currently written, it gives the right answer on hugetlbfs head, but
      nonsense on hugetlbfs tails.  Fix that by calling hugetlbfs-specific
      hugetlb_basepage_index() on PageHuge tails as well as on head.
      
      Yes, it's unconventional to declare hugetlb_basepage_index() there in
      pagemap.h, rather than in hugetlb.h; but I do not expect anything but
      page_to_pgoff() ever to need it.
      
      [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
      
      Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
      Fixes: 800d8c63 ("shmem: add huge pages support")
      Reported-by: Neel Natu <neelnatu@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yi <wetpzy@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() · 5fa54346
      Petr Mladek authored
      
      The system might hang with the following backtrace:
      
      	schedule+0x80/0x100
      	schedule_timeout+0x48/0x138
      	wait_for_common+0xa4/0x134
      	wait_for_completion+0x1c/0x2c
      	kthread_flush_work+0x114/0x1cc
      	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
      	kthread_cancel_delayed_work_sync+0x18/0x2c
      	xxxx_pm_notify+0xb0/0xd8
      	blocking_notifier_call_chain_robust+0x80/0x194
      	pm_notifier_call_chain_robust+0x28/0x4c
      	suspend_prepare+0x40/0x260
      	enter_state+0x80/0x3f4
      	pm_suspend+0x60/0xdc
      	state_store+0x108/0x144
      	kobj_attr_store+0x38/0x88
      	sysfs_kf_write+0x64/0xc0
      	kernfs_fop_write_iter+0x108/0x1d0
      	vfs_write+0x2f4/0x368
      	ksys_write+0x7c/0xec
      
      It is caused by the following race between kthread_mod_delayed_work()
      and kthread_cancel_delayed_work_sync():
      
      CPU0				CPU1
      
      Context: Thread A		Context: Thread B
      
      kthread_mod_delayed_work()
        spin_lock()
        __kthread_cancel_work()
           spin_unlock()
           del_timer_sync()
      				kthread_cancel_delayed_work_sync()
      				  spin_lock()
      				  __kthread_cancel_work()
      				    spin_unlock()
      				    del_timer_sync()
      				    spin_lock()
      
      				  work->canceling++
      				  spin_unlock
           spin_lock()
         queue_delayed_work()
           // dwork is put into the worker->delayed_work_list
      
         spin_unlock()
      
      				  kthread_flush_work()
           // flush_work is put at the tail of the dwork
      
      				    wait_for_completion()
      
      Context: IRQ
      
        kthread_delayed_work_timer_fn()
          spin_lock()
          list_del_init(&work->node);
          spin_unlock()
      
      BANG: flush_work is no longer linked and will never be processed.
      
      The problem is that kthread_mod_delayed_work() checks the
      work->canceling flag only before canceling the timer, while a racing
      cancel can set the flag once the lock is dropped for del_timer_sync().
      
      A simple solution is to (re)check work->canceling after
      __kthread_cancel_work().  But then it is not clear what should be
      returned when __kthread_cancel_work() removed the work from the queue
      (list) and it can't queue it again with the new @delay.
      
      The return value might be used for reference counting.  The caller has
      to know whether a new work has been queued or an existing one was
      replaced.
      
      The proper solution is that kthread_mod_delayed_work() will remove the
      work from the queue (list) _only_ when work->canceling is not set.  The
      flag must be checked after the timer is stopped and the remaining
      operations can be done under worker->lock.
      
      Note that kthread_mod_delayed_work() could remove the timer and then
      bail out.  It is fine.  The other canceling caller needs to cancel the
      timer as well.  The important thing is that the queue (list)
      manipulation is done atomically under worker->lock.
      
      Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
      Fixes: 9a6b06c8 ("kthread: allow to modify delayed kthread work")
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reported-by: Martin Liu <liumartin@google.com>
      Cc: <jenhaochen@google.com>
      Cc: Minchan Kim <minchan@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5fa54346
    • Petr Mladek's avatar
      kthread_worker: split code for canceling the delayed work timer · 34b3d534
      Petr Mladek authored
      
      
      Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
      and kthread_cancel_delayed_work_sync()".
      
      This patchset fixes the race between kthread_mod_delayed_work() and
      kthread_cancel_delayed_work_sync() including proper return value
      handling.
      
      This patch (of 2):
      
      Simple code refactoring as a preparation step for fixing a race between
      kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
      
      It does not modify the existing behavior.
      
      Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: <jenhaochen@google.com>
      Cc: Martin Liu <liumartin@google.com>
      Cc: Minchan Kim <minchan@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34b3d534
    • Daniel Axtens's avatar
      mm/vmalloc: unbreak kasan vmalloc support · 7ca3027b
      Daniel Axtens authored
      In commit 121e6f32 ("mm/vmalloc: hugepage vmalloc mappings"),
      __vmalloc_node_range was changed such that __get_vm_area_node was no
      longer called with the requested/real size of the vmalloc allocation,
      but rather with a rounded-up size.
      
      This means that __get_vm_area_node called kasan_unpoison_vmalloc() with
      a rounded up size rather than the real size.  This led to it allowing
      access to too much memory and so missing vmalloc OOBs and failing the
      kasan kunit tests.
      
      Pass the real size and the desired shift into __get_vm_area_node.  This
      allows it to round up the size for the underlying allocators while
      still unpoisoning the correct quantity of shadow memory.
      
      Adjust the other call-sites to pass in PAGE_SHIFT for the shift value.
      
      Link: https://lkml.kernel.org/r/20210617081330.98629-1-dja@axtens.net
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=213335
      Fixes: 121e6f32 ("mm/vmalloc: hugepage vmalloc mappings")
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Tested-by: David Gow <davidgow@google.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
      Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ca3027b
    • Claudio Imbrenda's avatar
      KVM: s390: prepare for hugepage vmalloc · 185cca24
      Claudio Imbrenda authored
      The Create Secure Configuration Ultravisor Call does not support using
      large pages for the virtual memory area.  This is a hardware limitation.
      
      This patch replaces the vzalloc call with an almost equivalent call to
      the newly introduced vmalloc_no_huge function, which guarantees that
      only small pages will be used for the backing.
      
      The new call will not clear the allocated memory, but that has never
      been an actual requirement.
      
      Link: https://lkml.kernel.org/r/20210614132357.10202-3-imbrenda@linux.ibm.com
      Fixes: 121e6f32 ("mm/vmalloc: hugepage vmalloc mappings")
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      185cca24
    • Claudio Imbrenda's avatar
      mm/vmalloc: add vmalloc_no_huge · 15a64f5a
      Claudio Imbrenda authored
      Patch series "mm: add vmalloc_no_huge and use it", v4.
      
      Add vmalloc_no_huge() and export it, so modules can allocate memory with
      small pages.
      
      Use the newly added vmalloc_no_huge() in KVM on s390 to get around a
      hardware limitation.
      
      This patch (of 2):
      
      Commit 121e6f32 ("mm/vmalloc: hugepage vmalloc mappings") added
      support for hugepage vmalloc mappings.  It also added the flag
      VM_NO_HUGE_VMAP for __vmalloc_node_range to request that the
      allocation be performed with 0-order (non-huge) pages.
      
      This flag is not accessible when calling vmalloc(); the only option is
      to call __vmalloc_node_range() directly, which is not exported.
      
      This means that a module can't vmalloc memory with small pages.
      
      Case in point: KVM on s390x needs to vmalloc a large area, and it needs
      to be mapped with non-huge pages, because of a hardware limitation.
      
      This patch adds the function vmalloc_no_huge, which works like vmalloc,
      but it is guaranteed to always back the mapping using small pages.  This
      new function is exported, therefore it is usable by modules.
      
      [akpm@linux-foundation.org: whitespace fixes, per Christoph]
      
      Link: https://lkml.kernel.org/r/20210614132357.10202-1-imbrenda@linux.ibm.com
      Link: https://lkml.kernel.org/r/20210614132357.10202-2-imbrenda@linux.ibm.com
      Fixes: 121e6f32 ("mm/vmalloc: hugepage vmalloc mappings")
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Acked-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      15a64f5a
    • Pavel Skripkin's avatar
      nilfs2: fix memory leak in nilfs_sysfs_delete_device_group · 8fd0c1b0
      Pavel Skripkin authored
      My local syzbot instance hit a memory leak in nilfs2.  The problem was
      a missing kobject_put() in nilfs_sysfs_delete_device_group().

      kobject_del() does not call kobject_cleanup() for the passed kobject,
      which leads to leaking the duped kobject name if kobject_put() is not
      called.
      
      Fail log:
      
        BUG: memory leak
        unreferenced object 0xffff8880596171e0 (size 8):
        comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s)
        hex dump (first 8 bytes):
          6c 6f 6f 70 30 00 00 00                          loop0...
        backtrace:
           kstrdup+0x36/0x70 mm/util.c:60
           kstrdup_const+0x53/0x80 mm/util.c:83
           kvasprintf_const+0x108/0x190 lib/kasprintf.c:48
           kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
           kobject_add_varg lib/kobject.c:384 [inline]
           kobject_init_and_add+0xc9/0x160 lib/kobject.c:473
           nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999
           init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637
      
      Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com
      Fixes: da7141fb ("nilfs2: add /sys/fs/nilfs2/<device> group")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8fd0c1b0
    • Hugh Dickins's avatar
      mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() · a7a69d8b
      Hugh Dickins authored
      Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
      ptlock in the PVMW_SYNC case? That too might have been responsible for
      BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
      I've never seen any.
      
      Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
      Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
      Fixes: ace71a19 ("mm: introduce page_vma_mapped_walk()")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7a69d8b
    • Hugh Dickins's avatar
      mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes · a9a7504d
      Hugh Dickins authored
      Running certain tests with a DEBUG_VM kernel would crash within hours,
      on the total_mapcount BUG() in split_huge_page_to_list(), while trying
      to free up some memory by punching a hole in a shmem huge page: split's
      try_to_unmap() was unable to find all the mappings of the page (which,
      on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
      
      Crash dumps showed two tail pages of a shmem huge page remained mapped
      by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
      of a long unmapped range; and no page table had yet been allocated for
      the head of the huge page to be mapped into.
      
      Although designed to handle these odd misaligned huge-page-mapped-by-pte
      cases, page_vma_mapped_walk() falls short by returning false prematurely
      when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
      are cases when a huge page may span the boundary, with ptes present in
      the next page table.
      
      Restructure page_vma_mapped_walk() as a loop to continue in these cases,
      while keeping its layout much as before.  Add a step_forward() helper to
      advance pvmw->address across those boundaries: originally I tried to use
      mm's standard p?d_addr_end() macros, but hit the same crash 512 times
      less often: because of the way redundant levels are folded together, but
      folded differently in different configurations, it was just too
      difficult to use them correctly; and step_forward() is simpler anyway.
      
      Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
      Fixes: ace71a19 ("mm: introduce page_vma_mapped_walk()")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a7504d
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): get vma_address_end() earlier · a765c417
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
      start, rather than later at next_pte.
      
      It's a little unnecessary overhead on the first call, but makes for a
      simpler loop in the following commit.
      
      Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a765c417
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): use goto instead of while (1) · 47446630
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
      and use "goto this_pte", in place of the "while (1)" loop at the end.
      
      Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      47446630
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): add a level of indentation · b3807a91
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: add a level of indentation to much of
      the body, making no functional change in this commit, but reducing the
      later diff when this is all converted to a loop.
      
      [hughd@google.com: page_vma_mapped_walk(): add a level of indentation fix]
        Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com
      
      Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3807a91
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): crossing page table boundary · 44828248
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: adjust the test for crossing page table
      boundary - I believe pvmw->address is always page-aligned, but nothing
      else here assumed that; and remember to reset pvmw->pte to NULL after
      unmapping the page table, though I never saw any bug from that.
      
      Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44828248
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block · e2e1d407
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
      follow the same "return not_found, return not_found, return true"
      pattern as the block above it (note: returning not_found there is never
      premature, since existence or prior existence of huge pmd guarantees
      good alignment).
      
      Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e2e1d407
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd · 3306d311
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
      use it in subsequent tests, instead of repeatedly dereferencing pointer.
      
      Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3306d311
    • Hugh Dickins's avatar
      mm: page_vma_mapped_walk(): settle PageHuge on entry · 6d0fd598
      Hugh Dickins authored
      
      
      page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
      the way at the start, so no need to worry about it later.
      
      Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Wang Yugui <wangyugui@e16-tech.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d0fd598