  1. Sep 16, 2020
    • KVM: arm64: nVHE: Fix pointers during SMCCC convertion · a071261d
      Andrew Scull authored
      
      The host need not concern itself with the pointer differences for the
      hyp interfaces that are shared between VHE and nVHE so leave it to the
      hyp to handle.
      
      As the SMCCC function IDs are converted into function calls, it is a
      suitable place to also convert any pointer arguments into hyp pointers.
      This, additionally, eases the reuse of the handlers in different
      contexts.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-20-ascull@google.com
    • KVM: arm64: nVHE: Migrate hyp-init to SMCCC · 04e4caa8
      Andrew Scull authored
      
      To complete the transition to SMCCC, the hyp initialization is given a
      function ID. This looks neater than comparing the hyp stub function IDs
      to the page table physical address.
      
      Some care is taken to only clobber x0-3 before the host context is saved
      as only those registers can be clobbered according to SMCCC. Fortunately,
      only a few acrobatics are needed. The possible new tpidr_el2 is moved to
      the argument in x2 so that it can be stashed in tpidr_el2 early to free
      up a scratch register. The page table configuration then makes use of
      x0-2.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-19-ascull@google.com
    • KVM: arm64: nVHE: Migrate hyp interface to SMCCC · 05469831
      Andrew Scull authored
      
      Rather than passing arbitrary function pointers to run at hyp, define
      an equivalent set of SMCCC functions.
      
      Since the SMCCC functions are strongly tied to the original function
      prototypes, it is not expected for the host to ever call an invalid ID
      but a warning is raised if this does ever occur.
      
      As __kvm_vcpu_run is used for every switch between the host and a guest,
      it is explicitly singled out to be identified before the other function
      IDs to improve the performance of the hot path.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-18-ascull@google.com
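The dispatch described in this commit message (function IDs instead of function pointers, the guest-entry call checked first, a warning for unknown IDs) can be sketched roughly as follows. This is only an illustration in portable C: every `demo_`/`DEMO_` name, the ID values, and the return code are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical function IDs; the commit message does not list the real values. */
#define DEMO_FUNC_VCPU_RUN   1u
#define DEMO_FUNC_FLUSH_TLB  2u
#define DEMO_FUNC_GET_INFO   3u

#define DEMO_RET_NOT_SUPPORTED (-1L)

/* Stand-ins for the hyp functions that were previously reached via pointers. */
static long demo_vcpu_run(unsigned long vcpu) { return (long)vcpu + 100; }
static long demo_flush_tlb(unsigned long vmid) { return (long)vmid; }
static long demo_get_info(void) { return 7; }

/* Map a function ID received from the host onto the matching handler. */
long demo_handle_host_call(uint32_t func_id, unsigned long arg)
{
	/* Hot path: test the guest-entry ID before any other ID. */
	if (func_id == DEMO_FUNC_VCPU_RUN)
		return demo_vcpu_run(arg);

	switch (func_id) {
	case DEMO_FUNC_FLUSH_TLB:
		return demo_flush_tlb(arg);
	case DEMO_FUNC_GET_INFO:
		return demo_get_info();
	default:
		/* Not expected from the host; warn instead of failing silently. */
		fprintf(stderr, "warning: unknown hyp function ID %u\n",
			(unsigned)func_id);
		return DEMO_RET_NOT_SUPPORTED;
	}
}
```

Calling `demo_handle_host_call(DEMO_FUNC_VCPU_RUN, 5)` takes the early branch and never reaches the `switch`, which is the point of singling out the most frequent call.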
    • smccc: Use separate variables for args and results · 0794a974
      Andrew Scull authored
      
      Using the same register-bound variable for both arguments and results
      means these values share a type. That type must allow the arguments to
      be assigned to it and must also be assignable to the unsigned long
      fields of struct arm_smccc_res.
      
      This restriction on types causes compiler warnings when the argument
      cannot be implicitly assigned to an unsigned long, for example the
      pointers that are used in the KVM hyp interface.
      
      By separating the arguments and results into their own variables, the
      type constraint is lifted allowing the arguments to avoid the need for
      any type conversion.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Link: https://lore.kernel.org/r/20200915104643.2543892-17-ascull@google.com
    • smccc: Define vendor hyp owned service call region · cf650168
      Andrew Scull authored
      
      Vendor specific hypervisor services have their own region of function
      identifiers reserved by SMCCC. Extend the list of owners to include this
      case.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Link: https://lore.kernel.org/r/20200915104643.2543892-16-ascull@google.com
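As a rough illustration of what an "owner" region means, the sketch below builds and decodes a function identifier using the field layout of the SMC Calling Convention as commonly documented (bit 31: fast call, bit 30: 64-bit convention, bits 29:24: owning entity, bits 15:0: function number), with the vendor-specific hypervisor owner taken to be 6. The `demo_`/`DEMO_` names are hypothetical and are not the kernel's macros.

```c
#include <stdint.h>

/* Field values assumed from the SMCCC function-ID layout; names are made up. */
#define DEMO_SMCCC_FAST_CALL        1u
#define DEMO_SMCCC_SMC_32           0u
#define DEMO_SMCCC_SMC_64           1u
#define DEMO_SMCCC_OWNER_VENDOR_HYP 6u

/* Pack call type, calling convention, owner and function number into an ID. */
uint32_t demo_smccc_call_val(uint32_t type, uint32_t conv,
			     uint32_t owner, uint32_t func_num)
{
	return (type << 31) | (conv << 30) |
	       ((owner & 0x3Fu) << 24) | (func_num & 0xFFFFu);
}

/* Extract the owning-entity field (bits 29:24) from a function ID. */
uint32_t demo_smccc_owner(uint32_t func_id)
{
	return (func_id >> 24) & 0x3Fu;
}
```

Under these assumptions, a fast 32-bit call with function number 1 in the vendor hypervisor region encodes as `0x86000001`.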
    • KVM: arm64: nVHE: Pass pointers consistently to hyp-init · 5dc33bd1
      Andrew Scull authored
      
      Rather than some being kernel pointers and others being hyp pointers,
      standardize on all pointers being hyp pointers.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-15-ascull@google.com
    • KVM: arm64: nVHE: Handle hyp panics · a2e102e2
      Andrew Scull authored
      
      Restore the host context when panicking from hyp to give the best chance
      of the panic being clean.
      
      The host requires that registers be preserved such as x18 for the shadow
      callstack. If the panic is caused by an exception from EL1, the host
      context is still valid so the panic can return straight back to the
      host. If the panic comes from EL2 then it's most likely that the hyp
      context is active and the host context needs to be restored.
      
      There are windows before and after the host context is saved and
      restored in which restoration is attempted incorrectly and the panic
      won't be clean.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-14-ascull@google.com
    • KVM: arm64: nVHE: Switch to hyp context for EL2 · 4e3393a9
      Andrew Scull authored
      
      Save and restore the host context when switching to and from hyp. This
      gives hyp its own context that the host will not see as a step towards a
      full trust boundary between the two.
      
      SP_EL0 and pointer authentication keys are currently shared between the
      host and hyp so don't need to be switched yet.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-13-ascull@google.com
    • KVM: arm64: Share context save and restore macros · 603d2bda
      Andrew Scull authored
      
      To avoid duplicating the context save and restore macros, move them into
      a shareable header.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-12-ascull@google.com
    • KVM: arm64: Restore hyp when panicking in guest context · 7db21530
      Andrew Scull authored
      
      If the guest context is loaded when a panic is triggered, restore the
      hyp context so e.g. the shadow call stack works when hyp_panic() is
      called and SP_EL0 is valid when the host's panic() is called.
      
      Use the hyp context's __hyp_running_vcpu field to track when hyp
      transitions to and from the guest vcpu so the exception handlers know
      whether the context needs to be restored.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-11-ascull@google.com
    • KVM: arm64: Update context references from host to hyp · 7c2e76d8
      Andrew Scull authored
      
      Hyp now has its own nominal context for saving and restoring its state
      when switching to and from a guest. Update the related comments and
      utilities to match the new name.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-10-ascull@google.com
    • KVM: arm64: Introduce hyp context · b619d9aa
      Andrew Scull authored
      
      During __guest_enter, save and restore from a new hyp context rather
      than the host context. This is preparation for separation of the hyp and
      host context in nVHE.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-9-ascull@google.com
    • KVM: arm64: nVHE: Don't consume host SErrors with ESB · 472fc011
      Andrew Scull authored
      
      The ESB at the start of the host vector may cause SErrors to be consumed
      to DISR_EL1. However, this is not checked for the host so the SError
      could go unhandled.
      
      Remove the ESB so that SErrors are not consumed but are instead left
      pending for the host to consume. __guest_enter already defers entry into
      a guest if there are any SErrors pending.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Link: https://lore.kernel.org/r/20200915104643.2543892-8-ascull@google.com
    • KVM: arm64: nVHE: Use separate vector for the host · 6e3bfbb2
      Andrew Scull authored
      
      The host is treated differently from the guests when an exception is
      taken so introduce a separate vector that is specialized for the host.
      This also allows the nVHE specific code to move out of hyp-entry.S and
      into nvhe/host.S.
      
      The host is only expected to make HVC calls and anything else is
      considered invalid and results in a panic.
      
      Hyp initialization is now passed the vector that is used for the host
      and it is swapped for the guest vector during the context switch.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-7-ascull@google.com
    • KVM: arm64: Save chosen hyp vector to a percpu variable · a0e47952
      Andrew Scull authored
      
      Introduce a percpu variable to hold the address of the selected hyp
      vector that will be used with guests. This avoids the selection process
      each time a guest is being entered and can be used by nVHE when a
      separate vector is introduced for the host.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-6-ascull@google.com
    • KVM: arm64: Choose hyp symbol based on context · ceee2fe4
      Andrew Scull authored
      
      Make CHOOSE_HYP_SYM select the symbol of the active hypervisor for the
      host, the nVHE symbol for nVHE and the VHE symbol for VHE. The nVHE and
      VHE hypervisors see their own symbols without prefixes and trigger a
      link error when trying to use a symbol of the other hypervisor.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: David Brazdil <dbrazdil@google.com>
      Link: https://lore.kernel.org/r/20200915104643.2543892-5-ascull@google.com
    • KVM: arm64: Remove kvm_host_data_t typedef · d7ca1079
      Andrew Scull authored
      
      The kvm_host_data_t typedef is used inconsistently and goes against the
      kernel's coding style. Remove it in favour of the full struct specifier.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-4-ascull@google.com
    • KVM: arm64: Remove hyp_panic arguments · 6a0259ed
      Andrew Scull authored
      
      hyp_panic is able to find all the context it needs from within itself so
      remove the argument. The __hyp_panic wrapper becomes redundant so is
      also removed.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-3-ascull@google.com
    • KVM: arm64: Remove __activate_vm wrapper · 501a67a2
      Andrew Scull authored
      
      The __activate_vm wrapper serves no useful function and has a misleading
      name as it simply calls __load_guest_stage2 and does not touch
      HCR_EL2.VM so remove it.
      
      Also rename __deactivate_vm to __load_host_stage2 to match naming
      pattern.
      
      Signed-off-by: Andrew Scull <ascull@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200915104643.2543892-2-ascull@google.com
  2. Sep 07, 2020
    • Linux 5.9-rc4 · f4d51dff
      Linus Torvalds authored
    • Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block · a8205e31
      Linus Torvalds authored
      Pull more io_uring fixes from Jens Axboe:
       "Two followup fixes. One is fixing a regression from this merge window,
        the other is two commits fixing cancelation of deferred requests.
      
        Both have gone through full testing, and both spawned a few new
        regression test additions to liburing.
      
         - Don't play games with const, properly store the output iovec and
           assign it as needed.
      
         - Deferred request cancelation fix (Pavel)"
      
      * tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
        io_uring: fix linked deferred ->files cancellation
        io_uring: fix cancel of deferred reqs with ->files
        io_uring: fix explicit async read/write mapping for large segments
    • Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 2ccdd9f8
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
         pointer dereference bug and serialize a hardware register access as
         required by the VT-d spec.
      
       - two patches for AMD IOMMU to force AMD GPUs into translation mode
         when memory encryption is active and disallow using IOMMUv2
         functionality.  This makes the AMDGPU driver work when memory
         encryption is active.
      
       - two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
         Table Entries.
      
       - MAINTAINERS file update for the Qualcomm IOMMU driver.
      
      * tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Handle 36bit addressing for x86-32
        iommu/amd: Do not use IOMMUv2 functionality when SME is active
        iommu/amd: Do not force direct mapping when SME is active
        iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
        iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
        iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
        iommu/vt-d: Serialize IOMMU GCMD register modifications
        MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
    • Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 015b3155
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - more generic entry code ABI fallout
      
       - debug register handling bugfixes
      
       - fix vmalloc mappings on 32-bit kernels
      
       - kprobes instrumentation output fix on 32-bit kernels
      
       - fix over-eager WARN_ON_ONCE() on !SMAP hardware
      
       - NUMA debugging fix
      
       - fix Clang related crash on !RETPOLINE kernels
      
      * tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry: Unbreak 32bit fast syscall
        x86/debug: Allow a single level of #DB recursion
        x86/entry: Fix AC assertion
        tracing/kprobes, x86/ptrace: Fix regs argument order for i386
        x86, fakenuma: Fix invalid starting node ID
        x86/mm/32: Bring back vmalloc faulting on x86_32
        x86/cmdline: Disable jump tables for cmdline.c
    • Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 68beef57
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "A small series for fixing a problem with Xen PVH guests when running
        as backends (e.g. as dom0).
      
        Mapping other guests' memory is now working via ZONE_DEVICE, thus not
        requiring to abuse the memory hotplug functionality for that purpose"
      
      * tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: add helpers to allocate unpopulated memory
        memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
        xen/balloon: add header guard
  3. Sep 06, 2020
    • io_uring: fix linked deferred ->files cancellation · c127a2a1
      Pavel Begunkov authored
      
      While looking for ->files in ->defer_list, consider that requests there
      may actually be links.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix cancel of deferred reqs with ->files · b7ddce3c
      Pavel Begunkov authored
      
      While trying to cancel requests with ->files, it also should look for
      requests in ->defer_list, otherwise it might end up hanging a thread.
      
      Cancel all requests in ->defer_list up to the last request there with
      matching ->files, that's needed to follow drain ordering semantics.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
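The rule stated in this commit message (cancel everything in the deferred list up to the last request with matching ->files, so that drain ordering is respected) can be modelled with a plain array. This is a simplified sketch, not io_uring code: `struct demo_req` and `demo_cancel_deferred()` are hypothetical, and an integer stands in for the `->files` pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified deferred request. */
struct demo_req {
	int files;       /* stand-in for the request's ->files */
	bool cancelled;
};

/*
 * Cancel every request from the head of the deferred list up to and
 * including the last one whose files value matches. Returns the number
 * of requests cancelled (0 if nothing matches).
 */
size_t demo_cancel_deferred(struct demo_req *defer_list, size_t n, int files)
{
	size_t end = 0; /* one past the last matching request */

	for (size_t i = 0; i < n; i++)
		if (defer_list[i].files == files)
			end = i + 1;

	for (size_t i = 0; i < end; i++)
		defer_list[i].cancelled = true;

	return end;
}
```

Note that requests belonging to other owners are cancelled too when they sit ahead of the last match; requests queued after it are left alone.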
    • Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4'... · dd9fb9bb
      Linus Torvalds authored
      Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux
      
      Pull misc fixes from Miguel Ojeda:
       "A trivial patch for auxdisplay:
      
         - Replace HTTP links with HTTPS ones (Alexander A. Klimov)
      
        The usual clang-format trivial update:
      
         - Update with the latest for_each macro list (Miguel Ojeda)
      
        And Luc requested me to pick a sparse fix on my queue, so here it goes
        along with other two trivial Compiler Attributes ones (also from Luc).
      
         - sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
           Oostenryck)
      
         - Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van
           Oostenryck)
      
         - Compiler Attributes: remove comment about sparse not supporting
           __has_attribute (Luc Van Oostenryck)"
      
      * tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
        auxdisplay: Replace HTTP links with HTTPS ones
      
      * tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
        clang-format: Update with the latest for_each macro list
      
      * tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
        sparse: use static inline for __chk_{user,io}_ptr()
        Compiler Attributes: fix comment concerning GCC 4.6
        Compiler Attributes: remove comment about sparse not supporting __has_attribute
    • Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 70187f77
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
      
       - HSDK-4xd Dev system: perf driver updates for sampling interrupt
      
       - HSDK* Dev System: Ethernet broken [Evgeniy Didin]
      
       - HIGHMEM broken (2 memory banks) [Mike Rapoport]
      
       - show_regs() rewrite once and for all
      
       - Other minor fixes
      
      * tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
        arc: fix memory initialization for systems with two memory banks
        irqchip/eznps: Fix build error for !ARC700 builds
        ARC: show_regs: fix r12 printing and simplify
        ARC: HSDK: wireup perf irq
        ARC: perf: don't bail setup if pct irq missing in device-tree
        ARC: pgalloc.h: delete a duplicated word + other fixes
    • Merge branch 'akpm' (patches from Andrew) · 7514c036
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "19 patches.
      
        Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
        checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
        hugetlb)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        include/linux/log2.h: add missing () around n in roundup_pow_of_two()
        mm/khugepaged.c: fix khugepaged's request size in collapse_file
        mm/hugetlb: fix a race between hugetlb sysctl handlers
        mm/hugetlb: try preferred node first when alloc gigantic page from cma
        mm/migrate: preserve soft dirty in remove_migration_pte()
        mm/migrate: remove unnecessary is_zone_device_page() check
        mm/rmap: fixup copying of soft dirty and uffd ptes
        mm/migrate: fixup setting UFFD_WP flag
        mm: madvise: fix vma user-after-free
        checkpatch: fix the usage of capture group ( ... )
        fork: adjust sysctl_max_threads definition to match prototype
        ipc: adjust proc_ipc_sem_dointvec definition to match prototype
        mm: track page table modifications in __apply_to_page_range()
        MAINTAINERS: IA64: mark Status as Odd Fixes only
        MAINTAINERS: add LLVM maintainers
        MAINTAINERS: update Cavium/Marvell entries
        mm: slub: fix conversion of freelist_corrupted()
        mm: memcg: fix memcg reclaim soft lockup
        memcg: fix use-after-free in uncharge_batch
    • include/linux/log2.h: add missing () around n in roundup_pow_of_two() · 428fc0af
      Jason Gunthorpe authored
      Otherwise gcc generates warnings if the expression is complicated.
      
      Fixes: 312a0c17 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
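The commit message above only mentions compiler warnings. As a general illustration of why macro parameters are wrapped in parentheses, the sketch below uses two hypothetical macros (they are not the kernel's roundup_pow_of_two()) to show that an unparenthesized parameter can also change how a compound argument is evaluated.

```c
/* Hypothetical macros for illustration only. */
#define DEMO_HALF_UNSAFE(n) (n / 2)   /* parameter not parenthesized */
#define DEMO_HALF_SAFE(n)   ((n) / 2) /* parameter parenthesized */

/* Expands to (4 + 4 / 2): the division binds only to the second 4. */
int demo_half_unsafe(void)
{
	return DEMO_HALF_UNSAFE(4 + 4);
}

/* Expands to ((4 + 4) / 2): the whole argument is divided. */
int demo_half_safe(void)
{
	return DEMO_HALF_SAFE(4 + 4);
}
```

With a compound argument such as `4 + 4`, the two macros yield 6 and 4 respectively.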
    • mm/khugepaged.c: fix khugepaged's request size in collapse_file · e5a59d30
      David Howells authored
      collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
      be read to page_cache_sync_readahead().  The intent was probably to read
      a single page.  Fix it to use the number of pages to the end of the
      window instead.
      
      Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: fix a race between hugetlb sysctl handlers · 17743798
      Muchun Song authored
      There is a race between the assignment of `table->data` on one thread
      and the write through the `table->data` pointer in
      __do_proc_doulongvec_minmax() on another thread.
      
        CPU0:                                 CPU1:
                                              proc_sys_write
        hugetlb_sysctl_handler                  proc_sys_call_handler
        hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
          table->data = &tmp;                       hugetlb_sysctl_handler_common
                                                      table->data = &tmp;
            proc_doulongvec_minmax
              do_proc_doulongvec_minmax           sysctl_head_finish
                __do_proc_doulongvec_minmax         unuse_table
                  i = table->data;
                  *i = val;  // corrupt CPU1's stack
      
      Fix this by duplicating `table` and only updating the duplicate.  Also
      introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the
      code.
      
      The following oops was seen:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000000
          #PF: supervisor instruction fetch in kernel mode
          #PF: error_code(0x0010) - not-present page
          Code: Bad RIP value.
          ...
          Call Trace:
           ? set_max_huge_pages+0x3da/0x4f0
           ? alloc_pool_huge_page+0x150/0x150
           ? proc_doulongvec_minmax+0x46/0x60
           ? hugetlb_sysctl_handler_common+0x1c7/0x200
           ? nr_hugepages_store+0x20/0x20
           ? copy_fd_bitmaps+0x170/0x170
           ? hugetlb_sysctl_handler+0x1e/0x20
           ? proc_sys_call_handler+0x2f1/0x300
           ? unregister_sysctl_table+0xb0/0xb0
           ? __fd_install+0x78/0x100
           ? proc_sys_write+0x14/0x20
           ? __vfs_write+0x4d/0x90
           ? vfs_write+0xef/0x240
           ? ksys_write+0xc0/0x160
           ? __ia32_sys_read+0x50/0x50
           ? __close_fd+0x129/0x150
           ? __x64_sys_write+0x43/0x50
           ? do_syscall_64+0x6c/0x200
           ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: e5ff2159 ("hugetlb: multiple hstates for multiple page sizes")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
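The fix described in this commit message (copy the shared table, point only the copy at a local temporary) can be sketched in user-space C as below. All types and names here are hypothetical simplifications; the real `struct ctl_table` and the proc handlers have different signatures.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a sysctl table entry. */
struct demo_ctl_table {
	void *data;
	size_t maxlen;
};

/* Stand-in for the generic parser: it writes through table->data. */
static void demo_store_ulong(const struct demo_ctl_table *table,
			     unsigned long val)
{
	*(unsigned long *)table->data = val;
}

/*
 * Handler sketch: the shared table is never modified. A private copy is
 * pointed at a stack temporary, so a concurrent caller cannot end up
 * writing through a pointer into this thread's stack.
 */
int demo_sysctl_handler(const struct demo_ctl_table *shared,
			unsigned long input, unsigned long *out)
{
	struct demo_ctl_table dup = *shared; /* private duplicate */
	unsigned long tmp = 0;

	dup.data = &tmp;
	dup.maxlen = sizeof(tmp);
	demo_store_ulong(&dup, input);

	*out = tmp;
	return 0;
}
```

Because `shared` is only read, two threads running the handler concurrently each work on their own `dup` and `tmp`.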
    • mm/hugetlb: try preferred node first when alloc gigantic page from cma · 953f064a
      Li Xinhai authored
      Since commit cf11e85f ("mm: hugetlb: optionally allocate gigantic
      hugepages using cma"), a gigantic page may be allocated from a node
      other than the preferred node even though pages are available on the
      preferred node.  The reason is that the nid parameter is ignored in
      alloc_gigantic_page().
      
      In addition, __GFP_THISNODE needs to be checked in case the user
      requires allocation only from the preferred node.
      
      After this patch, the preferred node is tried first before other allowed
      nodes, and other nodes are not tried if __GFP_THISNODE is specified.  If
      the user doesn't specify a preferred node, the current node is used as
      the preferred node, which keeps the behavior of allocating gigantic and
      non-gigantic hugetlb pages consistent.
      
      Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
      Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
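The node-selection order this commit message describes (preferred node first, other nodes only when the allocation is not restricted to that node) might be sketched as follows. The function, its parameters, and the per-node free-page array are hypothetical; the boolean stands in for the __GFP_THISNODE flag.

```c
#include <stdbool.h>

/*
 * Pick a node to allocate from. free_pages[nid] is a hypothetical count of
 * pages available on node nid. Returns the chosen node ID, or -1 if no
 * allowed node has pages.
 */
int demo_pick_node(const int *free_pages, int nr_nodes,
		   int preferred, bool this_node_only)
{
	/* Always try the preferred node first. */
	if (free_pages[preferred] > 0)
		return preferred;

	/* Models __GFP_THISNODE: never fall back to other nodes. */
	if (this_node_only)
		return -1;

	for (int nid = 0; nid < nr_nodes; nid++) {
		if (nid != preferred && free_pages[nid] > 0)
			return nid;
	}
	return -1;
}
```

With `free_pages = {0, 2, 1}`, asking for node 0 falls back to node 1 unless `this_node_only` is set, in which case the request fails.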
    • mm/migrate: preserve soft dirty in remove_migration_pte() · 3d321bf8
      Ralph Campbell authored
      
      The code to remove a migration PTE and replace it with a device private
      PTE was not copying the soft dirty bit from the migration entry.  This
      could lead to page contents not being marked dirty when faulting the page
      back from device private memory.
      
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/migrate: remove unnecessary is_zone_device_page() check · 6128763f
      Ralph Campbell authored
      
      Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
      
      I happened to notice this from code inspection after seeing Alistair
      Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
      
      This patch (of 2):
      
      The check for is_zone_device_page() and is_device_private_page() is
      unnecessary since the latter is sufficient to determine if the page is a
      device private page.  Simplify the code for easier reading.
      
      Signed-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
      Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6128763f
    • Alistair Popple's avatar
      mm/rmap: fixup copying of soft dirty and uffd ptes · ad7df764
      Alistair Popple authored
      During memory migration a pte is temporarily replaced with a migration
      swap pte.  Some pte bits from the existing mapping such as the soft-dirty
      and uffd write-protect bits are preserved by copying these to the
      temporary migration swap pte.
      
      However these bits are not stored at the same location for swap and
      non-swap ptes.  Therefore testing these bits requires using the
      appropriate helper function for the given pte type.
      
      Unfortunately several code locations were found where the wrong helper
      function is being used to test soft_dirty and uffd_wp bits which leads to
      them getting incorrectly set or cleared during page-migration.
      
      Fix these by using the correct tests based on pte type.
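
      The mismatch described above can be illustrated with a toy model.  This
      is only a sketch: the toy_* names and the bit positions are assumptions
      made up for illustration, not the kernel's definitions.  It shows a
      soft-dirty bit that lives at different positions for present and swap
      ptes, so the test has to be chosen by pte type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte; the bit positions are arbitrary and purely illustrative. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_PRESENT        (1ULL << 0)
#define TOY_PTE_SOFT_DIRTY     (1ULL << 11) /* meaningful for present ptes only */
#define TOY_PTE_SWP_SOFT_DIRTY (1ULL << 1)  /* meaningful for swap ptes only */

static bool toy_pte_present(toy_pte_t pte)
{
    return pte & TOY_PTE_PRESENT;
}

static bool toy_pte_soft_dirty(toy_pte_t pte)
{
    return pte & TOY_PTE_SOFT_DIRTY;
}

static bool toy_pte_swp_soft_dirty(toy_pte_t pte)
{
    return pte & TOY_PTE_SWP_SOFT_DIRTY;
}

/* Test soft-dirty with the helper that matches the pte type. */
static bool toy_soft_dirty(toy_pte_t pte)
{
    return toy_pte_present(pte) ? toy_pte_soft_dirty(pte)
                                : toy_pte_swp_soft_dirty(pte);
}

/* Carry soft-dirty from an existing pte over to a migration swap pte. */
static toy_pte_t toy_copy_soft_dirty(toy_pte_t old, toy_pte_t swp)
{
    assert(!toy_pte_present(swp)); /* destination must be a swap pte */
    if (toy_soft_dirty(old))
        swp |= TOY_PTE_SWP_SOFT_DIRTY;
    return swp;
}
```

      In this model, applying toy_pte_soft_dirty() to a swap pte reads an
      unrelated bit, which is the kind of mistake the commit describes.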
      
      Fixes: a5430dda ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
      Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
      Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad7df764
    • Alistair Popple's avatar
      mm/migrate: fixup setting UFFD_WP flag · ebdf8321
      Alistair Popple authored
      Commit f45ec5ff ("userfaultfd: wp: support swap and page migration")
      introduced support for tracking the uffd wp bit during page migration.
      However the non-swap PTE variant was used to set the flag for zone device
      private pages which are a type of swap page.
      
      This leads to corruption of the swap offset if the original PTE has the
      uffd_wp flag set.
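
      A toy model can show why the wrong setter corrupts the swap offset.
      This is a sketch under stated assumptions: the toy_* names and the bit
      layout are hypothetical, chosen so that the non-swap uffd_wp bit falls
      inside the field that holds the swap offset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy pte; the layout below is made up for illustration only. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_UFFD_WP      (1ULL << 10) /* uffd_wp bit for present ptes */
#define TOY_PTE_SWP_UFFD_WP  (1ULL << 2)  /* uffd_wp bit for swap ptes */
#define TOY_SWP_OFFSET_SHIFT 3            /* swap offset uses bits 3 and up */

/* Build a swap pte that encodes the given offset. */
static toy_pte_t toy_swp_entry_to_pte(uint64_t offset)
{
    assert(offset <= (UINT64_MAX >> TOY_SWP_OFFSET_SHIFT));
    return offset << TOY_SWP_OFFSET_SHIFT;
}

/* Decode the offset from a swap pte. */
static uint64_t toy_swp_offset(toy_pte_t pte)
{
    return pte >> TOY_SWP_OFFSET_SHIFT;
}

/* Setter meant for present ptes: on a swap pte it lands in the offset. */
static toy_pte_t toy_pte_mkuffd_wp(toy_pte_t pte)
{
    return pte | TOY_PTE_UFFD_WP;
}

/* Setter meant for swap ptes: leaves the offset field untouched. */
static toy_pte_t toy_pte_swp_mkuffd_wp(toy_pte_t pte)
{
    return pte | TOY_PTE_SWP_UFFD_WP;
}
```

      With this layout, the swap variant keeps the encoded offset intact,
      while the non-swap variant sets a bit inside the offset field and so
      changes the value that toy_swp_offset() decodes.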
      
      Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebdf8321
    • Yang Shi's avatar
      mm: madvise: fix vma user-after-free · 7867fd7c
      Yang Shi authored
      The syzbot reported the below use-after-free:
      
        BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
        BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
        BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
        Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
      
        CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x18f/0x20d lib/dump_stack.c:118
          print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
          __kasan_report mm/kasan/report.c:513 [inline]
          kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
          madvise_willneed mm/madvise.c:293 [inline]
          madvise_vma mm/madvise.c:942 [inline]
          do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
          do_madvise mm/madvise.c:1169 [inline]
          __do_sys_madvise mm/madvise.c:1171 [inline]
          __se_sys_madvise mm/madvise.c:1169 [inline]
          __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Allocated by task 9992:
          kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
          vm_area_alloc+0x1c/0x110 kernel/fork.c:347
          mmap_region+0x8e5/0x1780 mm/mmap.c:1743
          do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
          vm_mmap_pgoff+0x195/0x200 mm/util.c:506
          ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 9992:
          kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
          remove_vma+0x132/0x170 mm/mmap.c:184
          remove_vma_list mm/mmap.c:2613 [inline]
          __do_munmap+0x743/0x1170 mm/mmap.c:2869
          do_munmap mm/mmap.c:2877 [inline]
          mmap_region+0x257/0x1780 mm/mmap.c:1716
          do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
          vm_mmap_pgoff+0x195/0x200 mm/util.c:506
          ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happens because the vma is accessed after mmap_lock has been
      released; by then another task may have acquired mmap_lock and freed
      the vma.

      Releasing mmap_lock only after the last access to the vma should fix
      the problem.
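
      The ordering can be sketched in a single-threaded toy model.  Everything
      here is an assumption for illustration: the toy_* types and functions
      are hypothetical, and the "race" is simulated by freeing the vma as
      soon as the lock is dropped.  It is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_vma {
    unsigned long start;
    unsigned long end;
};

struct toy_mm {
    int locked;           /* stands in for mmap_lock */
    struct toy_vma *vma;  /* only safe to dereference while locked */
};

static void toy_lock(struct toy_mm *mm)
{
    mm->locked = 1;
}

/*
 * Drop the lock, then simulate another task taking it and unmapping
 * the region, which frees the vma.
 */
static void toy_unlock_and_race(struct toy_mm *mm)
{
    mm->locked = 0;
    free(mm->vma);
    mm->vma = NULL;
}

/* Read everything needed from the vma before the lock is released. */
static unsigned long toy_willneed_len(struct toy_mm *mm)
{
    unsigned long len;

    toy_lock(mm);
    assert(mm->locked);
    len = mm->vma->end - mm->vma->start; /* vma is still valid here */
    toy_unlock_and_race(mm);

    return len; /* only the saved copy is used after unlocking */
}
```

      Moving the read of the vma fields below toy_unlock_and_race() would
      dereference freed memory in this model, mirroring the reported bug.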
      
      Fixes: 692fe624 ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
      Reported-by: <syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com>
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.4+]
      Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7867fd7c
    • Mrinal Pandey's avatar
      checkpatch: fix the usage of capture group ( ... ) · 13e45417
      Mrinal Pandey authored
      The usage of "capture group (...)" in the immediate condition after `&&`
      results in `$1` being uninitialized.  This issues a warning "Use of
      uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
      line 2638".
      
      I noticed this bug while running checkpatch on the set of commits from
      v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
      their commit message.
      
      This bug was introduced in the script by commit e518e9a5
      ("checkpatch: emit an error when there's a diff in a changelog").  It
      has been in the script since then.
      
      The author intended to store the match made by capture group in variable
      `$1`.  This should have contained the name of the file as `[\w/]+`
      matched.  However, this couldn't be accomplished due to usage of capture
      group and `$1` in the same regular expression.
      
      Fix this by placing the capture group in the condition before `&&`.
      Thus, `$1` can be initialized to the text that capture group matches
      thereby setting it to the desired and required value.
      
      Fixes: e518e9a5 ("checkpatch: emit an error when there's a diff in a changelog")
      Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      13e45417
    • Tobias Klauser's avatar
      fork: adjust sysctl_max_threads definition to match prototype · b0daa2c7
      Tobias Klauser authored
      Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
      definition of sysctl_max_threads to match its prototype in
      linux/sysctl.h which fixes the following sparse error/warning:
      
        kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
        kernel/fork.c:3050:47:    expected void *
        kernel/fork.c:3050:47:    got void [noderef] __user *buffer
        kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
        kernel/fork.c:3036:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
        kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
        include/linux/sysctl.h:242:5: note: previously declared as:
        include/linux/sysctl.h:242:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
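
      The shape of the fix can be sketched in plain C.  This is a toy under
      stated assumptions: toy_ctl_table and toy_sysctl_max_threads are
      hypothetical stand-ins, and the body is invented for illustration.  The
      point is only that the definition takes the same plain pointer type for
      its buffer argument as the prototype does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a sysctl table entry. */
struct toy_ctl_table {
    int *data;
};

/* Prototype, as a header would declare it: buffer is a plain pointer. */
int toy_sysctl_max_threads(struct toy_ctl_table *table, int write,
                           void *buffer, size_t *lenp, long long *ppos);

/* Definition using exactly the parameter types of the prototype. */
int toy_sysctl_max_threads(struct toy_ctl_table *table, int write,
                           void *buffer, size_t *lenp, long long *ppos)
{
    (void)ppos; /* unused in this toy */

    /* Ignore reads and short writes; a real handler would do more. */
    if (!write || *lenp < sizeof(int))
        return 0;

    /* Copy the new value out of the caller's buffer. */
    memcpy(table->data, buffer, sizeof(int));
    return 0;
}
```

      If the definition declared its buffer argument with a different
      address-space annotation than the prototype, a checker such as sparse
      would report the redeclaration, as in the output above.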
      
      Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b0daa2c7