  1. May 12, 2017
    • KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms · 76d837a4
      Paul Mackerras authored
      Commit e91aa8e6 ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64
      permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit
      Book 3S kernel configurations in order to simplify the code and
      reduce #ifdefs.  However, 64-bit Book 3S PPC platforms other than
      pseries and powernv don't implement the necessary IOMMU callbacks,
      leading to build failures like the following (for a pasemi config):
      
      scripts/kconfig/conf  --silentoldconfig Kconfig
      warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES))
      
      ...
      
        CC [M]  arch/powerpc/kvm/book3s_64_vio.o
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’:
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration]
        iommu_tce_xchg(tbl, entry, &hpa, &dir);
        ^
      
      To fix this, we make the inclusion of the SPAPR TCE support, and the
      code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on
      the inclusion of support for the pseries and/or powernv platforms.
      This means that when running a 'pseries' guest on those platforms,
      the guest won't have in-kernel acceleration of the PAPR TCE hypercalls,
      but at least now they compile.
      
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      76d837a4
    • KVM: PPC: Book3S PR: Check copy_to/from_user return values · 67325e98
      Paul Mackerras authored
      
      
      The PR KVM implementation of the PAPR HPT hypercalls (H_ENTER etc.)
      accesses an image of the HPT in userspace memory using copy_from_user
      and copy_to_user.  Recently, the declarations of those functions were
      annotated to indicate that the return value must be checked.  Since
      this code doesn't currently check the return value, this causes
      compile warnings like the ones shown below, and since on PPC the
      default is to compile arch/powerpc with -Werror, this causes the
      build to fail.
      
      To fix this, we check the return values, and if non-zero, fail the
      hypercall being processed with an H_FUNCTION error return value.
      There is really no good error return value to use since PAPR didn't
      envisage the possibility that the hypervisor may not be able to access
      the guest's HPT, and H_FUNCTION (function not supported) seems as
      good as any.
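
The check described above can be sketched in userspace C. This is a minimal illustration only, not the code from the commit: mock_copy_from_user and sketch_h_enter are hypothetical names, a NULL source stands in for an inaccessible user address, and the H_FUNCTION value is used here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define H_SUCCESS   0L
#define H_FUNCTION  (-2L)	/* illustrative value for this sketch */

/*
 * Mock of a must-check copy routine (hypothetical).  Like the kernel
 * helper, it returns the number of bytes that could NOT be copied, so
 * 0 means success.  A NULL source models an unreadable user address.
 */
__attribute__((warn_unused_result))
static size_t mock_copy_from_user(void *dst, const void *src, size_t n)
{
	assert(dst != NULL);
	if (src == NULL)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical handler: fail the hypercall if the HPT image is unreadable. */
static long sketch_h_enter(const unsigned long *pteg_image)
{
	unsigned long pteg[16];

	if (mock_copy_from_user(pteg, pteg_image, sizeof(pteg)))
		return H_FUNCTION;
	/* ... a real handler would now search pteg[] for a free slot ... */
	return H_SUCCESS;
}
```

Dropping the `if` around the copy would bring back the unused-result diagnostic on compilers that honour the attribute, which is the class of warning quoted in this commit message.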
      
      The typical compile warnings look like this:
      
        CC      arch/powerpc/kvm/book3s_pr_papr.o
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c: In function ‘kvmppc_h_pr_enter’:
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:53:2: error: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Werror=unused-result]
        copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
        ^
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:74:2: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
        copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
        ^
      
      ... etc.
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      67325e98
    • KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers · acde2572
      Paul Mackerras authored
      
      
      POWER9 running a radix guest will take some hypervisor interrupts
      without going to real mode (turning off the MMU).  This means that
      early hypercall handlers may now be called in virtual mode.  Most of
      the handlers work just fine in both modes, but there are some that
      can crash the host if called in virtual mode, notably the TCE (IOMMU)
      hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT.  These
      already have both a real-mode and a virtual-mode version, so we
      arrange for the real-mode version to return H_TOO_HARD for radix
      guests, which will result in the virtual-mode version being called.
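
The fallback described in this paragraph can be pictured roughly as follows. This is a simplified userspace sketch under stated assumptions: the struct, the function names and the call counter are hypothetical, and the H_TOO_HARD value is illustrative. It only shows the control flow, not the actual handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define H_SUCCESS   0L
#define H_TOO_HARD  9999L	/* illustrative "retry in virtual mode" code */

struct sketch_guest {
	bool radix;		/* true for a radix guest */
};

/* Real-mode variant: refuses radix guests, since it may run with the MMU on. */
static long sketch_rm_put_tce(const struct sketch_guest *g)
{
	assert(g != NULL);
	if (g->radix)
		return H_TOO_HARD;
	return H_SUCCESS;
}

/* Virtual-mode variant: safe in either case; counts calls for the demo. */
static long sketch_virt_put_tce(const struct sketch_guest *g, int *virt_calls)
{
	assert(g != NULL && virt_calls != NULL);
	(*virt_calls)++;
	return H_SUCCESS;
}

/* Try the early handler first, fall back when it reports H_TOO_HARD. */
static long sketch_handle_put_tce(const struct sketch_guest *g, int *virt_calls)
{
	long rc = sketch_rm_put_tce(g);

	if (rc == H_TOO_HARD)
		rc = sketch_virt_put_tce(g, virt_calls);
	return rc;
}
```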
      
      The other hypercall which is sensitive to the MMU mode is H_RANDOM.
      It doesn't have a virtual-mode version, so this adds code to enable
      it to be called in either mode.
      
      An alternative solution was considered which would refuse to call any
      of the early hypercall handlers when doing a virtual-mode exit from a
      radix guest.  However, the XICS-on-XIVE code depends on the XICS
      hypercalls being handled early even for virtual-mode exits, because
      the handlers need to be called before the XIVE vCPU state has been
      pulled off the hardware.  Therefore that solution would have become
      quite invasive and complicated, and was rejected in favour of the
      simpler, though less elegant, solution presented here.
      
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Tested-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      acde2572
  2. May 09, 2017
    • Merge tag 'kvm-arm-for-v4.12-round2' of... · 36c344f3
      Paolo Bonzini authored
      Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      Second round of KVM/ARM Changes for v4.12.
      
      Changes include:
       - A fix related to the 32-bit idmap stub
       - A fix to the bitmask used to decode the operands of an AArch32 CP
         instruction
       - We have moved the files shared between arch/arm/kvm and
         arch/arm64/kvm to virt/kvm/arm
       - We add support for saving/restoring the virtual ITS state to
         userspace
      36c344f3
    • KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore · a2b19e6e
      Christoffer Dall authored
      
      
      When failing to restore the ITT for a DTE, we should remove the failed
      device entry from the list and free the object.
      
      We slightly refactor vgic_its_destroy to be able to reuse the now
      separate vgic_its_free_dte() function.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      a2b19e6e
    • KVM: arm/arm64: Don't call map_resources when restoring ITS tables · 67723c25
      Christoffer Dall authored
      
      
      The only reason we called kvm_vgic_map_resources() when restoring the
      ITS tables was because we wanted to have the KVM iodevs registered in
      the KVM IO bus framework at the time when the ITS was restored such that
      a restored and active device can inject MSIs prior to otherwise calling
      kvm_vgic_map_resources() from the first run of a VCPU.
      
      Since we now register the KVM iodevs for the redistributors and ITS as
      soon as possible (when setting the base addresses), we no longer need
      this call and kvm_vgic_map_resources() is again called only when first
      running a VCPU.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      67723c25
    • KVM: arm/arm64: Register ITS iodev when setting base address · 30e1b684
      Christoffer Dall authored
      
      
      We have to register the ITS iodevice before running the VM, because in
      migration scenarios, we may be restoring a live device that wishes to
      inject MSIs before the VCPUs have started.
      
      All we need to register the ITS io device is the base address of the
      ITS, so we can simply register that when the base address of the ITS is
      set.
      
        [ Code to fix concurrency issues when setting the ITS base address and
          to fix the undef base address check written by Marc Zyngier ]
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      30e1b684
    • KVM: arm/arm64: Get rid of its->initialized field · 6cc40f27
      Marc Zyngier authored
      
      
      The its->initialized field doesn't bring much to the table, and creates
      unnecessary ordering between setting the address and initializing it
      (which amounts to exactly nothing).
      
      Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
      it deserves to be.
      
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      6cc40f27
    • KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs · 1aab6f46
      Christoffer Dall authored
      
      
      Instead of waiting until the first VCPU is run to register the KVM
      iodevs, we can actually create them when the redist base address is
      set.  The only downside is that we must now also check if we need to do
      this for VCPUs which are created after creating the VGIC, because there
      is no enforced ordering between creating the VGIC (and setting its base
      addresses) and creating the VCPUs.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      1aab6f46
    • KVM: arm/arm64: Slightly rework kvm_vgic_addr · 72030536
      Christoffer Dall authored
      
      
      As we are about to handle setting the address for the redistributor base
      region separately from some of the other base addresses, let's rework
      this function to leave a little more room for being flexible in what
      each type of base address does.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      72030536
    • KVM: arm/arm64: Make vgic_v3_check_base more broadly usable · 9a746d75
      Christoffer Dall authored
      
      
      As we are about to fiddle with the IO device registration mechanism,
      let's be a little more careful when setting base addresses as early as
      possible.  When setting a base address, we can check that there is
      enough address space for its scope, and when the last of the two
      base addresses (dist and redist) gets set, we can also check whether
      the regions overlap at that time.
      
      This allows us to provide error messages to the user at the time of
      setting the base address, as opposed to later when trying to run the VM.
      
      To do this, we make vgic_v3_check_base available in the core vgic-v3
      code as well as in the other parts of the GICv3 code, namely the MMIO
      config code.
      
      We also return true for undefined base addresses so that the function
      can be used before all base addresses are set; all callers already check
      for uninitialized addresses before calling this function.
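
A rough userspace sketch of the kind of check described above follows. It is not the kernel's vgic_v3_check_base: the struct, the SKETCH_ADDR_UNDEF sentinel and the function names are hypothetical, and region sizes are passed in rather than taken from the GICv3 definitions. It shows the three ideas from the message: a region must fit in the address space, an undefined base is accepted, and overlap is only tested once both bases are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ADDR_UNDEF UINT64_MAX	/* hypothetical "not set yet" marker */

struct sketch_region {
	uint64_t base;
	uint64_t size;
};

/* True if base + size does not wrap around the 64-bit address space. */
static bool sketch_region_fits(const struct sketch_region *r)
{
	return r->size <= UINT64_MAX - r->base;
}

/* True if the layout is acceptable given what has been set so far. */
static bool sketch_check_base(const struct sketch_region *dist,
			      const struct sketch_region *redist)
{
	bool dist_set, redist_set;

	assert(dist != NULL && redist != NULL);
	dist_set = dist->base != SKETCH_ADDR_UNDEF;
	redist_set = redist->base != SKETCH_ADDR_UNDEF;

	if (dist_set && !sketch_region_fits(dist))
		return false;
	if (redist_set && !sketch_region_fits(redist))
		return false;
	if (!dist_set || !redist_set)
		return true;	/* nothing to compare against yet */

	/* Disjoint if one region ends at or before the other begins. */
	return dist->base + dist->size <= redist->base ||
	       redist->base + redist->size <= dist->base;
}
```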
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      9a746d75
    • KVM: arm/arm64: Refactor vgic_register_redist_iodevs · 7fadcd3a
      Christoffer Dall authored
      
      
      Split out the function to register all the redistributor iodevs into a
      function that handles a single redistributor at a time in preparation
      for being able to call this per VCPU as these get created.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      7fadcd3a
    • KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus · 497d72d8
      Christoffer Dall authored
      
      
      There are occasional needs to use the index of vcpu in the kvm->vcpus
      array to map something related to a VCPU.  For example, unlike the
      vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all
      vcpus which is useful when allocating a memory area for each vcpu.
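
To illustrate the difference between a sparse vcpu_id and a dense array index, here is a small userspace sketch. The structs and sketch_vcpu_get_idx are hypothetical stand-ins, not the kernel's definitions; the lookup simply walks the array until it finds the matching pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VCPUS 8

struct sketch_vm;

struct sketch_vcpu {
	int vcpu_id;			/* chosen by userspace, may be sparse */
	struct sketch_vm *vm;
};

struct sketch_vm {
	struct sketch_vcpu *vcpus[SKETCH_MAX_VCPUS];
	int nr_vcpus;
};

/* Return the dense position of vcpu in vm->vcpus[], or -1 if absent. */
static int sketch_vcpu_get_idx(const struct sketch_vcpu *vcpu)
{
	const struct sketch_vm *vm;
	int i;

	assert(vcpu != NULL && vcpu->vm != NULL);
	vm = vcpu->vm;
	for (i = 0; i < vm->nr_vcpus; i++)
		if (vm->vcpus[i] == vcpu)
			return i;
	return -1;
}
```

With three VCPUs whose IDs are 0, 8 and 16, the indices are still 0, 1 and 2, so a per-VCPU memory area can be sized by the VCPU count rather than by the largest ID.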
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      497d72d8
    • nVMX: Advertise PML to L1 hypervisor · 03efce6f
      Bandan Das authored
      
      
      Advertise the PML bit in vmcs12 but don't try to enable
      it in hardware when running L2 since L0 is emulating it. Also,
      preserve L0's settings for PML since it may still
      want to log writes.
      
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      03efce6f
    • nVMX: Implement emulated Page Modification Logging · c5f983f6
      Bandan Das authored
      
      
      With EPT A/D enabled, processor access to L2 guest
      paging structures will result in a write violation.
      When this happens, write the GUEST_PHYSICAL_ADDRESS
      to the PML buffer provided by L1 if the access is
      a write and the dirty bit is being set.
      
      This patch also adds necessary checks during VMEntry if L1
      has enabled PML. If the PML index overflows, we change the
      exit reason and run L1 to simulate a PML full event.
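
The logging step can be sketched in plain C as below. This is an illustration under assumptions, not the patch itself: struct sketch_pml and sketch_pml_log are hypothetical, the buffer is modelled as 512 eight-byte entries with an index that counts down from 511, and a false return stands for the case where L1 should see a PML-full event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PML_ENTRIES 512

struct sketch_pml {
	uint64_t buf[SKETCH_PML_ENTRIES];
	uint16_t index;		/* next free slot, counts down from 511 */
};

/*
 * Log one guest-physical address.  Returns true if it was written, or
 * false if the index is out of range, i.e. the buffer is full.
 */
static bool sketch_pml_log(struct sketch_pml *pml, uint64_t gpa)
{
	assert(pml != NULL);
	if (pml->index >= SKETCH_PML_ENTRIES)
		return false;
	/* Entries are page-aligned; after slot 0 the index wraps to 0xffff. */
	pml->buf[pml->index--] = gpa & ~0xfffULL;
	return true;
}
```

Because the index is unsigned and 16 bits wide, decrementing it past zero leaves a value that fails the range check, so the write after slot 0 is the one that reports the buffer as full.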
      
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5f983f6
    • kvm: x86: Add a hook for arch specific dirty logging emulation · bab4165e
      Bandan Das authored
      
      
      When KVM updates accessed/dirty bits, this hook can be used
      to invoke an arch specific function that implements/emulates
      dirty logging such as PML.
      
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bab4165e
    • kvm: nVMX: Validate CR3 target count on nested VM-entry · c7c2c709
      Jim Mattson authored
      
      
      According to the SDM, the CR3-target count must not be greater than
      4. Future processors may support a different number of CR3-target
      values. Software should read the VMX capability MSR IA32_VMX_MISC to
      determine the number of values supported.
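
A minimal sketch of such a check, assuming the supported count is read from bits 24:16 of IA32_VMX_MISC as the SDM describes. The struct and function names are hypothetical, and the MSR value is passed in as a plain integer rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the field needed for this sketch (hypothetical struct). */
struct sketch_vmcs12 {
	uint32_t cr3_target_count;
};

/* Bits 24:16 of IA32_VMX_MISC: number of CR3-target values supported. */
static unsigned int sketch_misc_cr3_count(uint64_t vmx_misc)
{
	return (unsigned int)((vmx_misc >> 16) & 0x1ff);
}

/* The count programmed by L1 must not exceed what the MSR advertises. */
static bool sketch_cr3_target_count_valid(const struct sketch_vmcs12 *vmcs12,
					  uint64_t vmx_misc)
{
	assert(vmcs12 != NULL);
	return vmcs12->cr3_target_count <= sketch_misc_cr3_count(vmx_misc);
}
```

Comparing against the advertised value instead of a hard-coded 4 keeps the check valid if a future processor reports a different number.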
      
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c7c2c709
    • Merge branch 'kvm-ppc-next' of... · 4415b335
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      The main thing here is a new implementation of the in-kernel
      XICS interrupt controller emulation for POWER9 machines, from Ben
      Herrenschmidt.
      
      POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
      Virtualization Engine) which is able to deliver interrupts directly
      to guest virtual CPUs in hardware without hypervisor intervention.
      With this new code, the guest still sees the old XICS interface but
      performance is better because the XICS emulation in the host uses the
      XIVE directly rather than going through a XICS emulation in firmware.
      
      Conflicts:
      	arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
      	arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
      4415b335
    • KVM: set no_llseek in stat_fops_per_vm · 3bed8888
      Geliang Tang authored
      
      
      In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
      use nonseekable_open() to open, we should use no_llseek() to seek,
      not generic_file_llseek().
      
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3bed8888
    • KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable · 443c3a9e
      Christoffer Dall authored
      
      
      This function really doesn't init anything, it enables the CPU
      interface, so name it as such, which gives us the name to use for actual
      init work later on.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      443c3a9e
    • KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI · cb9d0434
      Christoffer Dall authored
      
      
      Clarify what is meant by the save/restore ABI only supporting virtual
      physical interrupts.
      
      Relax the requirement of the order that the collection entries are
      written in and be clear that there is no particular ordering enforced.
      
      Some cosmetic changes in the capitalization of ID names to align with
      the GICv3 manual and remove the empty line in the bottom of the patch.
      
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      cb9d0434
    • Merge tag 'linux-kselftest-4.12-rc1' of... · 2868b251
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest updates from Shuah Khan:
       "This update consists of:
      
         - important fixes for build failures and clean target related
           warnings to address regressions introduced in commit 88baa78d
           ("selftests: remove duplicated all and clean target")
      
         - several minor spelling fixes in log messages and comment
           blocks.
      
         - Enabling configs for better test coverage in ftrace, vm, and
           cpufreq tests.
      
         - .gitignore changes"
      
      * tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (26 commits)
        selftests: x86: add missing executables to .gitignore
        selftests: watchdog: accept multiple params on command line
        selftests: create cpufreq kconfig fragments
        selftests: x86: override clean in lib.mk to fix warnings
        selftests: sync: override clean in lib.mk to fix warnings
        selftests: splice: override clean in lib.mk to fix warnings
        selftests: gpio: fix clean target to remove all generated files and dirs
        selftests: add gpio generated files to .gitignore
        selftests: powerpc: override clean in lib.mk to fix warnings
        selftests: gpio: override clean in lib.mk to fix warnings
        selftests: futex: override clean in lib.mk to fix warnings
        selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean
        selftests: splice: fix clean target to not remove default_file_splice_read.sh
        selftests: gpio: add config fragment for gpio-mockup
        selftests: breakpoints: allow to cross-compile for aarch64/arm64
        selftests/Makefile: Add missed PHONY targets
        selftests/vm/run_vmtests: Fix wrong comment
        selftests/Makefile: Add missed closing `"` in comment
        selftests/vm/run_vmtests: Polish output text
        selftests/timers: fix spelling mistake: "Asynchronous"
        ...
      2868b251
    • Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 00d95933
      Linus Torvalds authored
      Pull more tracing updates from Steven Rostedt:
       "These are three simple changes.
      
        The first one is just a switch from using strcpy() to strlcpy().
        Someone thought that it may cause an overflow bug, but since it only
        copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
        should ever be bigger than that, nor not end with a nul character,
        this change is more of a safety precaution than fixing anything that
        is actually broken.
      
        The other two changes are simply cleaning and optimizing some code"
      
      * tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Simplify ftrace_match_record() even more
        ftrace: Remove an unneeded condition
        tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()
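
The safety argument for the strcpy() to strlcpy() switch can be illustrated with a userspace stand-in. sketch_strlcpy below is a hypothetical helper written for this example (strlcpy() itself is not available in every C library); it follows the usual strlcpy contract of always NUL-terminating when the size is non-zero and returning the length of the source. SKETCH_TASK_COMM_LEN mirrors the 16-byte comm size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_TASK_COMM_LEN 16

/*
 * Bounded copy with strlcpy-like semantics: copies at most size - 1
 * bytes, always NUL-terminates when size > 0, and returns strlen(src)
 * so the caller can detect truncation.
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

An over-long name is truncated to fit the destination instead of writing past it, which is the "safety precaution" the pull message refers to; a return value of at least the buffer size signals that truncation happened.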
      00d95933
    • Merge tags 'for-linus' and 'for-next' of... · 3341713c
      Linus Torvalds authored
      Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
      
      Pull more rdma updates from Doug Ledford:
       "As mentioned in my first pull request, this is the subsequent pull
        requests I had. This is all I have, and in fact this cleans out the
        RDMA subsystem's entire patchworks queue of kernel changes that are
        ready to go (well, it did for the weekend anyway, a few new patches
        are in, but they'll be coming during the -rc cycle).
      
        The first tag contains a single patch that would have conflicted if
        taken from my tree or DaveM's tree as it needed our trees merged to
        come cleanly.
      
        The second tag contains the patch series from Intel plus three other
        stragglers that came in late last week. I took them because it allowed
        me to legitimately claim that the RDMA patchworks queue was, for a
        short time, 100% cleared of all waiting kernel patches, woohoo! :-).
      
        I have it under my for-next tag, so it did get 0day and linux-next
        over the end of last week, and linux-next did show one minor conflict.
      
        Summary:
      
        'for-linus' tag:
         - mlx5/IPoIB fixup patch
      
        'for-next' tag:
         - the hfi1 15 patch set that landed late
         - IPoIB get_link_ksettings which landed late because I asked for a
           respin
         - one late rxe change
         - one -rc worthy fix that's in early"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/mlx5: Enable IPoIB acceleration
      
      * tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        rxe: expose num_possible_cpus() cnum_comp_vectors
        IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
        IB/hfi1: Clean up on context initialization failure
        IB/hfi1: Fix an assign/ordering issue with shared context IDs
        IB/hfi1: Clean up context initialization
        IB/hfi1: Correctly clear the pkey
        IB/hfi1: Search shared contexts on the opened device, not all devices
        IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
        IB/hfi1: Use filedata rather than filepointer
        IB/hfi1: Name function prototype parameters
        IB/hfi1: Fix a subcontext memory leak
        IB/hfi1: Return an error on memory allocation failure
        IB/hfi1: Adjust default eager_buffer_size to 8MB
        IB/hfi1: Get rid of divide when setting the tx request header
        IB/hfi1: Fix yield logic in send engine
        IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
        IB/hfi1: Fix checks for Offline transient state
        IB/ipoib: add get_link_ksettings in ethtool
      3341713c
    • Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 857f8640
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
      
       - add framework for supporting PCIe devices in Endpoint mode (Kishon
         Vijay Abraham I)
      
       - use non-postable PCI config space mappings when possible (Lorenzo
         Pieralisi)
      
       - clean up and unify mmap of PCI BARs (David Woodhouse)
      
       - export and unify Function Level Reset support (Christoph Hellwig)
      
       - avoid FLR for Intel 82579 NICs (Sasha Neftin)
      
       - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
      
       - short-circuit config access failures for disconnected devices (Keith
         Busch)
      
       - remove D3 sleep delay when possible (Adrian Hunter)
      
       - freeze PME scan before suspending devices (Lukas Wunner)
      
       - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
      
       - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
      
       - add arch-specific alignment control to improve device passthrough by
         avoiding multiple BARs in a page (Yongji Xie)
      
       - add sysfs sriov_drivers_autoprobe to control VF driver binding
         (Bodong Wang)
      
       - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
      
       - fix crashes when unbinding host controllers that don't support
         removal (Brian Norris)
      
       - add driver for MicroSemi Switchtec management interface (Logan
         Gunthorpe)
      
       - add driver for Faraday Technology FTPCI100 host bridge (Linus
         Walleij)
      
       - add i.MX7D support (Andrey Smirnov)
      
       - use generic MSI support for Aardvark (Thomas Petazzoni)
      
       - make Rockchip driver modular (Brian Norris)
      
       - advertise 128-byte Read Completion Boundary support for Rockchip
         (Shawn Lin)
      
       - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
      
       - convert atomic_t to refcount_t in HV driver (Elena Reshetova)
      
       - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
      
       - fix PCI bus removal in HV driver (Long Li)
      
       - add support for ThunderX2 DMA alias topology (Jayachandran C)
      
       - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
      
       - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
      
       - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
         (Manish Jaggi)
      
      * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
        PCI: Don't allow unbinding host controllers that aren't prepared
        ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
        MAINTAINERS: Add PCI Endpoint maintainer
        Documentation: PCI: Add userguide for PCI endpoint test function
        tools: PCI: Add sample test script to invoke pcitest
        tools: PCI: Add a userspace tool to test PCI endpoint
        Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
        misc: Add host side PCI driver for PCI test function device
        PCI: Add device IDs for DRA74x and DRA72x
        dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
        PCI: dwc: dra7xx: Workaround for errata id i870
        dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
        PCI: dwc: dra7xx: Add EP mode support
        PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
        dt-bindings: PCI: Add DT bindings for PCI designware EP mode
        PCI: dwc: designware: Add EP mode support
        Documentation: PCI: Add binding documentation for pci-test endpoint function
        ixgbe: Use pcie_flr() instead of duplicating it
        IB/hfi1: Use pcie_flr() instead of duplicating it
        PCI: imx6: Fix spelling mistake: "contol" -> "control"
        ...
      857f8640
    • Merge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 8f3207c7
      Linus Torvalds authored
      Pull tty/serial updates from Greg KH:
       "Here is the "big" TTY/Serial patch updates for 4.12-rc1
      
        Not a lot of new things here, the normal number of serial driver
        updates and additions, tiny bugs fixed, and some core files split up
        to make future changes a bit easier for Nicolas's "tiny-tty" work.
      
        All of these have been in linux-next for a while"
      
      * tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
        serial: small Makefile reordering
        tty: split job control support into a file of its own
        tty: move baudrate handling code to a file of its own
        console: move console_init() out of tty_io.c
        serial: 8250_early: Add earlycon support for Palmchip UART
        tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
        vt: make mouse selection of non-ASCII consistent
        vt: set mouse selection word-chars to gpm's default
        imx-serial: Reduce RX DMA startup latency when opening for reading
        serial: omap: suspend device on probe errors
        serial: omap: fix runtime-pm handling on unbind
        tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
        serial: samsung: Remove useless spinlock
        serial: samsung: Add missing checks for dma_map_single failure
        serial: samsung: Use right device for DMA-mapping calls
        serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
        tty: fix comment typo s/repsonsible/responsible/
        tty: amba-pl011: Fix spurious TX interrupts
        serial: xuartps: Enable clocks in the pm disable case also
        serial: core: Re-use struct uart_port {name} field
        ...
      8f3207c7
    • Merge branch 'akpm' (patches from Andrew) · bf5f8946
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
      
       - the rest of MM
      
       - various misc things
      
       - procfs updates
      
       - lib/ updates
      
       - checkpatch updates
      
       - kdump/kexec updates
      
       - add kvmalloc helpers, use them
      
       - time helper updates for Y2038 issues. We're almost ready to remove
         current_fs_time() but that awaits a btrfs merge.
      
       - add tracepoints to DAX
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
        drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
        selftests/vm: add a test for virtual address range mapping
        dax: add tracepoint to dax_insert_mapping()
        dax: add tracepoint to dax_writeback_one()
        dax: add tracepoints to dax_writeback_mapping_range()
        dax: add tracepoints to dax_load_hole()
        dax: add tracepoints to dax_pfn_mkwrite()
        dax: add tracepoints to dax_iomap_pte_fault()
        mtd: nand: nandsim: convert to memalloc_noreclaim_*()
        treewide: convert PF_MEMALLOC manipulations to new helpers
        mm: introduce memalloc_noreclaim_{save,restore}
        mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
        mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
        mm/huge_memory.c: use zap_deposited_table() more
        time: delete CURRENT_TIME_SEC and CURRENT_TIME
        gfs2: replace CURRENT_TIME with current_time
        apparmorfs: replace CURRENT_TIME with current_time()
        lustre: replace CURRENT_TIME macro
        fs: ubifs: replace CURRENT_TIME_SEC with current_time
        fs: ufs: use ktime_get_real_ts64() for birthtime
        ...
      bf5f8946
    • Andrew Morton's avatar
      drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 · 4d2b5bca
      Andrew Morton authored
      
      
        drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer
        drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer
        drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer
        drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash')
        drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer
        drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer
        drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>')
        drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer
        drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer
        drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>')
        ...
      
      gcc-4.4.4 has issues with anon union initializers.  Work around this.
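
A minimal sketch of the kind of workaround this implies, not the driver's actual code. The struct layout, `demo_init`, `demo_update` and `demo_hash` are hypothetical stand-ins; only the anonymous union holding a `template_ahash` member is taken from the error log above. The idea is to give the anonymous union its own pair of braces, so that the compiler initializes it positionally instead of resolving `.template_ahash` through the enclosing struct.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's template structures. */
struct ahash_ops {
	int (*init)(void);
	int (*update)(void);
};

struct shash_ops {
	int (*init)(void);
};

struct hash_template {
	const char *name;
	union {				/* anonymous union, as in the error log */
		struct ahash_ops template_ahash;
		struct shash_ops template_shash;
	};
	int hash_mode;
};

static int demo_init(void)   { return 1; }
static int demo_update(void) { return 2; }

/*
 * Form that an old compiler such as gcc-4.4.4 may reject:
 *
 *	.name = "demo",
 *	.template_ahash = { .init = demo_init, ... },
 *
 * Workaround: wrap the union member in an extra pair of braces. The
 * braces initialize the member that follows .name, which is the
 * anonymous union, and the designator inside picks the union member.
 */
static const struct hash_template demo_hash[] = {
	{
		.name = "demo",
		{
			.template_ahash = {
				.init = demo_init,
				.update = demo_update,
			},
		},
		.hash_mode = 7,
	},
};
```

Newer compilers accept both forms, so the extra braces only cost a level of indentation.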
      
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d2b5bca
    • Anshuman Khandual's avatar
      selftests/vm: add a test for virtual address range mapping · 4e5ce33c
      Anshuman Khandual authored
      This verifies virtual address mapping below and above the 128TB range
      and makes sure that the addresses returned are within the expected range,
      depending upon the hint passed from user space.
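
A rough user-space sketch of the kind of check described, not the selftest itself. The helper names `map_with_hint` and `below_128tb` are made up for illustration, and whether a hint above the boundary actually yields a high address depends on the architecture and kernel configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ADDR_128TB (1ULL << 47)	/* the 128TB boundary named in the text */

/* Map len bytes of anonymous memory, passing hint as the address hint. */
static void *map_with_hint(void *hint, size_t len)
{
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Return 1 if the whole range [addr, addr + len) lies below 128TB. */
static int below_128tb(const void *addr, size_t len)
{
	return (unsigned long long)(uintptr_t)addr + len <= ADDR_128TB;
}
```

A test along these lines would call `map_with_hint()` once with a low or NULL hint and once with a hint at or above `ADDR_128TB`, then compare the result of `below_128tb()` with what the hint should have produced.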
      
      Link: http://lkml.kernel.org/r/20170418095252.20533-1-khandual@linux.vnet.ibm.com
      
      
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Michal Suchanek <msuchanek@suse.de>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e5ce33c
    • Ross Zwisler's avatar
      dax: add tracepoint to dax_insert_mapping() · b4440734
      Ross Zwisler authored
      Add a tracepoint to dax_insert_mapping(), following the same logging
      conventions as the rest of DAX.  This tracepoint, along with the one in
      dax_load_hole(), lets us know how a DAX PTE fault was serviced.
      
      Here is an example DAX fault that inserts a PTE mapping:
      
        small-1126  [007] ....   145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

        small-1126  [007] ....   145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006

        small-1126  [007] ....   145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4440734
    • Ross Zwisler's avatar
      dax: add tracepoint to dax_writeback_one() · f9bc3a07
      Ross Zwisler authored
      Add a tracepoint to dax_writeback_one(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example range writeback which ends up flushing one PMD and
      one PTE:
      
        test-1265  [003] ....   496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

        test-1265  [003] ....   496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200

        test-1265  [003] ....   496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1

        test-1265  [003] ....   496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
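
To read the `pglen` values in the trace above, the following sketch converts a length in pages to bytes. It assumes 4 KiB pages and a 2 MiB PMD (typical for x86-64); the helper names are illustrative and are not kernel functions.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL		/* assumed 4 KiB pages */
#define DEMO_PMD_SIZE  (1UL << 21)	/* assumed 2 MiB PMD */

/* Convert a pglen value from the trace (a page count) to bytes. */
static unsigned long pglen_to_bytes(unsigned long pglen)
{
	return pglen * DEMO_PAGE_SIZE;
}

/* Return 1 if the entry covers exactly one PMD under these assumptions. */
static int is_pmd_sized(unsigned long pglen)
{
	return pglen_to_bytes(pglen) == DEMO_PMD_SIZE;
}
```

Under these assumptions, `pglen 0x200` is 512 pages, or 2 MiB, which is the PMD entry, and `pglen 0x1` is the single-page PTE entry.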
      
      [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared]
      Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9bc3a07
    • Ross Zwisler's avatar
      dax: add tracepoints to dax_writeback_mapping_range() · d14a3f48
      Ross Zwisler authored
      Add tracepoints to dax_writeback_mapping_range(), following the same
      logging conventions as the rest of DAX.
      
      Here is an example writeback call:
      
        msync-1085  [006] ....   200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

        msync-1085  [006] ....   200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff
      
      [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()]
        Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d14a3f48
    • Ross Zwisler's avatar
      dax: add tracepoints to dax_load_hole() · 678c9fd0
      Ross Zwisler authored
      Add tracepoints to dax_load_hole(), following the same logging conventions
      as the rest of DAX.
      
      Here is the logging generated by a PTE read from a hole:
      
        read-1075  [002] ....    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

        read-1075  [002] ....    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

        read-1075  [002] ....    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      678c9fd0
    • Ross Zwisler's avatar
      dax: add tracepoints to dax_pfn_mkwrite() · c3ff68d7
      Ross Zwisler authored
      Add tracepoints to dax_pfn_mkwrite(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example PTE fault followed by a pfn_mkwrite:
      
        small_aligned-1094  [002] ....   374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

        small_aligned-1094  [002] ....   374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

        small_aligned-1094  [002] ....   374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3ff68d7
    • Ross Zwisler's avatar
      dax: add tracepoints to dax_iomap_pte_fault() · a9c42b33
      Ross Zwisler authored
      Patch series "second round of tracepoints for DAX".
      
      This second round of DAX tracepoint patches adds tracing to the PTE
      fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
      dax_insert_mapping()) and to the writeback path
      (dax_writeback_mapping_range(), dax_writeback_one()).
      
      The purpose of this tracing is to give us a high level view of what DAX
      is doing, whether faults are being serviced by PMDs or PTEs, and by real
      storage or by zero pages covering holes.
      
      I do have some patches nearly ready which also add tracing to
      grab_mapping_entry() and dax_insert_mapping_entry().  These are more
      targeted at logging how we are interacting with the radix tree, how we
      use empty entries for locking, whether we "downgrade" huge zero pages to
      4k PTE sized allocations, etc.  In the end it seemed to me that this
      might be too detailed to have as constantly present tracepoints, but if
      anyone sees value in having tracepoints like this in the DAX code
      permanently (Jan?), please let me know and I'll add those last two
      patches.
      
      All these tracepoints were done to be consistent with the style of the
      XFS tracepoints and with the existing DAX PMD tracepoints.
      
      This patch (of 6):
      
      Add tracepoints to dax_iomap_pte_fault(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example fault that initially tries to be serviced by the PMD
      fault handler but which falls back to PTEs because the VMA isn't large
      enough to hold a PMD:
      
        small-1086  [005] ....    71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

        small-1086  [005] ....    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

        small-1086  [005] ....    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

        small-1086  [005] ....    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

        small-1086  [005] ....    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
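
The FALLBACK in the trace above can be checked by hand from the logged values. The sketch below is a user-space model, not the kernel's fault handler: it assumes 4 KiB pages, a 2 MiB PMD, and a mapping that starts at file offset 0, and `pmd_fits_in_vma` and `pgoff_of` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12UL			/* assumed 4 KiB pages */
#define DEMO_PMD_SIZE   (1UL << 21)		/* assumed 2 MiB PMD */
#define DEMO_PMD_MASK   (~(DEMO_PMD_SIZE - 1))

/* True if the PMD-aligned region containing address fits in the VMA. */
static bool pmd_fits_in_vma(unsigned long address,
			    unsigned long vm_start, unsigned long vm_end)
{
	unsigned long pmd_addr = address & DEMO_PMD_MASK;

	return pmd_addr >= vm_start && pmd_addr + DEMO_PMD_SIZE <= vm_end;
}

/* Page offset of address, for a mapping that starts at file offset 0. */
static unsigned long pgoff_of(unsigned long address, unsigned long vm_start)
{
	return (address - vm_start) >> DEMO_PAGE_SHIFT;
}
```

With address 0x10420000, the PMD-aligned region is 0x10400000-0x10600000, which runs past vm_end 0x10500000, so a PMD cannot be used and the fault is retried with PTEs. The same inputs give a page offset of 0x220, matching the `pgoff` field in the log.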
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
      
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9c42b33
    • Vlastimil Babka's avatar
      mtd: nand: nandsim: convert to memalloc_noreclaim_*() · dcbe8214
      Vlastimil Babka authored
      Nandsim has its own functions set_memalloc() and clear_memalloc() for robust
      setting and clearing of PF_MEMALLOC.  Replace them with the new generic
      helpers.  No functional change.
      
      Link: http://lkml.kernel.org/r/20170405074700.29871-5-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dcbe8214
    • Vlastimil Babka's avatar
      treewide: convert PF_MEMALLOC manipulations to new helpers · f1083048
      Vlastimil Babka authored
      We now have memalloc_noreclaim_{save,restore} helpers for robust setting
      and clearing of PF_MEMALLOC.  Let's convert the code which was using the
      generic tsk_restore_flags().  No functional change.
      
      [vbabka@suse.cz: in net/core/sock.c the hunk is missing]
      Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Wouter Verhelst <w@uter.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f1083048
    • Vlastimil Babka's avatar
      mm: introduce memalloc_noreclaim_{save,restore} · 499118e9
      Vlastimil Babka authored
      The previous patch ("mm: prevent potential recursive reclaim due to
      clearing PF_MEMALLOC") has shown that simply setting and clearing
      PF_MEMALLOC in current->flags can result in wrongly clearing a
      pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
      Let's introduce helpers that support proper nesting by saving the
      previous state of the flag, similar to the existing memalloc_noio_* and
      memalloc_nofs_* helpers.  Convert existing setting/clearing of
      PF_MEMALLOC within mm to the new helpers.
      
      There are no known issues with the converted code, but the change makes
      it more robust.
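
A user-space model of how such save/restore helpers nest, offered as a sketch rather than the kernel's code. The helper names come from the patch title; `struct task`, `demo_task`, the `current` macro and the flag value are assumptions made so the example compiles on its own.

```c
#include <assert.h>

#define PF_MEMALLOC 0x00000800u	/* illustrative flag bit */

/* Stand-in for the kernel's task structure and "current" pointer. */
struct task {
	unsigned int flags;
};

static struct task demo_task;
#define current (&demo_task)

/* Set PF_MEMALLOC and return whether it was already set. */
static unsigned int memalloc_noreclaim_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC;

	current->flags |= PF_MEMALLOC;
	return flags;
}

/* Put PF_MEMALLOC back to the state recorded by the matching save. */
static void memalloc_noreclaim_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
}
```

Because each caller restores the value it got from its own save call, an inner save/restore pair leaves the flag set while an outer section is still active, and only the outermost restore clears it.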
      
      Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      499118e9
    • Vlastimil Babka's avatar
      mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC · 62be1511
      Vlastimil Babka authored
      Patch series "more robust PF_MEMALLOC handling"
      
      This series aims to unify the setting and clearing of PF_MEMALLOC, which
      prevents recursive reclaim.  There are some places that clear the flag
      unconditionally from current->flags, which may result in clearing a
      pre-existing flag.  This already resulted in a bug report that Patch 1
      fixes (without the new helpers, to make backporting easier).  Patch 2
      introduces the new helpers, modelled after existing memalloc_noio_* and
      memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
      and 4 convert non-mm code.
      
      This patch (of 4):
      
      __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
      during page migration by lock_page() (see the comment in
      __unmap_and_move()).  Then it unconditionally clears the flag, which can
      clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
      This was not a problem until commit a8161d1e ("mm, page_alloc:
      restructure direct compaction handling in slowpath"), because direct
      compaction was called only after direct reclaim, which was skipped when
      PF_MEMALLOC flag was set.
      
      Even now it's only a theoretical issue, as the new callsite of
      __alloc_pages_direct_compact() is reached only for costly orders and
      when gfp_pfmemalloc_allowed() is true, which means either
      __GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
      such known context, but let's play it safe and make
      __alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
      already set.
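
The difference between the two patterns can be modelled in a few lines of user-space C. This is an illustration only: `struct task`, both function names and the flag value are hypothetical, and the compaction work itself is reduced to a comment.

```c
#include <assert.h>

#define PF_MEMALLOC 0x00000800u	/* illustrative flag bit */

/* Stand-in for the task whose flags are being manipulated. */
struct task {
	unsigned int flags;
};

/* Pattern described as fragile: the flag is cleared unconditionally. */
static void compact_unconditional_clear(struct task *t)
{
	t->flags |= PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	t->flags &= ~PF_MEMALLOC;	/* also drops a pre-existing flag */
}

/* More robust pattern: remember whether the flag was already set. */
static void compact_save_restore(struct task *t)
{
	unsigned int noreclaim_flag = t->flags & PF_MEMALLOC;

	t->flags |= PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	t->flags = (t->flags & ~PF_MEMALLOC) | noreclaim_flag;
}
```

If the caller already had PF_MEMALLOC set, the first function returns with the flag cleared, which is the situation that could allow recursive reclaim. The second leaves the flag exactly as it found it.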
      
      Fixes: a8161d1e ("mm, page_alloc: restructure direct compaction handling in slowpath")
      Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62be1511
    • Oliver O'Halloran's avatar
      mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required · 3b6521f5
      Oliver O'Halloran authored
      Although all architectures use a deposited page table for THP on
      anonymous VMAs, some architectures (s390 and powerpc) require the
      deposited storage even for file backed VMAs due to quirks of their MMUs.
      
      This patch adds support for depositing a table in DAX PMD fault handling
      path for archs that require it.  Other architectures should see no
      functional changes.
      
      Link: http://lkml.kernel.org/r/20170411174233.21902-3-oohall@gmail.com
      
      
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: linux-nvdimm@ml01.01.org
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b6521f5