Skip to content
  1. Aug 30, 2017
  2. Jul 28, 2017
    • Will Deacon's avatar
      arm64: mmu: Place guard page after mapping of kernel image · 92bbd16e
      Will Deacon authored
      
      
      The vast majority of virtual allocations in the vmalloc region are followed
      by a guard page, which can help to avoid overruning on vma into another,
      which may map a read-sensitive device.
      
      This patch adds a guard page to the end of the kernel image mapping (i.e.
      following the data/bss segments).
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      92bbd16e
    • Matthias Kaehlcke's avatar
      x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189
      Matthias Kaehlcke authored
      
      
      The clang warning 'address-of-packed-member' is disabled for the general
      kernel code, also disable it for the x86 boot code.
      
      This suppresses a bunch of warnings like this when building with clang:
      
      ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
        packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
        unaligned pointer value [-Waddress-of-packed-member]
          return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                      ^~~~~~~~~~~~~~~~~~~
      ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
        'this_cpu_read_stable'
          #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                          ^~~
      ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
        'percpu_stable_op'
          : "p" (&(var)));
                   ^~~
      
      Signed-off-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      20c6c189
  3. Jul 27, 2017
    • Will Deacon's avatar
      drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU · a3287c41
      Will Deacon authored
      Since the PMU register interface is banked per CPU, CPU PMU interrrupts
      cannot be handled by a CPU other than the one with the PMU asserting the
      interrupt. This means that migrating PMU SPIs, as we do during a CPU
      hotplug operation doesn't make any sense and can lead to the IRQ being
      disabled entirely if we route a spurious IRQ to the new affinity target.
      
      This has been observed in practice on AMD Seattle, where CPUs on the
      non-boot cluster appear to take a spurious PMU IRQ when coming online,
      which is routed to CPU0 where it cannot be handled.
      
      This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
      affinity prior to requesting them, ensuring that they cannot
      be migrated during hotplug events. This interacts badly with the DB8500
      erratum workaround that ping-pongs the interrupt affinity from the handler,
      so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
      to be overridden in the platdata.
      
      Fixes: 3cf7ee98
      
       ("drivers/perf: arm_pmu: move irq request/free into probe")
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      a3287c41
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash: Free the subpage_prot_table correctly · 0da12a7a
      Aneesh Kumar K.V authored
      Fixes: dad6f37c
      
       ("powerpc: subpage_protect: Increase the array size to take care of 64TB")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Tested-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0da12a7a
    • Wanpeng Li's avatar
      KVM: LAPIC: Fix reentrancy issues with preempt notifiers · 1d518c68
      Wanpeng Li authored
      
      
      Preempt can occur in the preemption timer expiration handler:
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            hv_timer_is_use == true
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   kvm_lapic_restart_hv_timer
                                     restart_apic_timer
                                       start_hv_timer
                                         already-expired timer or sw timer triggerd in the window
                                       start_sw_timer
                                         cancel_hv_timer
                                 /* back in kvm_lapic_expired_hv_timer */
                                 cancel_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
      
      This can be reproduced if CONFIG_PREEMPT is enabled.
      
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
       RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
      Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
       RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
      Call Trace:
        kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer
      and start_sw_timer be in preemption-disabled regions, which trivially
      avoid any reentrancy issue with preempt notifier.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      [Add more WARNs. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1d518c68
    • Wanpeng Li's avatar
      KVM: nVMX: Fix loss of L2's NMI blocking state · 2d6144e3
      Wanpeng Li authored
      Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:
      
      Before NMI IRET test
      Sending NMI to self
      NMI isr running stack 0x461000
      Sending nested NMI to self
      After nested NMI to self
      Nested NMI isr running rip=40038e
      After iret
      After NMI to self
      FAIL: NMI
      
      Commit 4c4a6f79
      
       (KVM: nVMX: track NMI blocking state separately
      for each VMCS) tracks NMI blocking state separately for vmcs01 and
      vmcs02. However it is not enough:
      
       - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
         on IRET, so the L2 can generate #PF which can be intercepted by L0.
       - L0 walks L1's guest page table and sees the mapping is invalid, it
         resumes the L1 guest and injects the #PF into L1.  At this point the
         vmcs02 has nmi_known_unmasked=true.
       - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
         of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
       - L1 executes VMRESUME to resume L2, causing a vmexit to L0
       - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
         interruptibility-state field of vmcs02, but nmi_known_unmasked is
         still true.
       - L2 immediately exits to L0 with another page fault, because L0 still has
         not updated the NGVA->HPA page tables.  However, nmi_known_unmasked is
         true so vmx_recover_nmi_blocking does not do anything.
      
      The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2d6144e3
    • Wincy Van's avatar
      KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode · 06a5524f
      Wincy Van authored
      
      
      The PI vector for L0 and L1 must be different. If dest vcpu0
      is in guest mode while vcpu1 is delivering a non-nested PI to
      vcpu0, there wont't be any vmexit so that the non-nested interrupt
      will be delayed.
      
      Signed-off-by: default avatarWincy Van <fanwenyi0529@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      06a5524f
    • Wincy Van's avatar
      x86: irq: Define a global vector for nested posted interrupts · 210f84b0
      Wincy Van authored
      
      
      We are using the same vector for nested/non-nested posted
      interrupts delivery, this may cause interrupts latency in
      L1 since we can't kick the L2 vcpu out of vmx-nonroot mode.
      
      This patch introduces a new vector which is only for nested
      posted interrupts to solve the problems above.
      
      Signed-off-by: default avatarWincy Van <fanwenyi0529@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      210f84b0
    • Paolo Bonzini's avatar
      KVM: x86: do mask out upper bits of PAE CR3 · a512177e
      Paolo Bonzini authored
      This reverts the change of commit f85c758d,
      as the behavior it modified was intended.
      
      The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual
      says:
      
      Table 4-7. Use of CR3 with PAE Paging
      Bit Position(s)	Contents
      4:0		Ignored
      31:5		Physical address of the 32-Byte aligned
      		page-directory-pointer table used for linear-address
      		translation
      63:32		Ignored (these bits exist only on processors supporting
      		the Intel-64 architecture)
      
      To placate the static checker, write the mask explicitly as an
      unsigned long constant instead of using a 32-bit unsigned constant.
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: f85c758d
      
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a512177e
  4. Jul 26, 2017
    • Dave Martin's avatar
      arm64: sysreg: Fix unprotected macro argmuent in write_sysreg · d0153c7f
      Dave Martin authored
      
      
      write_sysreg() may misparse the value argument because it is used
      without parentheses to protect it.
      
      This patch adds the ( ) in order to avoid any surprises.
      
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      [will: same change to write_sysreg_s]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      d0153c7f
    • Michael Ellerman's avatar
      powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain · b40b2386
      Michael Ellerman authored
      In commit efe0160c ("powerpc/64: Linker on-demand sfpr functions
      for modules"), we added an ld version check early in the powerpc
      top-level Makefile.
      
      Because the Makefile runs before the kernel config is setup, the
      checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
      we end up configuring ld for 32-bit big endian.
      
      That would be OK, except that for historical (or perhaps no) reason,
      we use 'override LD' to add the endian flags to the LD variable
      itself, rather than the normal approach of adding them to LDFLAGS.
      
      The end result is that when we check the ld version we run it as:
      
        $(CROSS_COMPILE)ld -EB -m elf32ppc --version
      
      This often works, unless you are using a 64-bit only and/or little
      endian only, toolchain. In which case you see something like:
      
        $ make defconfig
        powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
        Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
        /bin/sh: 1: [: -ge: unexpected operator
      
      The proper fix is to stop using 'override LD', but that will require a
      fair bit of testing. Instead we can fix it for now just by reordering
      the Makefile to do the version check earlier.
      
      Fixes: efe0160c
      
       ("powerpc/64: Linker on-demand sfpr functions for modules")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b40b2386
    • Laurent Vivier's avatar
      powerpc/pseries: Fix of_node_put() underflow during reconfig remove · 4fd1bd44
      Laurent Vivier authored
      As for commit 68baf692 ("powerpc/pseries: Fix of_node_put()
      underflow during DLPAR remove"), the call to of_node_put() must be
      removed from pSeries_reconfig_remove_node().
      
      dlpar_detach_node() and pSeries_reconfig_remove_node() both call
      of_detach_node(), and thus the node should not be released in both
      cases.
      
      Fixes: 0829f6d1
      
       ("of: device_node kobject lifecycle fixes")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarLaurent Vivier <lvivier@redhat.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      4fd1bd44
    • Benjamin Herrenschmidt's avatar
      powerpc/mm/radix: Workaround prefetch issue with KVM · a25bd72b
      Benjamin Herrenschmidt authored
      
      
      There's a somewhat architectural issue with Radix MMU and KVM.
      
      When coming out of a guest with AIL (Alternate Interrupt Location, ie,
      MMU enabled), we start executing hypervisor code with the PID register
      still containing whatever the guest has been using.
      
      The problem is that the CPU can (and will) then start prefetching or
      speculatively load from whatever host context has that same PID (if
      any), thus bringing translations for that context into the TLB, which
      Linux doesn't know about.
      
      This can cause stale translations and subsequent crashes.
      
      Fixing this in a way that is neither racy nor a huge performance
      impact is difficult. We could just make the host invalidations always
      use broadcast forms but that would hurt single threaded programs for
      example.
      
      We chose to fix it instead by partitioning the PID space between guest
      and host. This is possible because today Linux only use 19 out of the
      20 bits of PID space, so existing guests will work if we make the host
      use the top half of the 20 bits space.
      
      We additionally add support for a property to indicate to Linux the
      size of the PID register which will be useful if we eventually have
      processors with a larger PID space available.
      
      There is still an issue with malicious guests purposefully setting the
      PID register to a value in the hosts PID range. Hopefully future HW
      can prevent that, but in the meantime, we handle it with a pair of
      kludges:
      
       - On the way out of a guest, before we clear the current VCPU in the
         PACA, we check the PID and if it's outside of the permitted range
         we flush the TLB for that PID.
      
       - When context switching, if the mm is "new" on that CPU (the
         corresponding bit was set for the first time in the mm cpumask), we
         check if any sibling thread is in KVM (has a non-NULL VCPU pointer
         in the PACA). If that is the case, we also flush the PID for that
         CPU (core).
      
      This second part is needed to handle the case where a process is
      migrated (or starts a new pthread) on a sibling thread of the CPU
      coming out of KVM, as there's a window where stale translations can
      exist before we detect it and flush them out.
      
      A future optimization could be added by keeping track of whether the
      PID has ever been used and avoid doing that for completely fresh PIDs.
      We could similarily mark PIDs that have been the subject of a global
      invalidation as "fresh". But for now this will do.
      
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
            unneeded include of kvm_book3s_asm.h]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a25bd72b
    • John David Anglin's avatar
      parisc: Extend disabled preemption in copy_user_page · 56008c04
      John David Anglin authored
      
      
      It's always bothered me that we only disable preemption in
      copy_user_page around the call to flush_dcache_page_asm.
      This patch extends this to after the copy.
      
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      56008c04
    • John David Anglin's avatar
      parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases · ae7a609c
      John David Anglin authored
      
      
      Helge noticed that we flush the TLB page in flush_cache_page but not in
      flush_cache_range or flush_cache_mm.
      
      For a long time, we have had random segmentation faults building
      packages on machines with PA8800/8900 processors.  These machines only
      support equivalent aliases.  We don't see these faults on machines that
      don't require strict coherency.  So, it appears TLB speculation
      sometimes leads to cache corruption on machines that require coherency.
      
      This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
      coherency is required.  We only flush the TLB in flush_cache_page when
      coherency is required.
      
      The patch also optimizes flush_cache_range.  It turns out we always have
      the right context to use flush_user_dcache_range_asm and
      flush_user_icache_range_asm.
      
      The patch has been tested for some time on rp3440, rp3410 and A500-44.
      It's been boot tested on c8000.  No random segmentation faults were
      observed during testing.
      
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      ae7a609c
    • Helge Deller's avatar
      parisc: Suspend lockup detectors before system halt · 56188832
      Helge Deller authored
      
      
      Some machines can't power off the machine, so disable the lockup detectors to
      avoid this watchdog BUG to show up every few seconds:
      watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 4.9+
      56188832
    • Helge Deller's avatar
      parisc: Show DIMM slot number which holds broken memory module · c46bafc4
      Helge Deller authored
      
      
      The Page Deallocation Table (PDT) holds the physical addresses of all broken
      memory addresses. With the physical address we now are able to show which DIMM
      slot (e.g. 1a, 3c) actually holds the broken memory module so that users are
      able to replace it.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      c46bafc4
    • Helge Deller's avatar
      parisc: Add function to return DIMM slot of physical address · 25a9b765
      Helge Deller authored
      
      
      Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
      physical address. This is needed to show users which DIMM module needs
      replacement in case a broken DIMM was encountered.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      25a9b765
    • Helge Deller's avatar
      parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function · f520e552
      Helge Deller authored
      Commit c9c2877d ("parisc: Add Page Deallocation Table (PDT) support")
      introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which
      crashed the system because it trashed the stack if the
      pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is
      in size less than the required 32 64-bit values).
      
      Fix it by using the pdc_result struct instead when calling firmware and copy
      the return values back into the result struct when finished sucessfully.
      
      While debugging this code I noticed that the pdc_type wasn't set correctly
      either, so let's fix that too.
      
      Fixes: c9c2877d
      
       ("parisc: Add Page Deallocation Table (PDT) support")
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      f520e552
  5. Jul 25, 2017
  6. Jul 24, 2017
    • Dave Martin's avatar
      ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[] · ce184a0d
      Dave Martin authored
      
      
      In kernels with CONFIG_IWMMXT=y running on non-iWMMXt hardware, the
      signal frame can be left partially uninitialised in such a way
      that userspace cannot parse uc_regspace[] safely.  In particular,
      this means that the VFP registers cannot be located reliably in the
      signal frame when a multi_v7_defconfig kernel is run on the
      majority of platforms.
      
      The cause is that the uc_regspace[] is laid out statically based on
      the kernel config, but the decision of whether to save/restore the
      iWMMXt registers must be a runtime decision.
      
      To minimise breakage of software that may assume a fixed layout,
      this patch emits a dummy block of the same size as iwmmxt_sigframe,
      for non-iWMMXt threads.  However, the magic and size of this block
      are now filled in to help parsers skip over it.  A new DUMMY_MAGIC
      is defined for this purpose.
      
      It is probably legitimate (if non-portable) for userspace to
      manufacture its own sigframe for sigreturn, and there is no obvious
      reason why userspace should be required to insert a DUMMY_MAGIC
      block when running on non-iWMMXt hardware, when omitting it has
      worked just fine forever in other configurations.  So in this case,
      sigreturn does not require this block to be present.
      
      Reported-by: default avatarEdmund Grimley-Evans <Edmund.Grimley-Evans@arm.com>
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      ce184a0d
    • Dave Martin's avatar
      ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors · 26958355
      Dave Martin authored
      
      
      preserve_iwmmxt_context() and restore_iwmmxt_context() lack __user
      accessors on their arguments pointing to the user signal frame.
      
      There does not be appear to be a bug here, but this omission is
      inconsistent with the crunch and vfp sigframe access functions.
      
      This patch adds the annotations, for consistency.
      
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      26958355
    • Masami Hiramatsu's avatar
      kprobes/x86: Release insn_slot in failure path · 38115f2f
      Masami Hiramatsu authored
      The following commit:
      
        003002e0
      
       ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures")
      
      returns an error if the copying of the instruction, but does not release
      the allocated insn_slot.
      
      Clean up correctly.
      
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S . Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 003002e0 ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures")
      Link: http://lkml.kernel.org/r/150064834183.6172.11694375818447664416.stgit@devbox
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      38115f2f
    • Stephane Eranian's avatar
      perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs · ba883b4a
      Stephane Eranian authored
      
      
      This skx_uncore_cha_extra_regs array was missing an end-marker.
      
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-7-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ba883b4a
    • Stephane Eranian's avatar
      perf/x86/intel/uncore: Fix SKX CHA event extra regs · 8aa7b7b4
      Stephane Eranian authored
      
      
      This patch adds two missing event extra regs for Skylake Server CHA PMU:
      
       - TOR_INSERTS
       - TOR_OCCUPANCY
      
      Were missing support for all the filters, including opcode matchers.
      
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-6-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8aa7b7b4
    • Kan Liang's avatar
      perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field · 9ad0fbd8
      Kan Liang authored
      
      
      There is no field c6 and link for CHA BOX FILTER.
      
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-5-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9ad0fbd8
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask · c3f02682
      Kan Liang authored
      
      
      Correct the umask for LLC_LOOKUP.LOCAL and LLC_LOOKUP.REMOTE events
      
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-4-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c3f02682
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix Skylake server PCU PMU event format · bab4e569
      Kan Liang authored
      
      
      PCU event format for SKX are different from snbep. Introduce a new
      format group for SKX PCU.
      
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-3-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bab4e569
    • Stephane Eranian's avatar
      perf/x86/intel/uncore: Fix Skylake UPI PMU event masks · b3625980
      Stephane Eranian authored
      
      
      This patch fixes the event_mask and event_ext_mask for the Intel Skylake
      Server UPI PMU. Bit 21 is not used as a filter. The extended umask is
      from bit 32 to bit 55. Correct both umasks.
      
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1499967350-10385-2-git-send-email-kan.liang@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b3625980
    • Paolo Bonzini's avatar
      KVM: VMX: remove unused field · fa19871a
      Paolo Bonzini authored
      
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fa19871a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix host crash on changing HPT size · ef427198
      Paul Mackerras authored
      Commit f98a8bf9 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB
      ioctl() to change HPT size", 2016-12-20) changed the behaviour of
      the KVM_PPC_ALLOCATE_HTAB ioctl so that it now allocates a new HPT
      and new revmap array if there was a previously-allocated HPT of a
      different size from the size being requested.  In this case, we need
      to reset the rmap arrays of the memslots, because the rmap arrays
      will contain references to HPTEs which are no longer valid.  Worse,
      these references are also references to slots in the new revmap
      array (which parallels the HPT), and the new revmap array contains
      random contents, since it doesn't get zeroed on allocation.
      
      The effect of having these stale references to slots in the revmap
      array that contain random contents is that subsequent calls to
      functions such as kvmppc_add_revmap_chain will crash because they
      will interpret the non-zero contents of the revmap array as HPTE
      indexes and thus index outside of the revmap array.  This leads to
      host crashes such as the following.
      
      [ 7072.862122] Unable to handle kernel paging request for data at address 0xd000000c250c00f8
      [ 7072.862218] Faulting instruction address: 0xc0000000000e1c78
      [ 7072.862233] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 7072.862286] SMP NR_CPUS=1024
      [ 7072.862286] NUMA
      [ 7072.862325] PowerNV
      [ 7072.862378] Modules linked in: kvm_hv vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 mlx5_ib ib_core ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel i2c_opal nfsd auth_rpcgss oid_registry
      [ 7072.863085]  nfs_acl lockd grace sunrpc kvm_pr kvm xfs libcrc32c scsi_dh_alua dm_service_time radeon lpfc nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper ptp pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm dm_multipath i2c_core cxgb3 mlx5_core mdio [last unloaded: kvm_hv]
      [ 7072.863381] CPU: 72 PID: 56929 Comm: qemu-system-ppc Not tainted 4.12.0-kvm+ #59
      [ 7072.863457] task: c000000fe29e7600 task.stack: c000001e3ffec000
      [ 7072.863520] NIP: c0000000000e1c78 LR: c0000000000e2e3c CTR: c0000000000e25f0
      [ 7072.863596] REGS: c000001e3ffef560 TRAP: 0300   Not tainted  (4.12.0-kvm+)
      [ 7072.863658] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>
      [ 7072.863667]   CR: 44082882  XER: 20000000
      [ 7072.863767] CFAR: c0000000000e2e38 DAR: d000000c250c00f8 DSISR: 42000000 SOFTE: 1
      GPR00: c0000000000e2e3c c000001e3ffef7e0 c000000001407d00 d000000c250c00f0
      GPR04: d00000006509fb70 d00000000b3d2048 0000000003ffdfb7 0000000000000000
      GPR08: 00000001007fdfb7 00000000c000000f d0000000250c0000 000000000070f7bf
      GPR12: 0000000000000008 c00000000fdad000 0000000010879478 00000000105a0d78
      GPR16: 00007ffaf4080000 0000000000001190 0000000000000000 0000000000010000
      GPR20: 4001ffffff000415 d00000006509fb70 0000000004091190 0000000ee1881190
      GPR24: 0000000003ffdfb7 0000000003ffdfb7 00000000007fdfb7 c000000f5c958000
      GPR28: d00000002d09fb70 0000000003ffdfb7 d00000006509fb70 d00000000b3d2048
      [ 7072.864439] NIP [c0000000000e1c78] kvmppc_add_revmap_chain+0x88/0x130
      [ 7072.864503] LR [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
      [ 7072.864566] Call Trace:
      [ 7072.864594] [c000001e3ffef7e0] [c000001e3ffef830] 0xc000001e3ffef830 (unreliable)
      [ 7072.864671] [c000001e3ffef830] [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
      [ 7072.864751] [c000001e3ffef920] [d00000000b38d878] kvmppc_map_vrma+0x168/0x200 [kvm_hv]
      [ 7072.864831] [c000001e3ffef9e0] [d00000000b38a684] kvmppc_vcpu_run_hv+0x1284/0x1300 [kvm_hv]
      [ 7072.864914] [c000001e3ffefb30] [d00000000f465664] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [ 7072.865008] [c000001e3ffefb60] [d00000000f461864] kvm_arch_vcpu_ioctl_run+0x114/0x290 [kvm]
      [ 7072.865152] [c000001e3ffefbe0] [d00000000f453c98] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
      [ 7072.865292] [c000001e3ffefd40] [c000000000389328] do_vfs_ioctl+0xd8/0x8c0
      [ 7072.865410] [c000001e3ffefde0] [c000000000389be4] SyS_ioctl+0xd4/0x130
      [ 7072.865526] [c000001e3ffefe30] [c00000000000b760] system_call+0x58/0x6c
      [ 7072.865644] Instruction dump:
      [ 7072.865715] e95b2110 793a0020 7b4926e4 7f8a4a14 409e0098 807c000c 786326e4 7c6a1a14
      [ 7072.865857] 935e0008 7bbd0020 813c000c 913e000c <93a30008> 93bc000c 48000038 60000000
      [ 7072.866001] ---[ end trace 627b6e4bf8080edc ]---
      
      Note that to trigger this, it is necessary to use a recent upstream
      QEMU (or other userspace that resizes the HPT at CAS time), specify
      a maximum memory size substantially larger than the current memory
      size, and boot a guest kernel that does not support HPT resizing.
      
      This fixes the problem by resetting the rmap arrays when the old HPT
      is freed.
      
      Fixes: f98a8bf9
      
       ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
      Cc: stable@vger.kernel.org # v4.11+
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      ef427198
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Enable TM before accessing TM registers · e4705715
      Paul Mackerras authored
      Commit 46a704f8 ("KVM: PPC: Book3S HV: Preserve userspace HTM state
      properly", 2017-06-15) added code to read transactional memory (TM)
      registers but forgot to enable TM before doing so.  The result is
      that if userspace does have live values in the TM registers, a KVM_RUN
      ioctl will cause a host kernel crash like this:
      
      [  181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980
      [  181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1]
      [  181.328613] SMP NR_CPUS=2048
      [  181.328613] NUMA
      [  181.328618] PowerNV
      [  181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs
      +fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
      +nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables
      +ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic
      +auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core
      +powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod
      +lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod
      [  181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1
      [  181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000
      [  181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0
      [  181.329465] REGS: c000003fe4d837e0 TRAP: 0f60   Not tainted  (4.12.0+)
      [  181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
      [  181.329527]   CR: 24022448  XER: 00000000
      [  181.329608] CFAR: d00000001e773818 SOFTE: 1
      [  181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000
      [  181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800
      [  181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880
      [  181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090
      [  181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028
      [  181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000
      [  181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000
      [  181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000
      [  181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv]
      [  181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [  181.330322] Call Trace:
      [  181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable)
      [  181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [  181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
      [  181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
      [  181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0
      [  181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120
      [  181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c
      [  181.330833] Instruction dump:
      [  181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108
      [  181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6> f9490ff0 e92d0260 7d4122a6
      [  181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]---
      
      The fix is just to turn on the TM bit in the MSR before accessing the
      registers.
      
      Cc: stable@vger.kernel.org # v3.14+
      Fixes: 46a704f8
      
       ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly")
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      e4705715
    • Helge Deller's avatar
      parisc: regenerate defconfig files · 108ea187
      Helge Deller authored
      
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      108ea187
    • Helge Deller's avatar
      parisc: Merge millicode routines via linker script · 6cd819e8
      Helge Deller authored
      
      
      When compiling the 4.13-rc kernel I got those linker errors:
      libgcc2.c:(.text+0x110): relocation truncated to fit: R_PARISC_PCREL22F against symbol `$$divU'
      	defined in .text.div section in /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_divU.o)
      hppa64-linux-gnu-ld: /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_moddi3.o)(.text+0x174): cannot reach $$divU
      
      Avoid such errors by bundling the millicode routines in the linker script.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      6cd819e8
    • Helge Deller's avatar
      parisc: Disable further stack checks when panic occurs during stack check · 5bc64bd2
      Helge Deller authored
      
      
      Before the irq handler detects a low stack and then panics the kernel, disable
      further stack checks to avoid recursive panics.
      
      Reported-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      5bc64bd2
  7. Jul 23, 2017