  1. Feb 10, 2018
  2. Feb 09, 2018
    • KVM: PPC: Book3S: Add MMIO emulation for VMX instructions · 09f98496
      Jose Ricardo Ziviani authored

      This patch provides the MMIO load/store vector indexed
      X-Form emulation.
      
      Instructions implemented:
      lvx: the quadword in storage addressed by the result of EA &
      0xffff_ffff_ffff_fff0 is loaded into VRT.
      
      stvx: the contents of VRS are stored into the quadword in storage
      addressed by the result of EA & 0xffff_ffff_ffff_fff0.
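      To make the EA masking concrete, here is a small C model, offered only as an illustration. It treats guest storage as a flat byte array and uses hypothetical helper names (`lvx_align`, `emulate_lvx`, `emulate_stvx`). Real KVM emulation goes through the MMIO exit path and must also handle endianness and element order, which this sketch ignores.

```c
#include <stdint.h>
#include <string.h>

/* Clear the low four bits: EA & 0xffff_ffff_ffff_fff0. */
static uint64_t lvx_align(uint64_t ea)
{
    return ea & ~(uint64_t)0xf;
}

/* Hypothetical model of lvx: load the aligned quadword at EA into vrt. */
static void emulate_lvx(const uint8_t *mem, uint64_t ea, uint8_t vrt[16])
{
    memcpy(vrt, mem + lvx_align(ea), 16);
}

/* Hypothetical model of stvx: store vrs into the aligned quadword at EA. */
static void emulate_stvx(uint8_t *mem, uint64_t ea, const uint8_t vrs[16])
{
    memcpy(mem + lvx_align(ea), vrs, 16);
}
```

      Whatever the low four bits of EA are, both instructions always touch the 16-byte-aligned quadword that contains EA.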
      
      Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
      Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
      Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a
      Alexander Graf authored

      We ended up with code that did a conditional branch inside a feature
      section to code outside of the feature section. Depending on how the
      object file gets organized, that might mean we exceed the 14bit
      relocation limit for conditional branches:
      
        arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
      
      So instead of doing a conditional branch outside of the feature section,
      let's just jump to the end of the same section, making the branch very short.
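      For context, a PowerPC conditional branch encodes its target in the 14-bit signed BD field, which is shifted left by two bits, so the reachable byte displacement is -32768 to +32764. The sketch below checks whether a displacement fits; `fits_rel14` is a hypothetical name, not something from the kernel or the linker.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * BD is a signed 14-bit word offset; the byte displacement is BD << 2,
 * so it must be a multiple of 4 in the range [-32768, 32764].
 */
static bool fits_rel14(int64_t disp)
{
    return disp >= -32768 && disp <= 32764 && disp % 4 == 0;
}
```

      A branch from inside a feature section to code far away in .text can easily exceed this window, while a jump to the end of the same section stays well inside it.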
      
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5
      David Gibson authored

      This adds code to enable the HPT resizing code to work on POWER9,
      which uses a slightly modified HPT entry format compared to POWER8.
      On POWER9, we convert HPTEs read from the HPT from the new format to
      the old format so that the rest of the HPT resizing code can work as
      before.  HPTEs written to the new HPT are converted to the new format
      as the last step before writing them into the new HPT.
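      The format conversion can be sketched as bit manipulation on the two HPTE doublewords: ISA v3.0 (POWER9) moves the 2-bit B (segment size) field from the first doubleword into the second. The constants and helper names below are modelled on what we believe arch/powerpc/include/asm/book3s/64/mmu-hash.h defines; treat them as assumptions to check against the tree.

```c
#include <stdint.h>

/* Assumed constants (verify against mmu-hash.h). */
#define HPTE_V_COMMON_BITS      0x000fffffffffffffULL
#define HPTE_V_SSIZE_SHIFT      62
#define HPTE_R_3_0_SSIZE_SHIFT  58
#define HPTE_R_3_0_SSIZE_MASK   (3ULL << HPTE_R_3_0_SSIZE_SHIFT)

/* New (POWER9) -> old (POWER8): move B back into the first doubleword. */
static uint64_t hpte_new_to_old_v(uint64_t v, uint64_t r)
{
    return (v & HPTE_V_COMMON_BITS) |
           ((r & HPTE_R_3_0_SSIZE_MASK) << (HPTE_V_SSIZE_SHIFT - HPTE_R_3_0_SSIZE_SHIFT));
}

static uint64_t hpte_new_to_old_r(uint64_t r)
{
    return r & ~HPTE_R_3_0_SSIZE_MASK;   /* clear B from the second doubleword */
}

/* Old -> new: drop B (and bits outside the common mask) from the first
 * doubleword and store B in the second. */
static uint64_t hpte_old_to_new_v(uint64_t v)
{
    return v & HPTE_V_COMMON_BITS;
}

static uint64_t hpte_old_to_new_r(uint64_t v, uint64_t r)
{
    return (r & ~HPTE_R_3_0_SSIZE_MASK) |
           ((v >> HPTE_V_SSIZE_SHIFT) << HPTE_R_3_0_SSIZE_SHIFT);
}
```

      Converting to the old format on read and back to the new format just before writing lets the rest of the resizing code work on a single layout.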
      
      This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
      Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
      now that HPT resizing works on POWER9.
      
      On POWER9, when we pivot to the new HPT, we now call
      kvmppc_setup_partition_table() to update the partition table in order
      to make the hardware use the new HPT.
      
      [paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
       wrote commit message.]
      
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03
      Paul Mackerras authored

      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
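      The corrected computation amounts to masking after inverting. A minimal sketch with a hypothetical helper `new_pteg_index` (the kernel code is structured differently):

```c
#include <stdint.h>

/*
 * Hypothetical helper: index of the HPTE group in the new HPT.
 * A bolted HPTE stored in its secondary group uses the inverted hash;
 * the result must still be masked to the size of the new table.
 */
static uint64_t new_pteg_index(uint64_t hash, int secondary, uint64_t new_hash_mask)
{
    if (secondary)
        hash = ~hash;             /* secondary hash is the one's complement */
    return hash & new_hash_mask;  /* omitting this mask leaves the high bits set */
}
```

      Without the final mask, the inverted hash has all its high bits set, which is the out-of-range ("effectively negative") index described above.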
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  3. Feb 08, 2018
  4. Feb 03, 2018
  5. Feb 01, 2018
    • Merge tag 'kvm-ppc-next-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · d2b9b207
      Radim Krčmář authored

      PPC KVM update for 4.16
      
      - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs
        without requiring the complex thread synchronization that earlier
        CPU versions required.
      
      - A series from Ben Herrenschmidt to improve the handling of
        escalation interrupts with the XIVE interrupt controller.
      
      - Provide for the decrementer register to be copied across on
        migration.
      
      - Various minor cleanups and bugfixes.
    • Merge branch 'x86/hyperv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7bf14c28
      Radim Krčmář authored

      Topic branch for stable KVM clocksource under Hyper-V.
      
      Thanks to Christoffer Dall for resolving the ARM conflict.
    • KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled · 07ae5389
      Alexander Graf authored

      When copying between the vcpu and svcpu, we may get scheduled away onto
      a different host CPU which in turn means our svcpu pointer may change.
      
      That means we need to atomically copy to and from the svcpu with preemption
      disabled, so that all code around it always sees a coherent state.
      
      Reported-by: Simon Guo <wei.guo.simon@gmail.com>
      Fixes: 3d3319b4 ("KVM: PPC: Book3S: PR: Enable interrupts earlier")
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Drop locks before reading guest memory · 36ee41d1
      Paul Mackerras authored

      Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to
      read guest memory, in order to emulate guest instructions, while
      preempt is disabled and a vcore lock is held.  This occurs in
      kvmppc_handle_exit_hv(), called from post_guest_process(), when
      emulating guest doorbell instructions on POWER9 systems, and also
      when checking whether we have hit a hypervisor breakpoint.
      Reading guest memory can cause a page fault and thus cause the
      task to sleep, so we need to avoid reading guest memory while
      holding a spinlock or when preempt is disabled.
      
      To fix this, we move the preempt_enable() in kvmppc_run_core() to
      before the loop that calls post_guest_process() for each vcore that
      has just run, and we drop and re-take the vcore lock around the calls
      to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr().
      
      Dropping the lock is safe with respect to the iteration over the
      runnable vcpus in post_guest_process(); for_each_runnable_thread
      is actually safe to use locklessly.  It is possible for a vcpu
      to become runnable and add itself to the runnable_threads array
      (code near the beginning of kvmppc_run_vcpu()) and then get included
      in the iteration in post_guest_process despite the fact that it
      has not just run.  This is benign because vcpu->arch.trap and
      vcpu->arch.ceded will be zero.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: VMX: make MSR bitmaps per-VCPU · 904e14fb
      Paolo Bonzini authored

      Place the MSR bitmap in struct loaded_vmcs, and update it in place
      every time the x2apic or APICv state can change.  This is rare, and
      the loop can handle 64 MSRs per iteration, in a similar fashion to
      nested_vmx_prepare_msr_bitmap.
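      As an illustration of handling 64 MSRs per iteration, the sketch below treats the 4 KiB VMX MSR bitmap as 512 64-bit words, following the layout in the Intel SDM (read bitmap for low MSRs at byte 0, write bitmap for low MSRs at byte 2048; a set bit means intercept). The helper name is hypothetical, and it ignores the per-MSR exceptions the kernel makes within the x2APIC range.

```c
#include <stdint.h>

/*
 * 4 KiB MSR bitmap as 512 64-bit words:
 *   words   0..127  read bitmap,  MSRs 0x00000000-0x00001fff
 *   words 128..255  read bitmap,  MSRs 0xc0000000-0xc0001fff
 *   words 256..383  write bitmap, MSRs 0x00000000-0x00001fff
 *   words 384..511  write bitmap, MSRs 0xc0000000-0xc0001fff
 */
#define MSR_BITMAP_WORDS 512
#define WRITE_LOW_BASE   256   /* byte offset 2048 / 8 */

/* Hypothetical helper: set x2APIC (0x800-0x8ff) intercepts, one word (64 MSRs) at a time. */
static void update_x2apic_intercepts(uint64_t bitmap[MSR_BITMAP_WORDS],
                                     int intercept_reads, int intercept_writes)
{
    for (uint32_t msr = 0x800; msr <= 0x8ff; msr += 64) {
        uint32_t word = msr / 64;
        bitmap[word] = intercept_reads ? ~0ULL : 0;
        bitmap[WRITE_LOW_BASE + word] = intercept_writes ? ~0ULL : 0;
    }
}
```

      Because the x2APIC range is 256 MSRs, the loop runs only four times, which is why updating the bitmap in place on every x2APIC/APICv change is cheap.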
      
      This prepares for choosing, on a per-VM basis, whether to intercept
      the SPEC_CTRL and PRED_CMD MSRs.
      
      Cc: stable@vger.kernel.org       # prereq for Spectre mitigation
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: remove efer_reload entry in kvm_vcpu_stat · 87cedc6b
      Longpeng(Mike) authored

      The efer_reload stat has not been used since
      commit 26bb0981 ("KVM: VMX: Use shared msr infrastructure"),
      so remove it.
      
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: AMD Processor Topology Information · 806793f5
      Stanislav Lanci authored

      This patch makes it possible to enable the x86 feature TOPOEXT. This is
      needed to provide information about SMT on AMD Zen CPUs to the guest.
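      For reference, the feature and the SMT information it unlocks can be decoded from raw CPUID register values as below. The bit positions (Fn8000_0001 ECX bit 22 for TOPOEXT, Fn8000_001E EBX[15:8] for threads per core minus one) reflect our reading of AMD's documentation; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0001 ECX bit 22: TopologyExtensions (TOPOEXT). */
#define AMD_ECX_TOPOEXT (1u << 22)

static bool has_topoext(uint32_t ecx_80000001)
{
    return (ecx_80000001 & AMD_ECX_TOPOEXT) != 0;
}

/* With TOPOEXT, CPUID Fn8000_001E EBX[15:8] holds threads per core minus one. */
static unsigned threads_per_core(uint32_t ebx_8000001e)
{
    return ((ebx_8000001e >> 8) & 0xff) + 1;
}
```

      A guest that does not see the TOPOEXT bit would not consult leaf 0x8000001E at all, so it cannot learn that its vCPUs are SMT siblings.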
      
      Signed-off-by: Stanislav Lanci <pixo@polepetko.eu>
      Tested-by: Nick Sarnie <commendsarnex@gmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested · d391f120
      Vitaly Kuznetsov authored

      I was investigating an issue with SeaBIOS >= 1.10, which stopped working
      for nested KVM on Hyper-V. The problem appears to be in the
      handle_ept_violation() function: when we do fast MMIO we need to skip
      the instruction, so we call kvm_skip_emulated_instruction(). This, however,
      depends on the VM_EXIT_INSTRUCTION_LEN field being set correctly in the
      VMCS, which is not always the case.
      
      Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
      EPT MISCONFIG occurs. While on real hardware it was observed to be set,
      some hypervisors follow the spec and don't set it; we end up advancing
      IP with some random value.
      
      I checked with Microsoft and they confirmed they don't fill
      VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
      
      Fix the issue by doing instruction skip through emulator when running
      nested.
      
      Fixes: 68c3b4d1
      Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: embed vcpu id to dentry of vcpu anon inode · e46b4692
      Masatake YAMATO authored

      All dentries for vcpus have the same name, "anon_inode:kvm-vcpu". That
      means it is impossible to know the mapping between vcpu fds and vcpus
      from userland.
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu
      
      It is also impossible to know the mapping between vma for kvm_run
      structure and vcpu from userland.
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
      
      This change adds vcpu id to d-entries for vcpu. With this change
      you can get the following output:
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu:0
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu:1
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:0
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:1
      
      With the mappings known from the output, a tool like strace can report more details
      of qemu-kvm process activities. Here is the strace output of my local prototype:
      
          # ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K'
          ...
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          ...
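      A tool that wants to use the new names can build or parse them with plain string handling. The sketch below is hypothetical userspace code: `vcpu_anon_name` mirrors the "kvm-vcpu:<id>" format shown above, and `parse_vcpu_id` extracts the id from a readlink() target or a maps path, returning -1 for names without an id (as on kernels without this change).

```c
#include <stdio.h>
#include <string.h>

/* Build a name in the "kvm-vcpu:<id>" format shown above. */
static void vcpu_anon_name(char *buf, size_t len, int vcpu_id)
{
    snprintf(buf, len, "kvm-vcpu:%d", vcpu_id);
}

/*
 * Parse a /proc/<pid>/fd link target or /proc/<pid>/maps path such as
 * "anon_inode:kvm-vcpu:1". Returns the vCPU id, or -1 if there is none
 * or if anything follows the number.
 */
static int parse_vcpu_id(const char *target)
{
    int id;
    char tail;

    if (sscanf(target, "anon_inode:kvm-vcpu:%d%c", &id, &tail) == 1)
        return id;
    return -1;
}
```

      With this, a tool can map fd 19 in the listing above to vCPU 1 and annotate the KVM_RUN calls issued on it.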
      
      Signed-off-by: Masatake YAMATO <yamato@redhat.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: Map PFN-type memory regions as writable (if possible) · a340b3e2
      KarimAllah Ahmed authored

      For EPT-violations that are triggered by a read, the pages are also mapped with
      write permissions (if their memory region is also writable). That would avoid
      getting yet another fault on the same page when a write occurs.
      
      This optimization only happened when a "struct page" backs the memory
      region. Also enable it for memory regions that do not have a "struct page".
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  6. Jan 31, 2018
    • Merge tag 'kvm-arm-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm · e5317539
      Radim Krčmář authored

      KVM/ARM Changes for v4.16
      
      The changes for this version include icache invalidation optimizations
      (improving VM startup time), support for forwarded level-triggered
      interrupts (improved performance for timers and passthrough platform
      devices), a small fix for power-management notifiers, and some cosmetic
      changes.
    • 810f4600
      Radim Krčmář authored
    • x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n · 5fa4ec9c
      Thomas Gleixner authored

      The reenlightenment support for Hyper-V slapped a direct reference to
      x86_hyper_type into the KVM code, which results in the following build
      failure when CONFIG_HYPERVISOR_GUEST=n:
      
      arch/x86/kvm/x86.c:6259:6: error: ‘x86_hyper_type’ undeclared (first use in this function)
      arch/x86/kvm/x86.c:6259:6: note: each undeclared identifier is reported only once for each function it appears in
      
      Use the proper helper function to cure that.
      
      The 32bit compile fails because of:
      
      arch/x86/kvm/x86.c:5936:13: warning: ‘kvm_hyperv_tsc_notifier’ defined but not used [-Wunused-function]
      
      which is a real trainwreck engineering artwork. The callsite is wrapped
      in #ifdef CONFIG_X86_64, but the function itself has the #ifdef inside
      the function body. Wrap the whole function in the #ifdef to cure
      that.
      
      Qualiteee....
      
      Fixes: 0092e434 ("x86/kvm: Support Hyper-V reenlightenment")
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
    • KVM: arm/arm64: Fixup userspace irqchip static key optimization · cd15d205
      Christoffer Dall authored

      When I introduced a static key to avoid work in the critical path for
      userspace irqchips, which are very rarely used, I accidentally messed up
      my logic and used && where I should have used ||, because the point was
      to short-circuit the evaluation in case userspace irqchips weren't even
      in use.
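      The logic error is easiest to see as a truth table. In this hypothetical model, `key_enabled` stands for the static key (set once any VM uses a userspace irqchip) and `in_kernel` for "this VM uses the in-kernel irqchip"; the userspace-irqchip work should be skipped when either the key is off or the VM's irqchip is in the kernel. It illustrates the && versus || difference only and is not the code from the patch.

```c
#include <stdbool.h>

/* Buggy form: only skips when BOTH conditions hold. */
static bool skip_work_buggy(bool key_enabled, bool in_kernel)
{
    return !key_enabled && in_kernel;
}

/* Fixed form: short-circuits as soon as the key is off. */
static bool skip_work_fixed(bool key_enabled, bool in_kernel)
{
    return !key_enabled || in_kernel;
}
```

      The two forms differ exactly when the key is on because some other VM uses a userspace irqchip while this VM uses the in-kernel one, which matches the mixed-VM scenario the fix addresses.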
      
      This fixes an issue when running in-kernel irqchip VMs alongside
      userspace irqchip VMs.
      
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Fixes: c44c232ee2d3 ("KVM: arm/arm64: Avoid work when userspace iqchips are not used")
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm/arm64: Fix userspace_irqchip_in_use counting · f1d7231c
      Christoffer Dall authored

      We were not decrementing the static key count in the right location.
      kvm_arch_vcpu_destroy() is only called to clean up after a failed
      VCPU create attempt, whereas kvm_arch_vcpu_free() is called on teardown
      of the VM as well.  Move the static key decrement call to
      kvm_arch_vcpu_free().
      
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm/arm64: Fix incorrect timer_is_pending logic · 13e59ece
      Christoffer Dall authored

      After the recently introduced support for level-triggered mapped
      interrupts, I accidentally left the VCPU thread busily going back and
      forth between the guest and the hypervisor whenever the guest was
      blocking, because I would always incorrectly report that a timer
      interrupt was pending.
      
      This is because the timer->irq.level field is not valid for mapped
      interrupts, where we offload the level state to the hardware, and as a
      result this field is always true.
      
      Luckily the problem can be solved relatively easily by not checking the
      cached signal state of either timer in kvm_timer_should_fire() but
      instead computing the timer state on the fly, which we already do if the
      cached signal state wasn't high.  In fact, the only reason for checking
      the cached signal state was a tiny optimization which would only be
      potentially faster when the polling loop detects a pending timer
      interrupt, which is quite unlikely.
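      Computing the state on the fly boils down to the architectural timer condition: the timer is enabled, its interrupt is not masked, and the compare value has been reached. A hedged sketch with a hypothetical register snapshot; the ENABLE and IMASK bit positions follow the Arm CNTx_CTL layout, and the struct and function names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTx_CTL bits as defined by the Arm architecture. */
#define TIMER_CTL_ENABLE  (1u << 0)
#define TIMER_CTL_IMASK   (1u << 1)

/* Hypothetical snapshot of one guest timer's registers. */
struct timer_snapshot {
    uint32_t ctl;    /* control register */
    uint64_t cval;   /* compare value */
    uint64_t offset; /* virtual offset; 0 for the physical timer */
};

/* Derive "should fire" from the registers instead of a cached level. */
static bool timer_should_fire(const struct timer_snapshot *t, uint64_t phys_count)
{
    if (!(t->ctl & TIMER_CTL_ENABLE) || (t->ctl & TIMER_CTL_IMASK))
        return false;
    return t->cval <= phys_count - t->offset;
}
```

      Because the answer is derived from the compare value and the counter, it stays correct for mapped interrupts, where the cached irq.level field is not meaningful.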
      
      Instead of duplicating the logic from kvm_arch_timer_handler(), we
      enlighten kvm_timer_should_fire() to report something valid when the
      timer state is loaded onto the hardware.  We can then call this from
      kvm_arch_timer_handler() as well and avoid the call to
      __timer_snapshot_state() in kvm_arch_timer_get_input_level().
      
      Reported-by: Tomasz Nowicki <tn@semihalf.com>
      Tested-by: Tomasz Nowicki <tn@semihalf.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • MAINTAINERS: update KVM/s390 maintainers · cd74ff94
      Cornelia Huck authored

      As I have neither too much time nor access to the architecture
      documentation anymore, let's switch my status from maintainer to
      reviewer. Janosch will step in as second maintainer.
      
      Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
    • x86/kvm: Support Hyper-V reenlightenment · 0092e434
      Vitaly Kuznetsov authored

      When running nested KVM on Hyper-V guests, it's required to update
      masterclocks for all guests when L1 migrates to a host with a different
      TSC frequency.
      
      Implement the procedure in the following way:
        - Pause all guests.
        - Tell the host (Hyper-V) to stop emulating TSC accesses.
        - Update the gtod copy, recompute clocks.
        - Unpause all guests.
      
      This is somewhat similar to cpufreq but there are two important differences:
       - TSC emulation can only be disabled globally (on all CPUs)
       - The new TSC frequency is not known until emulation is turned off so
         there is no way to 'prepare' for the event upfront.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-8-vkuznets@redhat.com
    • x86/kvm: Pass stable clocksource to guests when running nested on Hyper-V · b0c39dc6
      Vitaly Kuznetsov authored

      Currently, KVM is able to work in 'masterclock' mode passing
      PVCLOCK_TSC_STABLE_BIT to guests when the clocksource which is used on the
      host is TSC.
      
      When running nested on Hyper-V, the guest normally uses a different one:
      the TSC page, which is resistant to TSC frequency changes on events like
      L1 migration. Add support for it in KVM.
      
      The only non-trivial change is in vgettsc(): when updating the gtod copy
      both the clock readout and tsc value have to be updated now.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-7-vkuznets@redhat.com
    • x86/irq: Count Hyper-V reenlightenment interrupts · 51d4e5da
      Vitaly Kuznetsov authored

      Hyper-V reenlightenment interrupts arrive when the VM is migrated. While
      they are not interesting in general, they are important when L2 nested
      guests are running.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com
    • x86/hyperv: Redirect reenlightment notifications on CPU offlining · e7c4e36c
      Vitaly Kuznetsov authored

      It is very unlikely for CPUs to get offlined when running on Hyper-V as
      there is a protection in the vmbus module which prevents it when the guest
      has any VMBus devices assigned. This, however, may change in the future if an
      option to reassign an already active channel is added. It is also
      possible to run without any Hyper-V devices or to have a CPU with no
      assigned channels.
      
      Reassign reenlightenment notifications to some other active CPU when the
      CPU which is assigned to them goes offline.
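      Choosing the new target amounts to picking any online CPU other than the one going away. The userspace sketch below does this on a 64-bit mask; the name `pick_new_target` is hypothetical, and kernel code would more likely use a cpumask helper such as cpumask_any_but().

```c
/*
 * Hypothetical helper: given a mask of online CPUs (bit n = CPU n) and
 * the CPU going offline (0..63), return another online CPU to receive
 * the notifications, or -1 if none is left.
 */
static int pick_new_target(unsigned long long online_mask, int going_offline)
{
    unsigned long long candidates = online_mask & ~(1ULL << going_offline);

    for (int cpu = 0; cpu < 64; cpu++)
        if (candidates & (1ULL << cpu))
            return cpu;
    return -1;
}
```

      Any online CPU works as a target, so taking the lowest-numbered candidate is just a simple deterministic choice.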
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-5-vkuznets@redhat.com
    • x86/hyperv: Reenlightenment notifications support · 93286261
      Vitaly Kuznetsov authored

      Hyper-V supports Live Migration notification. This is supposed to be used
      in conjunction with TSC emulation: when a VM is migrated to a host with a
      different TSC frequency, the host emulates TSC accesses for a short period
      and sends an interrupt to notify the guest about the event. When the
      guest is done updating everything, it can disable TSC emulation and
      everything will start working fast again.
      
      These notifications weren't required until now, as Hyper-V guests are not
      supposed to use TSC as a clocksource: in Linux the TSC is even marked as
      unstable on boot. Guests normally use the 'tsc page' clocksource, and the
      host updates its values on migrations automatically.

      Things change with nested virtualization: even when the PV
      clocksources (kvm-clock or tsc page) are passed through to the nested
      guests, the TSC frequency and frequency changes need to be known.
      
      The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly
      specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification
      bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was
      assured that the fix is on the way.
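      In code, the check described above reduces to testing one bit of EAX from CPUID leaf 0x40000003. A minimal sketch that works on an already-read EAX value; the macro and function names are ours, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* EAX bit 13 of CPUID 0x40000003 (not bit 12 of 0x40000009, as TLFS v5.0b says). */
#define HV_ACCESS_REENLIGHTENMENT (1u << 13)

static bool hv_has_reenlightenment(uint32_t features_eax)
{
    return (features_eax & HV_ACCESS_REENLIGHTENMENT) != 0;
}
```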
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
    • x86/hyperv: Add a function to read both TSC and TSC page value simulateneously · e2768eaa
      Vitaly Kuznetsov authored

      This is going to be used from KVM code, where both the TSC and the TSC
      page value are needed.

      Nothing is supposed to use the function when the Hyper-V code is
      compiled out, so the stub just calls BUG().
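      For background, the TSC page turns a raw TSC reading into reference time (in 100 ns units) as ((tsc * scale) >> 64) + offset, with a sequence value of 0 marking the page invalid. The sketch below applies that formula to a given TSC value; the struct and function names are hypothetical, and it relies on the GCC/Clang `unsigned __int128` extension. Real code also has to re-read the sequence to detect concurrent updates, which is why returning the TSC it used alongside the time is useful.

```c
#include <stdint.h>

/* Hypothetical snapshot of the TSC page fields used by the formula. */
struct tsc_page_snapshot {
    uint32_t sequence; /* 0 means the page is not valid */
    uint64_t scale;    /* 64.64 fixed-point multiplier */
    int64_t  offset;
};

/* Reference time = ((tsc * scale) >> 64) + offset, or UINT64_MAX if invalid. */
static uint64_t tsc_page_time(const struct tsc_page_snapshot *pg, uint64_t tsc)
{
    if (pg->sequence == 0)
        return UINT64_MAX;
    unsigned __int128 prod = (unsigned __int128)tsc * pg->scale;
    return (uint64_t)(prod >> 64) + (uint64_t)pg->offset;
}
```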
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com
    • x86/hyperv: Check for required priviliges in hyperv_init() · 89a8f6d4
      Vitaly Kuznetsov authored

      In hyperv_init() it's presumed that it always has access to the VP index
      and hypercall MSRs, while according to the specification it should be
      checked whether it's allowed to access the corresponding MSRs before
      accessing them.
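      The check amounts to requiring two privilege bits in EAX of CPUID 0x40000003. In the sketch below, the bit positions (5 for the hypercall MSRs, 6 for VP index) are modelled on Linux's HV_X64_MSR_HYPERCALL_AVAILABLE and HV_X64_MSR_VP_INDEX_AVAILABLE and should be verified; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed privilege bits in EAX of CPUID 0x40000003. */
#define HV_MSR_HYPERCALL_AVAILABLE (1u << 5)
#define HV_MSR_VP_INDEX_AVAILABLE  (1u << 6)

/* Both MSRs must be accessible before initialization touches them. */
static bool hv_required_msrs_available(uint32_t features_eax)
{
    const uint32_t required = HV_MSR_HYPERCALL_AVAILABLE | HV_MSR_VP_INDEX_AVAILABLE;

    return (features_eax & required) == required;
}
```

      Comparing against the full mask, rather than testing for any set bit, ensures that a partition with only one of the two privileges bails out.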
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com
    • Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72906f38
      Linus Torvalds authored

      Pull x86 hyperv update from Ingo Molnar:
       "Enable PCID support on Hyper-V guests"
      
      * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hyperv: Stop suppressing X86_FEATURE_PCID
      72906f38
    • Linus Torvalds
      Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3ccabd6d
      Linus Torvalds authored
      Pull x86 cleanups from Ingo Molnar:
       "Misc cleanups"
      
      * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Remove unused IOMMU_STRESS Kconfig
        x86/extable: Mark exception handler functions visible
        x86/timer: Don't inline __const_udelay
        x86/headers: Remove duplicate #includes
      3ccabd6d
    • Linus Torvalds
      Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5289d300
      Linus Torvalds authored
      Pull x86 apic cleanup from Ingo Molnar:
       "A single change simplifying the APIC code a bit"
      
      * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Remove local var in flat_send_IPI_allbutself()
      5289d300
    • Linus Torvalds
      Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af8c5e2d
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - Implement frequency/CPU invariance and OPP selection for
           SCHED_DEADLINE (Juri Lelli)
      
         - Tweak the task migration logic for better multi-tasking
           workload scalability (Mel Gorman)
      
         - Misc cleanups, fixes and improvements"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/deadline: Make bandwidth enforcement scale-invariant
        sched/cpufreq: Move arch_scale_{freq,cpu}_capacity() outside of #ifdef CONFIG_SMP
        sched/cpufreq: Remove arch_scale_freq_capacity()'s 'sd' parameter
        sched/cpufreq: Always consider all CPUs when deciding next freq
        sched/cpufreq: Split utilization signals
        sched/cpufreq: Change the worker kthread to SCHED_DEADLINE
        sched/deadline: Move CPU frequency selection triggering points
        sched/cpufreq: Use the DEADLINE utilization signal
        sched/deadline: Implement "runtime overrun signal" support
        sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
        sched/fair: Correct obsolete comment about cpufreq_update_util()
        sched/fair: Remove impossible condition from find_idlest_group_cpu()
        sched/cpufreq: Don't pass flags to sugov_set_iowait_boost()
        sched/cpufreq: Initialize sg_cpu->flags to 0
        sched/fair: Consider RT/IRQ pressure in capacity_spare_wake()
        sched/fair: Use 'unsigned long' for utilization, consistently
        sched/core: Rework and clarify prepare_lock_switch()
        sched/fair: Remove unused 'curr' parameter from wakeup_gran
        sched/headers: Constify object_is_on_stack()
      af8c5e2d
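The "frequency/CPU invariance" item and the "Make bandwidth enforcement scale-invariant" commit can be illustrated with a small C sketch: executed time is scaled by the current frequency and by the CPU's relative capacity before it is charged against a deadline task's runtime. It assumes the scheduler's fixed-point convention where 1024 means full capacity; cap_scale() mirrors a scheduler helper of that name, while scaled_delta_exec() is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Capacities are fixed-point values where 1024 (1 << 10) means "full". */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* Multiply v by the fixed-point factor s / SCHED_CAPACITY_SCALE. */
static uint64_t cap_scale(uint64_t v, uint64_t s)
{
	return (v * s) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Hypothetical helper: convert delta_exec ns of wall-clock execution into
 * the amount of reserved runtime to charge, so that running for 1 ms at
 * half the maximum frequency consumes only 0.5 ms of runtime.
 */
static uint64_t scaled_delta_exec(uint64_t delta_exec,
				  uint64_t scale_freq, uint64_t scale_cpu)
{
	assert(scale_freq <= SCHED_CAPACITY_SCALE);
	assert(scale_cpu <= SCHED_CAPACITY_SCALE);
	return cap_scale(cap_scale(delta_exec, scale_freq), scale_cpu);
}
```

Scaling the charge rather than the reservation lets a task's admitted bandwidth stay fixed while the CPU runs slower, which is what allows a lower OPP to be selected.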
    • Linus Torvalds
      Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a1c75e17
      Linus Torvalds authored
      Pull x86 RAS updates from Ingo Molnar:
      
       - various AMD SMCA error parsing/reporting improvements (Yazen Ghannam)
      
       - extend Intel CMCI error reporting to more cases (Xie XiuQi)
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/MCE: Make correctable error detection look at the Deferred bit
        x86/MCE: Report only DRAM ECC as memory errors on AMD systems
        x86/MCE/AMD: Define a function to get SMCA bank type
        x86/mce/AMD: Don't set DEF_INT_TYPE in MSR_CU_DEF_ERR on SMCA systems
        x86/MCE: Extend table to report action optional errors through CMCI too
      a1c75e17
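To illustrate the "Make correctable error detection look at the Deferred bit" change, here is a hedged C sketch of a correctable-error test on a raw MCi_STATUS value. Bit 61 (UC) and bit 44 (Deferred, on AMD) are assumed from the MCA status register layout; the function name and its is_amd flag are hypothetical simplifications of the kernel's struct mce based check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS bits; positions assumed from the x86 MCA architecture. */
#define MCI_STATUS_UC       (UINT64_C(1) << 61) /* uncorrected error */
#define MCI_STATUS_DEFERRED (UINT64_C(1) << 44) /* AMD: deferred error */

/*
 * Hypothetical helper: a deferred error on AMD is an uncorrected error
 * whose handling was postponed, so it is not treated as correctable even
 * if the UC bit is clear. Check the Deferred bit before the UC bit.
 */
static bool mce_status_is_correctable(uint64_t status, bool is_amd)
{
	if (is_amd && (status & MCI_STATUS_DEFERRED))
		return false;
	if (status & MCI_STATUS_UC)
		return false;
	return true;
}
```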
    • Linus Torvalds
      Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d8b91dde
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Kernel-side changes:
      
         - Clean up the x86 instruction decoder (Masami Hiramatsu)
      
         - Add new uprobes optimization for PUSH instructions on x86 (Yonghong
           Song)
      
         - Add MSR_IA32_THERM_STATUS to the MSR events (Stephane Eranian)
      
         - Fix misc bugs, update documentation, plus various cleanups (Jiri
           Olsa)
      
        There are many tooling-side improvements:
      
         - Intel-PT/BTS improvements (Adrian Hunter)
      
         - Numerous 'perf trace' improvements (Arnaldo Carvalho de Melo)
      
         - Introduce an errno-code-to-string facility (Hendrik Brueckner)
      
         - Various build system improvements (Jiri Olsa)
      
         - Add support for CoreSight trace decoding by making the perf tools
           use the external openCSD (Mathieu Poirier, Tor Jeremiassen)
      
         - Add ARM Statistical Profiling Extensions (SPE) support (Kim
           Phillips)
      
         - libtraceevent updates (Steven Rostedt)
      
         - Intel vendor event JSON updates (Andi Kleen)
      
         - Introduce 'perf report --mmaps' and 'perf report --tasks' to show
           info present in 'perf.data' (Jiri Olsa, Arnaldo Carvalho de Melo)
      
         - Add infrastructure to record the first and last sample times in
           the perf.data file header. They are recorded when all samples of
           a 'perf record' session are processed anyway (such as during
           build-id processing) or when explicitly requested, and
           'perf report --time' then uses them; that option also gained
           support for percent slices in addition to absolute ones.

           For example, it is now possible to ask for the samples in the
           10%-20% time slice of a perf.data file (Jin Yao)
      
         - Allow system-wide 'perf stat --per-thread', sorting the results
           (Jin Yao)
      
           E.g.:
      
            [root@jouet ~]# perf stat --per-thread --metrics IPC
            ^C
             Performance counter stats for 'system wide':
      
                        make-22229  23,012,094,032  inst_retired.any   #  0.8 IPC
                         cc1-22419     692,027,497  inst_retired.any   #  0.8 IPC
                         gcc-22418     328,231,855  inst_retired.any   #  0.9 IPC
                         cc1-22509     220,853,647  inst_retired.any   #  0.8 IPC
                         gcc-22486     199,874,810  inst_retired.any   #  1.0 IPC
                          as-22466     177,896,365  inst_retired.any   #  0.9 IPC
                         cc1-22465     150,732,374  inst_retired.any   #  0.8 IPC
                         gcc-22508     112,555,593  inst_retired.any   #  0.9 IPC
                         cc1-22487     108,964,079  inst_retired.any   #  0.7 IPC
             qemu-system-x86-2697       21,330,550  inst_retired.any   #  0.3 IPC
             systemd-journal-551        20,642,951  inst_retired.any   #  0.4 IPC
             docker-containe-17651       9,552,892  inst_retired.any   #  0.5 IPC
             dockerd-current-9809        7,528,586  inst_retired.any   #  0.5 IPC
                        make-22153  12,504,194,380  inst_retired.any   #  0.8 IPC
                     python2-22429  12,081,290,954  inst_retired.any   #  0.8 IPC
            <SNIP>
                     python2-22429  15,026,328,103  cpu_clk_unhalted.thread
                         cc1-22419     826,660,193  cpu_clk_unhalted.thread
                         gcc-22418     365,321,295  cpu_clk_unhalted.thread
                         cc1-22509     279,169,362  cpu_clk_unhalted.thread
                         gcc-22486     210,156,950  cpu_clk_unhalted.thread
            <SNIP>
      
                 5.638075538 seconds time elapsed
      
           [root@jouet ~]#
      
         - Improve shell auto-completion of perf events (Jin Yao)
      
         - 'perf probe' improvements (Masami Hiramatsu)
      
         - Improve PMU infrastructure to support arm64's ThunderX2
           implementation-defined core events (Ganapatrao Kulkarni)
      
         - Various annotation related improvements and fixes (Thomas Richter)
      
         - Clarify usage of 'overwrite' and 'backward' in the evlist/mmap
           code, removing the 'overwrite' parameter from several functions as
           it was always passed as 'false' (Wang Nan)
      
         - Fix/improve 'perf record' reverse recording support (Wang Nan)
      
         - Improve command line options documentation (Sihyeon Jang)
      
         - Optimize sample parsing for ordering events, where we don't need to
           parse all the PERF_SAMPLE_ bits, just the ones leading to the
           timestamp needed to reorder events (Jiri Olsa)
      
         - Generalize the annotation code to support other source information
           besides that obtained from objdump/DWARF, starting with Python
           scripts, support for which is slated to be merged soon (Jiri Olsa)
      
         - ... and a lot more that I failed to list, see the shortlog and
           changelog for details"
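The percent-slice support for 'perf report --time' amounts to mapping a percentage range onto the first and last sample times stored in the perf.data header. A minimal C sketch under that assumption follows; struct time_window and percent_slice() are hypothetical, and perf's own parsing and rounding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* A [start, end] window of sample timestamps, in nanoseconds. */
struct time_window {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: map a percent slice such as "10%-20%" onto absolute
 * timestamps, given the first and last sample times of the recording.
 */
static struct time_window percent_slice(uint64_t first, uint64_t last,
					unsigned int from_pct,
					unsigned int to_pct)
{
	assert(first <= last);
	assert(from_pct <= to_pct && to_pct <= 100);

	uint64_t span = last - first;
	struct time_window w = {
		.start = first + span * from_pct / 100,
		.end   = first + span * to_pct / 100,
	};
	return w;
}
```

With first = 1000 and last = 2000, the 10%-20% slice maps to [1100, 1200]; only samples whose timestamps fall in that window would then be reported.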
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
        perf trace beauty flock: Move to separate object file
        perf evlist: Remove fcntl.h from evlist.h
        perf trace beauty futex: Beautify FUTEX_BITSET_MATCH_ANY
        perf trace: Do not print from time delta for interrupted syscall lines
        perf trace: Add --print-sample
        perf bpf: Remove misplaced __maybe_unused attribute
        MAINTAINERS: Adding entry for CoreSight trace decoding
        perf tools: Add mechanic to synthesise CoreSight trace packets
        perf tools: Add full support for CoreSight trace decoding
        perf tools: Add queue management functionality
        perf tools: Add functionality to communicate with the openCSD decoder
        perf tools: Add support for decoding CoreSight trace data
        perf tools: Add decoder mechanic to support dumping trace data
        perf tools: Add processing of coresight metadata
        perf tools: Add initial entry point for decoder CoreSight traces
        perf tools: Integrating the CoreSight decoding library
        perf vendor events intel: Update IvyTown files to V20
        perf vendor events intel: Update IvyBridge files to V20
        perf vendor events intel: Update BroadwellDE events to V7
        perf vendor events intel: Update SkylakeX events to V1.06
        ...
      d8b91dde
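The sample-parsing optimization mentioned in the perf merge message above (parse only the PERF_SAMPLE_ fields needed to reach the timestamp) works because PERF_RECORD_SAMPLE fields appear in a fixed order, with the identifier, IP and pid/tid fields ahead of the time. The following C sketch assumes that layout as documented in perf_event_open(2) and the bit values of enum perf_event_sample_format; sample_time() and the SAMPLE_* macros are hypothetical, not perf's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror enum perf_event_sample_format in <linux/perf_event.h>. */
#define SAMPLE_IP         (UINT64_C(1) << 0)
#define SAMPLE_TID        (UINT64_C(1) << 1)
#define SAMPLE_TIME       (UINT64_C(1) << 2)
#define SAMPLE_IDENTIFIER (UINT64_C(1) << 16)

/*
 * Hypothetical helper: read only the timestamp from a sample body (the u64
 * words that follow perf_event_header), skipping the fields before it
 * instead of decoding every PERF_SAMPLE_* bit. Returns false if the sample
 * carries no time or the body is too short.
 */
static bool sample_time(const uint64_t *body, size_t nwords,
			uint64_t sample_type, uint64_t *time)
{
	size_t i = 0;

	if (!(sample_type & SAMPLE_TIME))
		return false;
	if (sample_type & SAMPLE_IDENTIFIER)
		i++;	/* u64 sample_id */
	if (sample_type & SAMPLE_IP)
		i++;	/* u64 ip */
	if (sample_type & SAMPLE_TID)
		i++;	/* u32 pid and u32 tid share one u64 slot */
	if (i >= nwords)
		return false;
	*time = body[i];
	return true;
}
```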