  1. May 13, 2016
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger authored
      
      
      Some wakeups should not be considered a successful poll. For example, on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable, letting all vCPUs poll all the time for
      transactional-like workloads, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as
      good or bad with regard to polling.
      
      For s390 the implementation will fence off halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as a "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not successful. As KVM on z always runs as a 2nd level hypervisor,
      we prefer to not poll unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like the uperf 1-byte
      transactional ping-pong workload or wakeup-heavy workloads like OLTP
      while still providing a proper speedup.
      
      This also introduces a new vcpu stat "halt_poll_no_tuning" that counts
      wakeups that are considered not good for polling.
      
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3491caf2
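
The wakeup qualification described in commit 3491caf2 can be pictured with a small user-space model. This is only a sketch under stated assumptions: the struct, the function names, the constants, and the grow/shrink policy are hypothetical stand-ins and are not taken from the kernel source. It shows an architecture hook reporting whether a wakeup was "good", and the generic code refusing to tune the poll window upwards for anything else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-vCPU state. */
struct vcpu_sketch {
    bool valid_wakeup;          /* set by arch code for known-good events */
    unsigned long halt_poll_ns; /* current polling window in nanoseconds */
    unsigned long stat_invalid; /* wakeups considered not good for polling */
};

/* Illustrative values only, not the kernel defaults. */
enum { POLL_GROW_START_NS = 10000, POLL_MAX_NS = 500000 };

/* Arch hook (assumed name): was this wakeup a good reason to have polled? */
static bool vcpu_valid_wakeup_sketch(const struct vcpu_sketch *v)
{
    return v->valid_wakeup;
}

/* Called after the vCPU was blocked for block_ns nanoseconds. */
static void adjust_poll_window(struct vcpu_sketch *v, unsigned long block_ns)
{
    if (!vcpu_valid_wakeup_sketch(v)) {
        /* Bad wakeup: count it and stop polling instead of tuning upwards. */
        v->stat_invalid++;
        v->halt_poll_ns = 0;
        return;
    }
    if (block_ns < POLL_MAX_NS) {
        /* Good, short block: grow the window, capped at the maximum. */
        if (v->halt_poll_ns == 0)
            v->halt_poll_ns = POLL_GROW_START_NS;
        else
            v->halt_poll_ns *= 2;
        if (v->halt_poll_ns > POLL_MAX_NS)
            v->halt_poll_ns = POLL_MAX_NS;
    }
}
```

In this model a floating I/O interrupt that happens to wake a vCPU would leave `valid_wakeup` false, so only the vCPU that was deliberately woken keeps a non-zero poll window.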
    • Merge branch 'kvm-ppc-next' of... · d7e1633a
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      d7e1633a
  2. May 12, 2016
    • KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts · b1a4286b
      Paul Mackerras authored
      Commit c9a5ecca ("kvm/eventfd: add arch-specific set_irq",
      2015-10-16) added the possibility for architecture-specific code
      to handle the generation of virtual interrupts in atomic context
      where possible, without having to schedule a work function.
      
      Since we can easily generate virtual interrupts on XICS without
      having to do anything worse than take a spinlock, we define a
      kvm_arch_set_irq_inatomic() for XICS.  We also remove kvm_set_msi()
      since it is not used any more.
      
      The one slightly tricky thing is that with the new interface, we
      don't get told whether the interrupt is an MSI (or other edge
      sensitive interrupt) vs. level-sensitive.  The difference as far
      as interrupt generation is concerned is that for LSIs we have to
      set the asserted flag so it will continue to fire until it is
      explicitly cleared.
      
      In fact the XICS code gets told which interrupts are LSIs by userspace
      when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES
      attribute group on the XICS device.  To store this information, we add
      a new "lsi" field to struct ics_irq_state.  With that we can also do a
      better job of returning accurate values when reading the attribute
      group.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      b1a4286b
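
The edge versus level distinction in commit b1a4286b can be illustrated with a toy model. This is a sketch, not the XICS code: the struct and function names are hypothetical, and only the "lsi" and "asserted" ideas come from the commit message. It shows why a level-sensitive interrupt needs the asserted flag (it keeps firing until the line is cleared) while an MSI fires once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-source state, loosely modelled on the commit text. */
struct irq_state_sketch {
    bool lsi;      /* marked level-sensitive by userspace configuration */
    bool asserted; /* LSI only: the line is currently held active */
    bool pending;  /* an interrupt is waiting to be delivered */
};

/* Raise or lower a source; callers do not say whether it is MSI or LSI. */
static void set_irq_sketch(struct irq_state_sketch *s, int level)
{
    if (s->lsi) {
        /* Level-sensitive: remember the line state so it can re-fire. */
        s->asserted = (level != 0);
        if (s->asserted)
            s->pending = true;
    } else if (level) {
        /* Edge/MSI: a single one-shot interrupt. */
        s->pending = true;
    }
}

/* Guest finished handling the interrupt (end of interrupt). */
static void eoi_sketch(struct irq_state_sketch *s)
{
    /* An LSI that is still asserted becomes pending again. */
    s->pending = s->lsi && s->asserted;
}
```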
    • kvm: Conditionally register IRQ bypass consumer · 14717e20
      Alex Williamson authored
      
      
      If we don't support a mechanism for bypassing IRQs, don't register as
      a consumer.  This eliminates meaningless dev_info()s when the connect
      fails between producer and consumer, such as on AMD systems where
      kvm_x86_ops->update_pi_irte is not implemented.
      
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      14717e20
    • irqbypass: Disallow NULL token · b52f3ed0
      Alex Williamson authored
      
      
      A NULL token is meaningless and can only lead to unintended problems.
      Error on registration with a NULL token, ignore de-registrations with
      a NULL token.
      
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b52f3ed0
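
The NULL-token rule from commit b52f3ed0 amounts to two guard clauses. The following is a minimal sketch with a hypothetical fixed-size registry; the function names, the table, and the specific error codes are assumptions for illustration and do not reproduce the irqbypass manager.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CONSUMERS_SKETCH 4

/* Hypothetical registry of tokens; NULL marks a free slot. */
static void *consumers_sketch[MAX_CONSUMERS_SKETCH];

/* Registration with a NULL token is an error. */
static int register_consumer_sketch(void *token)
{
    if (!token)
        return -EINVAL;
    for (size_t i = 0; i < MAX_CONSUMERS_SKETCH; i++)
        if (consumers_sketch[i] == token)
            return -EBUSY; /* token already registered */
    for (size_t i = 0; i < MAX_CONSUMERS_SKETCH; i++) {
        if (!consumers_sketch[i]) {
            consumers_sketch[i] = token;
            return 0;
        }
    }
    return -ENOMEM;
}

/* De-registration with a NULL token is silently ignored. */
static void unregister_consumer_sketch(void *token)
{
    if (!token)
        return;
    for (size_t i = 0; i < MAX_CONSUMERS_SKETCH; i++)
        if (consumers_sketch[i] == token)
            consumers_sketch[i] = NULL;
}
```

Because free slots are also represented as NULL in this model, accepting a NULL token would make it indistinguishable from an empty slot, which is one concrete way such a token "can only lead to unintended problems".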
    • kvm: introduce KVM_MAX_VCPU_ID · 0b1b1dfd
      Greg Kurz authored
      
      
      The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
      also the upper limit for vCPU ids. This is okay for all archs except PowerPC,
      which can have higher ids, depending on the cpu/core/thread topology. In the
      worst case (single threaded guest, host with 8 threads per core), it limits
      the maximum number of vCPUs to KVM_MAX_VCPUS / 8.
      
      This patch separates the vCPU numbering from the total number of vCPUs, with
      the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
      plus one.
      
      The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
      before passing them to KVM_CREATE_VCPU.
      
      This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
      Other archs continue to return KVM_MAX_VCPUS instead.
      
      Suggested-by: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0b1b1dfd
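
The "KVM_MAX_VCPUS / 8" figure in commit 0b1b1dfd follows from simple arithmetic, sketched below. The numbering scheme is an assumption made for illustration (each guest core is given a block of ids as wide as the host's threads per core), and the helper names and the example limit of 1024 are hypothetical.

```c
#include <assert.h>

/*
 * Assumed id scheme: guest vCPU number `index` lives on guest core
 * index / guest_threads, and each guest core owns a block of
 * host_threads consecutive ids.
 */
static unsigned vcpu_id_sketch(unsigned index, unsigned guest_threads,
                               unsigned host_threads)
{
    unsigned core = index / guest_threads;
    unsigned thread = index % guest_threads;
    return core * host_threads + thread;
}

/* How many vCPUs fit if every id must stay below id_limit? */
static unsigned max_vcpus_under_id_limit(unsigned id_limit,
                                         unsigned guest_threads,
                                         unsigned host_threads)
{
    unsigned n = 0;
    while (vcpu_id_sketch(n, guest_threads, host_threads) < id_limit)
        n++;
    return n;
}
```

With an id limit of 1024, a single-threaded guest on a host with 8 threads per core uses ids 0, 8, 16, and so on, so only 128 vCPUs fit. Separating the id limit from the vCPU count removes that penalty.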
    • KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS · 9b9e3fc4
      Greg Kurz authored
      Commit c896939f ("KVM: use heuristic for fast VCPU lookup by id") added
      a return path that prevents vcpu ids from exceeding KVM_MAX_VCPUS. This is a
      problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
      
      This patch simply reverses the logic so that we only try the fast path if the
      vcpu id can be used as an index in kvm->vcpus[]. The slow path is not
      affected by the change.
      
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9b9e3fc4
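
The reversed logic from commit 9b9e3fc4 can be sketched as a lookup that treats the id as an array index only when it is in range, and otherwise falls through to a linear search. The types, names, and the tiny array size here are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS_SKETCH 4

struct vcpu_s { int vcpu_id; };

struct vm_s {
    struct vcpu_s *vcpus[MAX_VCPUS_SKETCH]; /* filled from slot 0 upwards */
    int online;                             /* number of used slots */
};

static struct vcpu_s *get_vcpu_by_id_sketch(struct vm_s *vm, int id)
{
    if (id < 0)
        return NULL;

    /* Fast path: only when the id is usable as an index into vcpus[]. */
    if (id < MAX_VCPUS_SKETCH) {
        struct vcpu_s *v = vm->vcpus[id];
        if (v && v->vcpu_id == id)
            return v;
    }

    /* Slow path: ids at or beyond the array size are still found here. */
    for (int i = 0; i < vm->online; i++)
        if (vm->vcpus[i]->vcpu_id == id)
            return vm->vcpus[i];

    return NULL;
}
```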
    • Merge tag 'kvm-arm-for-4.7' of... · bdb4094e
      Paolo Bonzini authored
      Merge tag 'kvm-arm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/ARM Changes for Linux v4.7
      
      Reworks our stage 2 page table handling to have page table manipulation
      macros separate from those of the host systems as the underlying
      hardware page tables can be configured to be noticeably different in
      layout from the stage 1 page tables used by the host.
      
      Adds 16K page size support based on the above.
      
      Adds a generic firmware probing layer for the timer and GIC so that KVM
      initializes using the same logic based on both ACPI and FDT.
      
      Finally adds support for hardware updating of the access flag.
      bdb4094e
  3. May 11, 2016
  4. May 10, 2016
    • Merge tag 'kvm-s390-next-4.7-2' of... · 6ac0f61f
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: features and fixes for 4.7 part2
      
      - Use hardware provided information about facility bits that do not
        need any hypervisor activity
      - Add missing documentation for KVM_CAP_S390_RI
      - Some updates/fixes for handling cpu models and facilities
      6ac0f61f
    • MIPS: KVM: Add missing disable FPU hazard barriers · 4ac33429
      James Hogan authored
      
      
      Add the necessary hazard barriers after disabling the FPU in
      kvm_lose_fpu(), just to be safe.
      
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4ac33429
    • MIPS: KVM: Fix preemption warning reading FPU capability · 556f2a52
      James Hogan authored
      
      
      Reading the KVM_CAP_MIPS_FPU capability returns cpu_has_fpu, however
      this uses smp_processor_id() to read the current CPU capabilities (since
      some old MIPS systems could have FPUs present on only a subset of CPUs).
      
      We don't support any such systems, so work around the warning by using
      raw_cpu_has_fpu instead.
      
      We should probably instead claim not to support FPU at all if any one
      CPU is lacking an FPU, but this should do for now.
      
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      556f2a52
    • MIPS: KVM: Fix preemptable kvm_mips_get_*_asid() calls · f049729c
      James Hogan authored
      
      
      There are a couple of places in KVM fault handling code which implicitly
      use smp_processor_id() via kvm_mips_get_kernel_asid() and
      kvm_mips_get_user_asid() from preemptable context. This is unsafe as a
      preemption could cause the guest kernel ASID to be changed, resulting in
      a host TLB entry being written with the wrong ASID.
      
      Fix by disabling preemption around the kvm_mips_get_*_asid() call and
      the corresponding kvm_mips_host_tlb_write().
      
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f049729c
    • MIPS: KVM: Fix timer IRQ race when writing CP0_Compare · b45bacd2
      James Hogan authored
      Writing CP0_Compare clears the timer interrupt pending bit
      (CP0_Cause.TI), but this wasn't being done atomically. If a timer
      interrupt raced with the write of the guest CP0_Compare, the timer
      interrupt could end up being pending even though the new CP0_Compare is
      nowhere near CP0_Count.
      
      We were already updating the hrtimer expiry with
      kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
      kvm_mips_resume_hrtimer(). Close the race window by expanding out
      kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
      CP0_Compare between the freeze and resume. Since the pending timer
      interrupt should not be cleared when CP0_Compare is written via the KVM
      user API, an ack argument is added to distinguish the source of the
      write.
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b45bacd2
    • MIPS: KVM: Fix timer IRQ race when freezing timer · 4355c44f
      James Hogan authored
      There's a particularly narrow and subtle race condition when the
      software emulated guest timer is frozen which can allow a guest timer
      interrupt to be missed.
      
      This happens due to the hrtimer expiry being inexact, so very
      occasionally the freeze time will be after the moment when the emulated
      CP0_Count transitions to the same value as CP0_Compare (so an IRQ should
      be generated), but before the moment when the hrtimer is due to expire
      (so no IRQ is generated). The IRQ won't be generated when the timer is
      resumed either, since the resume CP0_Count will already match CP0_Compare.
      
      With VZ guests in particular this is far more likely to happen, since
      the soft timer may be frozen frequently in order to restore the timer
      state to the hardware guest timer. This happens after 5-10 hours of
      guest soak testing, resulting in an overflow in guest kernel timekeeping
      calculations, hanging the guest. A more focussed test case to
      intentionally hit the race (with the help of a new hypcall to cause the
      timer state to be migrated between hardware & software) hits the condition
      fairly reliably within around 30 seconds.
      
      Instead of relying purely on the inexact hrtimer expiry to determine
      whether an IRQ should be generated, read the guest CP0_Compare and
      directly check whether the freeze time is before or after it. Only if
      CP0_Count is on or after CP0_Compare do we check the hrtimer expiry to
      determine whether the last IRQ has already been generated (which will
      have pushed back the expiry by one timer period).
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4355c44f
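
The decision described in commit 4355c44f (compare CP0_Count against CP0_Compare first, and consult the hrtimer expiry only once Count has reached Compare) can be sketched as a pure function. The function name, the parameter list, and the quarter-period threshold are assumptions made for this illustration; the signed 32-bit difference is the usual way to compare wrapping counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a timer interrupt should be raised at freeze time.
 * count, compare: emulated 32-bit CP0_Count and CP0_Compare values.
 * now_ns: freeze time; expires_ns: current hrtimer expiry;
 * period_ns: time for CP0_Count to wrap once.
 */
static bool irq_due_at_freeze_sketch(uint32_t count, uint32_t compare,
                                     int64_t now_ns, int64_t expires_ns,
                                     int64_t period_ns)
{
    /* Count has not reached Compare yet: nothing is due. */
    if ((int32_t)(count - compare) < 0)
        return false;

    /*
     * Count is on or after Compare. If the hrtimer had already fired,
     * its expiry would have been pushed back by a full period, so an
     * expiry that is still close means the interrupt was not raised yet.
     */
    return expires_ns < now_ns + period_ns / 4;
}
```

The first check is what closes the race: a freeze that lands just after Count matches Compare but just before the inexact hrtimer fires is now detected from the counter values rather than missed.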
    • kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables · 06485053
      Catalin Marinas authored
      
      
      The ARMv8.1 architecture extensions introduce support for hardware
      updates of the access and dirty information in page table entries. With
      VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
      PTE_AF bit cleared in the stage 2 page table, instead of raising an
      Access Flag fault to EL2 the CPU sets the actual page table entry bit
      (10). To ensure that kernel modifications to the page table do not
      inadvertently revert a bit set by hardware updates, certain Stage 2
      software pte/pmd operations must be performed atomically.
      
      The main user of the AF bit is the kvm_age_hva() mechanism. The
      kvm_age_hva_handler() function performs a "test and clear young" action
      on the pte/pmd. This needs to be atomic in respect of automatic hardware
      updates of the AF bit. Since the AF bit is in the same position for both
      Stage 1 and Stage 2, the patch reuses the existing
      ptep_test_and_clear_young() functionality if
      __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
      existing pte_young/pte_mkold mechanism is preserved.
      
      The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
      to perform atomic modifications in order to avoid a race with updates of
      the AF bit. The arm64 implementation has been re-written using
      exclusives.
      
      Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
      argument and modify the pte/pmd in place. However, these functions are
      only used on local variables rather than actual page table entries, so
      it makes more sense to follow the pte_mkwrite() approach for stage 1
      attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
      functions do not modify the actual page table entries.
      
      The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
      explicitly) do not need to be modified since hardware updates of the
      dirty status are not supported by KVM, so there is no possibility of
      losing such information.
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      06485053
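
The atomicity requirement for "test and clear young" in commit 06485053 can be shown with C11 atomics in user space. This is a sketch only: the macro and function names are hypothetical, and a C11 atomic read-modify-write stands in for whatever primitive the kernel actually uses. Bit 10 is the Access Flag position stated in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Access Flag at bit 10, as described in the commit message. */
#define PTE_AF_SKETCH (UINT64_C(1) << 10)

/*
 * Atomically clear the Access Flag and report whether it was set.
 * A plain load, mask, store sequence could overwrite a bit that the
 * hardware sets between the load and the store; a single atomic
 * read-modify-write cannot.
 */
static bool test_and_clear_young_sketch(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_fetch_and(pte, ~PTE_AF_SKETCH);
    return (old & PTE_AF_SKETCH) != 0;
}
```

The other bits of the entry are untouched by the operation, which is the property the commit relies on when hardware may update the entry concurrently.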
  5. May 09, 2016
  6. May 04, 2016
  7. May 03, 2016