Commit 38904911 authored by Linus Torvalds
Pull kvm fixes from Paolo Bonzini:

 - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

 - Documentation improvements

 - Prevent module exit until all VMs are freed

 - PMU Virtualization fixes

 - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences

 - Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...
parents 6f34f8c3 c15e0ae4
+55 −6
@@ -151,12 +151,6 @@ In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).

To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.


On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40 bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
@@ -4081,6 +4075,11 @@ x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
register.

.. warning::
   MSR accesses coming from nested vmentry/vmexit are not filtered.
   This includes both writes to individual VMCS fields and reads/writes
   through the MSR lists pointed to by the VMCS.

If a bit is within one of the defined ranges, read and write accesses are
guarded by the bitmap's value for the MSR index if the kind of access
is included in the ``struct kvm_msr_filter_range`` flags.  If no range
@@ -5293,6 +5292,10 @@ type values:

KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
  Sets the guest physical address of the vcpu_info for a given vCPU.
  As with the shared_info page for the VM, the corresponding page may be
  dirtied at any time if event channel interrupt delivery is enabled, so
  userspace should always assume that the page is dirty without relying
  on dirty logging.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
  Sets the guest physical address of an additional pvclock structure
@@ -7719,3 +7722,49 @@ only be invoked on a VM prior to the creation of VCPUs.
At this time, KVM_PMU_CAP_DISABLE is the only capability.  Setting
this capability will disable PMU virtualization for that VM.  Usermode
should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
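As a rough illustration of the CPUID adjustment mentioned above, the following
Python sketch clears leaf 0xA in a list of CPUID entries before they would be
passed to ``KVM_SET_CPUID2``. It only models the data manipulation and does not
issue any ioctl. The dictionary layout and the ``hide_pmu`` helper are
hypothetical, and zeroing the whole leaf (so that the version ID in EAX[7:0]
reads as 0) is an assumption about how a disabled PMU should look to the guest,
not something this document specifies.

```python
CPUID_PMU_LEAF = 0xA  # architectural performance monitoring leaf


def hide_pmu(entries):
    """Return a copy of `entries` with every register of leaf 0xA cleared.

    Each entry is a dict with "function", "index", "eax", "ebx", "ecx" and
    "edx" keys (a hypothetical stand-in for struct kvm_cpuid_entry2).
    """
    result = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's list is not modified
        if entry["function"] == CPUID_PMU_LEAF:
            for reg in ("eax", "ebx", "ecx", "edx"):
                entry[reg] = 0
        result.append(entry)
    return result
```

A VMM would apply such a step only to VMs on which it has set
``KVM_PMU_CAP_DISABLE``.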

9. Known KVM API problems
=========================

KVM's API has some inconsistencies and common pitfalls that userspace
needs to be aware of.  This section details some of these issues.

Most of them are architecture specific, so the section is split by
architecture.

9.1. x86
--------

``KVM_GET_SUPPORTED_CPUID`` issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In general, ``KVM_GET_SUPPORTED_CPUID`` is designed so that it is possible
to take its result and pass it directly to ``KVM_SET_CPUID2``.  This section
documents some cases in which that requires some care.

Local APIC features
~~~~~~~~~~~~~~~~~~~

CPUID[EAX=1]:ECX[21] (X2APIC) is reported by ``KVM_GET_SUPPORTED_CPUID``,
but it can only be enabled if ``KVM_CREATE_IRQCHIP`` or
``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` is used to enable in-kernel emulation of
the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPUID[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``.
It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
has enabled in-kernel emulation of the local APIC.
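The rules above could be applied to the output of ``KVM_GET_SUPPORTED_CPUID``
roughly as in the following Python sketch. It only models the bit manipulation
and does not issue any ioctl. The entry layout, the ``adjust_lapic_cpuid``
helper and its two boolean parameters are hypothetical. The leaf number and bit
position used for ``KVM_FEATURE_PV_UNHALT`` (EAX bit 7 of leaf 0x40000001) come
from the kernel's paravirtual CPUID definitions rather than from this section.

```python
CPUID_FEATURE_LEAF = 0x1
X2APIC_BIT = 1 << 21          # leaf 1, ECX bit 21
TSC_DEADLINE_BIT = 1 << 24    # leaf 1, ECX bit 24
KVM_CPUID_FEATURES = 0x40000001
PV_UNHALT_BIT = 1 << 7        # KVM_FEATURE_PV_UNHALT, EAX bit 7
MASK32 = 0xFFFFFFFF


def adjust_lapic_cpuid(entries, in_kernel_lapic, has_tsc_deadline_cap):
    """Return a copy of `entries` with the local APIC related bits fixed up.

    in_kernel_lapic: True if KVM_CREATE_IRQCHIP or KVM_CAP_IRQCHIP_SPLIT
        was used (assumed to be tracked by the caller).
    has_tsc_deadline_cap: True if KVM_CAP_TSC_DEADLINE_TIMER is present.
    """
    adjusted = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's list is not modified
        if entry["function"] == CPUID_FEATURE_LEAF:
            if not in_kernel_lapic:
                # Neither feature works without the in-kernel local APIC.
                entry["ecx"] &= ~(X2APIC_BIT | TSC_DEADLINE_BIT) & MASK32
            elif has_tsc_deadline_cap:
                # Not reported by KVM_GET_SUPPORTED_CPUID, so set it by hand.
                entry["ecx"] |= TSC_DEADLINE_BIT
        elif entry["function"] == KVM_CPUID_FEATURES and not in_kernel_lapic:
            entry["eax"] &= ~PV_UNHALT_BIT & MASK32
        adjusted.append(entry)
    return adjusted
```

The adjusted list is what a VMM would then convert back into the structure
passed to ``KVM_SET_CPUID2``.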

Obsolete ioctls and capabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``KVM_CAP_DISABLE_QUIRKS`` does not let userspace know which quirks are actually
available.  Use ``KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2)`` instead when it
is supported.
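A minimal Python sketch of how userspace might use that information, assuming
(as the ``KVM_CAP_DISABLE_QUIRKS2`` description elsewhere in the KVM API
documentation states) that ``KVM_CHECK_EXTENSION`` returns a bitmask of the
quirks that can be disabled. The ``split_quirks`` helper is hypothetical and
only shows the mask arithmetic. Only two of the quirk constants are listed,
with values taken from the kernel's x86 uapi header.

```python
KVM_X86_QUIRK_LINT0_REENABLED = 1 << 0
KVM_X86_QUIRK_CD_NW_CLEARED = 1 << 1


def split_quirks(wanted, supported_mask):
    """Split the quirks a VMM wants to disable into (usable, unavailable).

    supported_mask is the value assumed to be returned by
    KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2).
    """
    usable = wanted & supported_mask
    unavailable = wanted & ~supported_mask
    return usable, unavailable
```

Only the ``usable`` part would be passed to
``KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2)``; a non-zero ``unavailable`` value
tells the VMM which requested quirks this kernel does not know about.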

Ordering of KVM_GET_*/KVM_SET_* ioctls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TBD
+7 −19
@@ -8,25 +8,13 @@ KVM
   :maxdepth: 2

   api
   amd-memory-encryption
   cpuid
   halt-polling
   hypercalls
   locking
   mmu
   msr
   nested-vmx
   ppc-pv
   s390-diag
   s390-pv
   s390-pv-boot
   timekeeping
   vcpu-requests

   review-checklist
   devices/index

   arm/index
   s390/index
   ppc-pv
   x86/index

   devices/index

   running-nested-guests
   locking
   vcpu-requests
   review-checklist
+34 −9
@@ -210,32 +210,47 @@ time it will be set using the Dirty tracking mechanism described above.
3. Reference
------------

:Name:		kvm_lock
``kvm_lock``
^^^^^^^^^^^^

:Type:		mutex
:Arch:		any
:Protects:	- vm_list

:Name:		kvm_count_lock
``kvm_count_lock``
^^^^^^^^^^^^^^^^^^

:Type:		raw_spinlock_t
:Arch:		any
:Protects:	- hardware virtualization enable/disable
:Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
		migration.

:Name:		kvm_arch::tsc_write_lock
:Type:		raw_spinlock
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type:          spinlock_t
:Arch:          any
:Protects:      mn_active_invalidate_count, mn_memslots_update_rcuwait

``kvm_arch::tsc_write_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type:		raw_spinlock_t
:Arch:		x86
:Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
		- tsc offset in vmcb
:Comment:	'raw' because updating the tsc offsets must not be preempted.

:Name:		kvm->mmu_lock
:Type:		spinlock_t
``kvm->mmu_lock``
^^^^^^^^^^^^^^^^^
:Type:		spinlock_t or rwlock_t
:Arch:		any
:Protects:	- shadow page/shadow tlb entry
:Comment:	it is a spinlock since it is used in mmu notifier.

:Name:		kvm->srcu
``kvm->srcu``
^^^^^^^^^^^^^
:Type:		srcu lock
:Arch:		any
:Protects:	- kvm->memslots
@@ -246,10 +261,20 @@ time it will be set using the Dirty tracking mechanism described above.
		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
		if it is needed by multiple functions.

:Name:		blocked_vcpu_on_cpu_lock
``kvm->slots_arch_lock``
^^^^^^^^^^^^^^^^^^^^^^^^
:Type:          mutex
:Arch:          any (only needed on x86 though)
:Protects:      any arch-specific fields of memslots that have to be modified
                in a ``kvm->srcu`` read-side critical section.
:Comment:       must be held before reading the pointer to the current memslots,
                until after all changes to the memslots are complete

``wakeup_vcpus_on_cpu_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Type:		spinlock_t
:Arch:		x86
:Protects:	blocked_vcpu_on_cpu
:Protects:	wakeup_vcpus_on_cpu
:Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
		When VT-d posted-interrupts is supported and the VM has assigned
		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
+12 −0
.. SPDX-License-Identifier: GPL-2.0

====================
KVM for s390 systems
====================

.. toctree::
   :maxdepth: 2

   s390-diag
   s390-pv
   s390-pv-boot