Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (49d57592)

Documentation/admin-guide/kernel-parameters.txt

+6 −1

Original line number	Diff line number	Diff line
		@@ -2536,9 +2536,14 @@
		protected: nVHE-based mode with support for guests whose
		state is kept private from the host.

		+nested: VHE-based mode with support for nested
		+virtualization. Requires at least ARMv8.3
		+hardware.

		Defaults to VHE/nVHE based on hardware support. Setting
		mode to "protected" will disable kexec and hibernation
		-for the host.
		+for the host. "nested" is experimental and should be
		+used with extreme caution.

		kvm-arm.vgic_v3_group0_trap=
		[KVM,ARM] Trap guest accesses to GICv3 group-0

Documentation/virt/kvm/api.rst

+108 −16

		@@ -3736,7 +3736,7 @@ The fields in each entry are defined as follows:
		:Parameters: struct kvm_s390_mem_op (in)
		:Returns: = 0 on success,
		< 0 on generic error (e.g. -EFAULT or -ENOMEM),
		-> 0 if an exception occurred while walking the page tables
		+16 bit program exception code if the access causes such an exception

		Read or write data from/to the VM's memory.
		The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is
		@@ -3754,6 +3754,8 @@ Parameters are specified via the following structure::
		struct {
		__u8 ar; /* the access register number */
		__u8 key; /* access key, ignored if flag unset */
		+__u8 pad1[6]; /* ignored */
		+__u64 old_addr; /* ignored if flag unset */
		};
		__u32 sida_offset; /* offset into the sida */
		__u8 reserved[32]; /* ignored */
		@@ -3781,6 +3783,7 @@ Possible operations are:
		* ``KVM_S390_MEMOP_ABSOLUTE_WRITE``
		* ``KVM_S390_MEMOP_SIDA_READ``
		* ``KVM_S390_MEMOP_SIDA_WRITE``
		+* ``KVM_S390_MEMOP_ABSOLUTE_CMPXCHG``

		Logical read/write:
		^^^^^^^^^^^^^^^^^^^
		@@ -3829,7 +3832,7 @@ the checks required for storage key protection as one operation (as opposed to
		user space getting the storage keys, performing the checks, and accessing
		memory thereafter, which could lead to a delay between check and access).
		Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION
		-is > 0.
		+has the KVM_S390_MEMOP_EXTENSION_CAP_BASE bit set.
		Currently absolute accesses are not permitted for VCPU ioctls.
		Absolute accesses are permitted for non-protected guests only.

		@@ -3837,7 +3840,26 @@ Supported flags:
		* ``KVM_S390_MEMOP_F_CHECK_ONLY``
		* ``KVM_S390_MEMOP_F_SKEY_PROTECTION``

		-The semantics of the flags are as for logical accesses.
		+The semantics of the flags common with logical accesses are as for logical
		+accesses.

		+Absolute cmpxchg:
		+^^^^^^^^^^^^^^^^^

		+Perform cmpxchg on absolute guest memory. Intended for use with the
		+KVM_S390_MEMOP_F_SKEY_PROTECTION flag.
		+Instead of doing an unconditional write, the access occurs only if the target
		+location contains the value pointed to by "old_addr".
		+This is performed as an atomic cmpxchg with the length specified by the "size"
		+parameter. "size" must be a power of two up to and including 16.
		+If the exchange did not take place because the target value doesn't match the
		+old value, the value "old_addr" points to is replaced by the target value.
		+User space can tell if an exchange took place by checking if this replacement
		+occurred. The cmpxchg op is permitted for the VM ioctl if
		+KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set.

		+Supported flags:
		+* ``KVM_S390_MEMOP_F_SKEY_PROTECTION``

		SIDA read/write:
		^^^^^^^^^^^^^^^^
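
Illustrative note, not part of the commit: the "Absolute cmpxchg" semantics added in the hunk above can be modelled in plain C. This is a user-space sketch only. The helper names `cmpxchg_size_ok` and `model_cmpxchg` are hypothetical, the code does not issue the KVM_S390_MEM_OP ioctl, and unlike the real operation it is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: accept only the sizes the documentation allows,
 * i.e. a power of two up to and including 16.
 */
static bool cmpxchg_size_ok(size_t size)
{
	return size != 0 && size <= 16 && (size & (size - 1)) == 0;
}

/*
 * Hypothetical, non-atomic model of the documented behaviour:
 * - if the target holds the value at old_val, store new_val and succeed;
 * - otherwise copy the current target value back into old_val.
 * The boolean return is a convenience of this model. Per the text,
 * user space detects failure by noticing that old_val was replaced.
 */
static bool model_cmpxchg(void *target, void *old_val, const void *new_val,
			  size_t size)
{
	assert(cmpxchg_size_ok(size));

	if (memcmp(target, old_val, size) == 0) {
		memcpy(target, new_val, size);
		return true;
	}
	memcpy(old_val, target, size);
	return false;
}
```

A caller would typically retry with the refreshed old value after a failed exchange, which is the usual compare-and-exchange loop.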
		@@ -4457,6 +4479,18 @@ not holding a previously reported uncorrected error).
		:Parameters: struct kvm_s390_cmma_log (in, out)
		:Returns: 0 on success, a negative value on error

		+Errors:

		+====== =============================================================
		+ENOMEM not enough memory can be allocated to complete the task
		+ENXIO  if CMMA is not enabled
		+EINVAL if KVM_S390_CMMA_PEEK is not set but migration mode was not enabled
		+EINVAL if KVM_S390_CMMA_PEEK is not set but dirty tracking has been
		+       disabled (and thus migration mode was automatically disabled)
		+EFAULT if the userspace address is invalid or if no page table is
		+       present for the addresses (e.g. when using hugepages).
		+====== =============================================================

		This ioctl is used to get the values of the CMMA bits on the s390
		architecture. It is meant to be used in two scenarios:

		@@ -4537,12 +4571,6 @@ mask is unused.

		values points to the userspace buffer where the result will be stored.

		-This ioctl can fail with -ENOMEM if not enough memory can be allocated to
		-complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
		-KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
		--EFAULT if the userspace address is invalid or if no page table is
		-present for the addresses (e.g. when using hugepages).

		4.108 KVM_S390_SET_CMMA_BITS
		----------------------------

		@@ -5005,6 +5033,15 @@ using this ioctl.
		:Parameters: struct kvm_pmu_event_filter (in)
		:Returns: 0 on success, -1 on error

		+Errors:

		+====== ============================================================
		+EFAULT args[0] cannot be accessed
		+EINVAL args[0] contains invalid data in the filter or filter events
		+E2BIG  nevents is too large
		+EBUSY  not enough memory to allocate the filter
		+====== ============================================================

		::

		struct kvm_pmu_event_filter {
		@@ -5016,14 +5053,69 @@ using this ioctl.
		__u64 events[0];
		};

		-This ioctl restricts the set of PMU events that the guest can program.
		-The argument holds a list of events which will be allowed or denied.
		-The eventsel+umask of each event the guest attempts to program is compared
		-against the events field to determine whether the guest should have access.
		-The events field only controls general purpose counters; fixed purpose
		-counters are controlled by the fixed_counter_bitmap.
		+This ioctl restricts the set of PMU events the guest can program by limiting
		+which event select and unit mask combinations are permitted.

		+The argument holds a list of filter events which will be allowed or denied.

		+Filter events only control general purpose counters; fixed purpose counters
		+are controlled by the fixed_counter_bitmap.

		+Valid values for 'flags'::

		+``0``

		+To use this mode, clear the 'flags' field.

		+In this mode each event will contain an event select + unit mask.

		+When the guest attempts to program the PMU the guest's event select +
		+unit mask is compared against the filter events to determine whether the
		+guest should have access.

		+``KVM_PMU_EVENT_FLAG_MASKED_EVENTS``
		+:Capability: KVM_CAP_PMU_EVENT_MASKED_EVENTS

		+In this mode each filter event will contain an event select, mask, match, and
		+exclude value. To encode a masked event use::

		+  KVM_PMU_ENCODE_MASKED_ENTRY()

		+An encoded event will follow this layout::

		+  Bits   Description
		+  ----   -----------
		+  7:0    event select (low bits)
		+  15:8   umask match
		+  31:16  unused
		+  35:32  event select (high bits)
		+  36:54  unused
		+  55     exclude bit
		+  63:56  umask mask

		+When the guest attempts to program the PMU, these steps are followed in
		+determining if the guest should have access:

		+1. Match the event select from the guest against the filter events.
		+2. If a match is found, match the guest's unit mask to the mask and match
		+   values of the included filter events.
		+   I.e. (unit mask & mask) == match && !exclude.
		+3. If a match is found, match the guest's unit mask to the mask and match
		+   values of the excluded filter events.
		+   I.e. (unit mask & mask) == match && exclude.
		+4.
		+   a. If an included match is found and an excluded match is not found, filter
		+      the event.
		+   b. For everything else, do not filter the event.
		+5.
		+   a. If the event is filtered and it's an allow list, allow the guest to
		+      program the event.
		+   b. If the event is filtered and it's a deny list, do not allow the guest to
		+      program the event.

		-No flags are defined yet, the field must be zero.
		+When setting a new pmu event filter, -EINVAL will be returned if any of the
		+unused fields are set or if any of the high bits (35:32) in the event
		+select are set when called on Intel.

		Valid values for 'action'::
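
Illustrative note, not part of the commit: the masked-event layout and the five matching steps documented in the KVM_PMU_EVENT_FLAG_MASKED_EVENTS hunk above can be sketched in plain C. The function names below are hypothetical stand-ins written from the bit-layout table. They are not the kernel's KVM_PMU_ENCODE_MASKED_ENTRY() macro or its filter code, and the Intel-specific -EINVAL check on bits 35:32 is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder that follows the documented bit layout. */
static uint64_t encode_masked_entry(uint16_t event_select, uint8_t mask,
				    uint8_t match, bool exclude)
{
	assert(event_select <= 0xFFF);	/* event select is 12 bits wide */

	return (uint64_t)(event_select & 0xFF) |		/* bits 7:0   */
	       ((uint64_t)match << 8) |				/* bits 15:8  */
	       ((uint64_t)(event_select & 0xF00) << 24) |	/* bits 35:32 */
	       ((uint64_t)exclude << 55) |			/* bit 55     */
	       ((uint64_t)mask << 56);				/* bits 63:56 */
}

/* Reassemble the 12-bit event select from its low and high parts. */
static uint16_t entry_event_select(uint64_t e)
{
	return (uint16_t)((e & 0xFF) | ((e >> 24) & 0xF00));
}

/*
 * Steps 1-4: the event is "filtered" if at least one included entry
 * matches and no excluded entry matches.
 */
static bool event_is_filtered(const uint64_t *events, size_t nevents,
			      uint16_t guest_select, uint8_t guest_umask)
{
	bool included = false, excluded = false;

	for (size_t i = 0; i < nevents; i++) {
		uint64_t e = events[i];
		uint8_t match = (e >> 8) & 0xFF;
		uint8_t mask = (e >> 56) & 0xFF;

		if (entry_event_select(e) != guest_select)
			continue;			/* step 1 */
		if ((guest_umask & mask) != match)
			continue;			/* steps 2 and 3 */
		if ((e >> 55) & 1)
			excluded = true;
		else
			included = true;
	}
	return included && !excluded;			/* step 4 */
}

/*
 * Step 5. Assumption not spelled out in the text: an event that is NOT
 * filtered gets the opposite outcome (denied by an allow list, permitted
 * by a deny list).
 */
static bool guest_may_program(const uint64_t *events, size_t nevents,
			      bool allow_list, uint16_t guest_select,
			      uint8_t guest_umask)
{
	bool filtered = event_is_filtered(events, nevents, guest_select,
					  guest_umask);

	return allow_list ? filtered : !filtered;
}
```

For example, an included entry with mask 0x00 and match 0x00 covers every unit mask of an event, and a second entry with the exclude bit set can then carve out a single unit mask from that range.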

Documentation/virt/kvm/devices/vm.rst

+4 −0

		@@ -302,6 +302,10 @@ Allows userspace to start migration mode, needed for PGSTE migration.
		Setting this attribute when migration mode is already active will have
		no effects.

		+Dirty tracking must be enabled on all memslots, else -EINVAL is returned. When
		+dirty tracking is disabled on any memslot, migration mode is automatically
		+stopped.

		:Parameters: none
		:Returns: -ENOMEM if there is not enough free memory to start migration mode;
		-EINVAL if the state of the VM is invalid (e.g. no memory defined);

Documentation/virt/kvm/locking.rst

+16 −9

		@@ -9,6 +9,8 @@ KVM Lock Overview

		The acquisition orders for mutexes are as follows:

		+- cpus_read_lock() is taken outside kvm_lock

		- kvm->lock is taken outside vcpu->mutex

		- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
		@@ -226,15 +228,10 @@ time it will be set using the Dirty tracking mechanism described above.
		:Type: mutex
		:Arch: any
		:Protects: - vm_list

		-``kvm_count_lock``
		-^^^^^^^^^^^^^^^^^^

		-:Type: raw_spinlock_t
		-:Arch: any
		-:Protects: - hardware virtualization enable/disable
		-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
		-migration.
		+- kvm_usage_count
		+- hardware virtualization enable/disable
		+:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
		+enable/disable.

		``kvm->mn_invalidate_lock``
		^^^^^^^^^^^^^^^^^^^^^^^^^^^
		@@ -292,3 +289,13 @@ time it will be set using the Dirty tracking mechanism described above.
		wakeup notification event since external interrupts from the
		assigned devices happens, we will find the vCPU on the list to
		wakeup.

		+``vendor_module_lock``
		+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		+:Type: mutex
		+:Arch: x86
		+:Protects: loading a vendor module (kvm_amd or kvm_intel)
		+:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
		+taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
		+many operations need to take cpu_hotplug_lock when loading a vendor module,
		+e.g. updating static calls.

Documentation/virt/kvm/x86/errata.rst

+11 −0

		@@ -37,3 +37,14 @@ Nested virtualization features
		------------------------------

		TBD

		+x2APIC
		+------
		+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
		+allows sending events to a single vCPU using its x2APIC ID even if the target
		+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
		+on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
		+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
		+only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
		+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
		+matching the target APIC ID receive the interrupt).
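
Illustrative note, not part of the commit: the delivery rule described in the x2APIC erratum above can be modelled in a few lines of C. The struct and function names are hypothetical and the model covers only events addressed to a single physical APIC ID. It is a sketch of the documented behaviour, not KVM's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of a vCPU for this model. */
struct model_vcpu {
	uint32_t vcpu_id;
	uint32_t apic_id;
};

/*
 * Return a bitmask of indices into vcpus[] that receive an event sent to
 * dest_apic_id.
 * - use_32bit_ids == false: every vCPU with a matching APIC ID receives it
 *   (architectural behaviour).
 * - use_32bit_ids == true: only the matching vCPU with the lowest vCPU ID
 *   receives it (the documented side effect of the quirk).
 */
static uint64_t model_delivery_targets(const struct model_vcpu *vcpus,
				       size_t n, uint32_t dest_apic_id,
				       bool use_32bit_ids)
{
	uint64_t all = 0;
	size_t lowest = n;	/* n means "no match found yet" */

	assert(n <= 64);	/* the result is a 64-bit mask */

	for (size_t i = 0; i < n; i++) {
		if (vcpus[i].apic_id != dest_apic_id)
			continue;
		all |= 1ULL << i;
		if (lowest == n || vcpus[i].vcpu_id < vcpus[lowest].vcpu_id)
			lowest = i;
	}

	if (!use_32bit_ids || lowest == n)
		return all;
	return 1ULL << lowest;
}
```

With two vCPUs sharing APIC ID 5, the model returns both indices when the quirk is off and only the index of the lower vCPU ID when it is on.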