Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2258c2dc)

Documentation/virt/kvm/api.rst

+26 −20

Original line number	Diff line number	Diff line
		@@ -5343,9 +5343,9 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
		32 vCPUs in the shared_info page, KVM does not automatically do so
		and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
		explicitly even when the vcpu_info for a given vCPU resides at the
		"default" location in the shared_info page. This is because KVM is
		not aware of the Xen CPU id which is used as the index into the
		vcpu_info[] array, so cannot know the correct default location.
		"default" location in the shared_info page. This is because KVM may
		not be aware of the Xen CPU id which is used as the index into the
		vcpu_info[] array, so may not know the correct default location.

		Note that the shared info page may be constantly written to by KVM;
		it contains the event channel bitmap used to deliver interrupts to
		@@ -5356,23 +5356,29 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
		any vCPU has been running or any event channel interrupts can be
		routed to the guest.

		Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info
		page.

		KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
		Sets the exception vector used to deliver Xen event channel upcalls.
		This is the HVM-wide vector injected directly by the hypervisor
		(not through the local APIC), typically configured by a guest via
		HVM_PARAM_CALLBACK_IRQ.
		HVM_PARAM_CALLBACK_IRQ. This can be disabled again (e.g. for guest
		SHUTDOWN_soft_reset) by setting it to zero.

		KVM_XEN_ATTR_TYPE_EVTCHN
		This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
		support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures
		an outbound port number for interception of EVTCHNOP_send requests
		from the guest. A given sending port number may be directed back
		to a specified vCPU (by APIC ID) / port / priority on the guest,
		or to trigger events on an eventfd. The vCPU and priority can be
		changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call,
		but other fields cannot change for a given sending port. A port
		mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags
		field.
		from the guest. A given sending port number may be directed back to
		a specified vCPU (by APIC ID) / port / priority on the guest, or to
		trigger events on an eventfd. The vCPU and priority can be changed
		by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but other
		fields cannot change for a given sending port. A port mapping is
		removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags field. Passing
		KVM_XEN_EVTCHN_RESET in the flags field removes all interception of
		outbound event channels. The values of the flags field are mutually
		exclusive and cannot be combined as a bitmask.

		KVM_XEN_ATTR_TYPE_XEN_VERSION
		This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
		@@ -5388,7 +5394,7 @@ KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG
		support for KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG. It enables the
		XEN_RUNSTATE_UPDATE flag which allows guest vCPUs to safely read
		other vCPUs' vcpu_runstate_info. Xen guests enable this feature via
		the VM_ASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist
		the VMASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist
		hypercall.

		4.127 KVM_XEN_HVM_GET_ATTR
		@@ -5446,15 +5452,18 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
		As with the shared_info page for the VM, the corresponding page may be
		dirtied at any time if event channel interrupt delivery is enabled, so
		userspace should always assume that the page is dirty without relying
		on dirty logging.
		on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
		the vcpu_info.

		KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
		Sets the guest physical address of an additional pvclock structure
		for a given vCPU. This is typically used for guest vsyscall support.
		Setting the gpa to KVM_XEN_INVALID_GPA will disable the structure.

		KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR
		Sets the guest physical address of the vcpu_runstate_info for a given
		vCPU. This is how a Xen guest tracks CPU state such as steal time.
		Setting the gpa to KVM_XEN_INVALID_GPA will disable the runstate area.

		KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT
		Sets the runstate (RUNSTATE_running/_runnable/_blocked/_offline) of
		@@ -5487,7 +5496,8 @@ KVM_XEN_VCPU_ATTR_TYPE_TIMER
		This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
		support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the
		event channel port/priority for the VIRQ_TIMER of the vCPU, as well
		as allowing a pending timer to be saved/restored.
		as allowing a pending timer to be saved/restored. Setting the timer
		port to zero disables kernel handling of the singleshot timer.

		KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR
		This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
		@@ -5495,7 +5505,8 @@ KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR
		per-vCPU local APIC upcall vector, configured by a Xen guest with
		the HVMOP_set_evtchn_upcall_vector hypercall. This is typically
		used by Windows guests, and is distinct from the HVM-wide upcall
		vector configured with HVM_PARAM_CALLBACK_IRQ.
		vector configured with HVM_PARAM_CALLBACK_IRQ. It is disabled by
		setting the vector to zero.


		4.129 KVM_XEN_VCPU_GET_ATTR
		@@ -6577,11 +6588,6 @@ Please note that the kernel is allowed to use the kvm_run structure as the
		primary storage for certain register types. Therefore, the kernel may use the
		values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

		::

		};



		6. Capabilities that can be enabled on vCPUs
		============================================
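
The KVM_XEN_ATTR_TYPE_EVTCHN text above says the flags values are mutually exclusive and cannot be combined as a bitmask. A minimal user-space sketch of that rule follows. The `DEMO_*` names, their numeric values, and the reading of "no flag" as "configure a new mapping" are assumptions made for illustration; the kernel's uapi headers remain authoritative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-ins for the KVM_XEN_EVTCHN_* flag values. The numeric
 * values are assumptions; consult the kernel uapi headers for the real ones.
 */
#define DEMO_EVTCHN_DEASSIGN (1u << 0)
#define DEMO_EVTCHN_UPDATE   (1u << 1)
#define DEMO_EVTCHN_RESET    (1u << 2)

enum demo_evtchn_action {
	DEMO_ACT_ASSIGN,	/* assumed: no flag set configures a new mapping */
	DEMO_ACT_DEASSIGN,	/* remove one port mapping */
	DEMO_ACT_UPDATE,	/* change vCPU and priority of a mapping */
	DEMO_ACT_RESET,		/* remove all outbound interception */
	DEMO_ACT_INVALID,	/* any combination of flags */
};

/* Compare the whole flags word for equality instead of testing single bits. */
enum demo_evtchn_action demo_classify_flags(uint32_t flags)
{
	switch (flags) {
	case 0:
		return DEMO_ACT_ASSIGN;
	case DEMO_EVTCHN_DEASSIGN:
		return DEMO_ACT_DEASSIGN;
	case DEMO_EVTCHN_UPDATE:
		return DEMO_ACT_UPDATE;
	case DEMO_EVTCHN_RESET:
		return DEMO_ACT_RESET;
	default:
		return DEMO_ACT_INVALID;
	}
}

/* Usage: single values are accepted, OR-ed values are rejected. */
void demo_evtchn_selftest(void)
{
	assert(demo_classify_flags(DEMO_EVTCHN_UPDATE) == DEMO_ACT_UPDATE);
	assert(demo_classify_flags(DEMO_EVTCHN_UPDATE | DEMO_EVTCHN_RESET) ==
	       DEMO_ACT_INVALID);
}
```

Switching on the whole word, rather than testing individual bits, is what makes a combination such as UPDATE together with RESET fall through to the invalid case.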

Documentation/virt/kvm/locking.rst

+14 −5

		@@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
		- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
		them together is quite rare.

		- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
		synchronize_srcu(&kvm->srcu). Therefore kvm->slots_arch_lock
		can be taken inside a kvm->srcu read-side critical section,
		while kvm->slots_lock cannot.

		- kvm->mn_active_invalidate_count ensures that pairs of
		invalidate_range_start() and invalidate_range_end() callbacks
		use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock
		are taken on the waiting side in install_new_memslots, so MMU notifiers
		must not take either kvm->slots_lock or kvm->slots_arch_lock.

		For SRCU:

		- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
		the kvm->slots_lock critical section, therefore kvm->slots_lock
		cannot be taken inside a kvm->srcu read-side critical section.
		Instead, kvm->slots_arch_lock is released before the call
		to ``synchronize_srcu()`` and _can_ be taken inside a
		kvm->srcu read-side critical section.

		- kvm->lock is taken inside kvm->srcu, therefore
		``synchronize_srcu(&kvm->srcu)`` cannot be called inside
		a kvm->lock critical section. If you cannot delay the
		call until after kvm->lock is released, use ``call_srcu``.

		On x86:

		- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock

MAINTAINERS

+1 −1

		@@ -11468,7 +11468,7 @@ F: arch/x86/kvm/hyperv.*
		F: arch/x86/kvm/kvm_onhyperv.*
		F: arch/x86/kvm/svm/hyperv.*
		F: arch/x86/kvm/svm/svm_onhyperv.*
		F: arch/x86/kvm/vmx/evmcs.*
		F: arch/x86/kvm/vmx/hyperv.*

		KVM X86 Xen (KVM/Xen)
		M: David Woodhouse <dwmw2@infradead.org>

arch/x86/kvm/hyperv.c

+36 −27

		@@ -1769,6 +1769,7 @@ static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_ba
		}

		struct kvm_hv_hcall {
		/* Hypercall input data */
		u64 param;
		u64 ingpa;
		u64 outgpa;
		@@ -1779,12 +1780,21 @@ struct kvm_hv_hcall {
		bool fast;
		bool rep;
		sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];

		/*
		* Current read offset when KVM reads hypercall input data gradually,
		* either offset in bytes from 'ingpa' for regular hypercalls or the
		* number of already consumed 'XMM halves' for 'fast' hypercalls.
		*/
		union {
		gpa_t data_offset;
		int consumed_xmm_halves;
		};
		};


		static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
		u16 orig_cnt, u16 cnt_cap, u64 *data,
		int consumed_xmm_halves, gpa_t offset)
		u16 orig_cnt, u16 cnt_cap, u64 *data)
		{
		/*
		* Preserve the original count when ignoring entries via a "cap", KVM
		@@ -1799,11 +1809,11 @@ static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
		* Each XMM holds two sparse banks, but do not count halves that
		* have already been consumed for hypercall parameters.
		*/
		if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
		if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - hc->consumed_xmm_halves)
		return HV_STATUS_INVALID_HYPERCALL_INPUT;

		for (i = 0; i < cnt; i++) {
		j = i + consumed_xmm_halves;
		j = i + hc->consumed_xmm_halves;
		if (j % 2)
		data[i] = sse128_hi(hc->xmm[j / 2]);
		else
		@@ -1812,27 +1822,24 @@ static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
		return 0;
		}

		return kvm_read_guest(kvm, hc->ingpa + offset, data,
		return kvm_read_guest(kvm, hc->ingpa + hc->data_offset, data,
		cnt * sizeof(*data));
		}

		static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
		u64 *sparse_banks, int consumed_xmm_halves,
		gpa_t offset)
		u64 *sparse_banks)
		{
		if (hc->var_cnt > HV_MAX_SPARSE_VCPU_BANKS)
		return -EINVAL;

		/* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
		return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
		sparse_banks, consumed_xmm_halves, offset);
		sparse_banks);
		}

		static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[],
		int consumed_xmm_halves, gpa_t offset)
		static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[])
		{
		return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt,
		entries, consumed_xmm_halves, offset);
		return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt, entries);
		}

		static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu,
		@@ -1926,8 +1933,6 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		struct kvm_vcpu *v;
		unsigned long i;
		bool all_cpus;
		int consumed_xmm_halves = 0;
		gpa_t data_offset;

		/*
		* The Hyper-V TLFS doesn't allow more than HV_MAX_SPARSE_VCPU_BANKS
		@@ -1955,12 +1960,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		flush.address_space = hc->ingpa;
		flush.flags = hc->outgpa;
		flush.processor_mask = sse128_lo(hc->xmm[0]);
		consumed_xmm_halves = 1;
		hc->consumed_xmm_halves = 1;
		} else {
		if (unlikely(kvm_read_guest(kvm, hc->ingpa,
		&flush, sizeof(flush))))
		return HV_STATUS_INVALID_HYPERCALL_INPUT;
		data_offset = sizeof(flush);
		hc->data_offset = sizeof(flush);
		}

		trace_kvm_hv_flush_tlb(flush.processor_mask,
		@@ -1985,12 +1990,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		flush_ex.flags = hc->outgpa;
		memcpy(&flush_ex.hv_vp_set,
		&hc->xmm[0], sizeof(hc->xmm[0]));
		consumed_xmm_halves = 2;
		hc->consumed_xmm_halves = 2;
		} else {
		if (unlikely(kvm_read_guest(kvm, hc->ingpa, &flush_ex,
		sizeof(flush_ex))))
		return HV_STATUS_INVALID_HYPERCALL_INPUT;
		data_offset = sizeof(flush_ex);
		hc->data_offset = sizeof(flush_ex);
		}

		trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
		@@ -2009,8 +2014,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		if (!hc->var_cnt)
		goto ret_success;

		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks,
		consumed_xmm_halves, data_offset))
		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks))
		return HV_STATUS_INVALID_HYPERCALL_INPUT;
		}

		@@ -2021,8 +2025,10 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		* consumed_xmm_halves to make sure TLB flush entries are read
		* from the correct offset.
		*/
		data_offset += hc->var_cnt * sizeof(sparse_banks[0]);
		consumed_xmm_halves += hc->var_cnt;
		if (hc->fast)
		hc->consumed_xmm_halves += hc->var_cnt;
		else
		hc->data_offset += hc->var_cnt * sizeof(sparse_banks[0]);
		}

		if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
		@@ -2030,8 +2036,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		hc->rep_cnt > ARRAY_SIZE(__tlb_flush_entries)) {
		tlb_flush_entries = NULL;
		} else {
		if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries,
		consumed_xmm_halves, data_offset))
		if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries))
		return HV_STATUS_INVALID_HYPERCALL_INPUT;
		tlb_flush_entries = __tlb_flush_entries;
		}
		@@ -2180,9 +2185,13 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
		if (!hc->var_cnt)
		goto ret_success;

		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
		offsetof(struct hv_send_ipi_ex,
		vp_set.bank_contents)))
		if (!hc->fast)
		hc->data_offset = offsetof(struct hv_send_ipi_ex,
		vp_set.bank_contents);
		else
		hc->consumed_xmm_halves = 1;

		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks))
		return HV_STATUS_INVALID_HYPERCALL_INPUT;
		}
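
The hyperv.c hunks above move the read cursor into struct kvm_hv_hcall as a union: a byte offset for regular hypercalls, or a count of consumed XMM halves for 'fast' ones. The following is a rough user-space model of that idea, not the kernel code. Every `demo_*` name, the flat `mem` buffer standing in for guest memory, and the register count of 6 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_XMM 6	/* assumed stand-in for HV_HYPERCALL_MAX_XMM_REGISTERS */

/* Hypothetical stand-in for sse128_t: one register as two 64-bit halves. */
struct demo_xmm {
	uint64_t lo, hi;
};

struct demo_hcall {
	int fast;			/* nonzero: input arrives in XMM registers */
	const uint8_t *mem;		/* stand-in for guest memory at 'ingpa' */
	size_t mem_len;
	struct demo_xmm xmm[DEMO_MAX_XMM];
	union {				/* exactly one cursor is live per call */
		size_t data_offset;		/* bytes consumed from mem */
		int consumed_xmm_halves;	/* XMM halves consumed */
	};
};

/* Copy cnt 64-bit words from the active cursor; 0 on success, -1 on bad input. */
int demo_get_hc_data(const struct demo_hcall *hc, uint16_t cnt, uint64_t *data)
{
	if (hc->fast) {
		/* Do not count halves already used for hypercall parameters. */
		if (cnt > 2 * DEMO_MAX_XMM - hc->consumed_xmm_halves)
			return -1;
		for (int i = 0; i < cnt; i++) {
			int j = i + hc->consumed_xmm_halves;

			data[i] = (j % 2) ? hc->xmm[j / 2].hi : hc->xmm[j / 2].lo;
		}
		return 0;
	}
	if (hc->data_offset > hc->mem_len ||
	    cnt * sizeof(*data) > hc->mem_len - hc->data_offset)
		return -1;
	memcpy(data, hc->mem + hc->data_offset, cnt * sizeof(*data));
	return 0;
}

/* Advance only the cursor that matches the calling convention. */
void demo_advance(struct demo_hcall *hc, uint16_t cnt)
{
	if (hc->fast)
		hc->consumed_xmm_halves += cnt;
	else
		hc->data_offset += cnt * sizeof(uint64_t);
}

/* Usage: one parameter half is already consumed, then three more are read. */
void demo_cursor_selftest(void)
{
	struct demo_hcall hc;
	uint64_t out[3];

	memset(&hc, 0, sizeof(hc));
	hc.fast = 1;
	hc.consumed_xmm_halves = 1;
	for (int i = 0; i < DEMO_MAX_XMM; i++) {
		hc.xmm[i].lo = 2 * i;
		hc.xmm[i].hi = 2 * i + 1;
	}
	assert(demo_get_hc_data(&hc, 3, out) == 0);
	assert(out[0] == 1 && out[1] == 2 && out[2] == 3);
	demo_advance(&hc, 3);
	assert(hc.consumed_xmm_halves == 4);
}
```

Because the two cursors share storage, the sketch advances only the one that matches the calling convention, which mirrors the `if (hc->fast) ... else ...` split in the hunk at -2021.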

arch/x86/kvm/irq_comm.c

+3 −2

		@@ -426,8 +426,9 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
		kvm_set_msi_irq(vcpu->kvm, entry, &irq);

		if (irq.trig_mode &&
		kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
		irq.dest_id, irq.dest_mode))
		(kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
		irq.dest_id, irq.dest_mode) ||
		kvm_apic_pending_eoi(vcpu, irq.vector)))
		__set_bit(irq.vector, ioapic_handled_vectors);
		}
		}
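
The irq_comm.c hunk widens the condition under which a vector is added to ioapic_handled_vectors: a level-triggered route now qualifies if it targets this vCPU or if the vCPU still has an EOI pending for that vector. A minimal sketch of the resulting predicate, with the two kernel helpers replaced by plain boolean inputs (the `demo_*` names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the updated check. 'dest_matches' stands in for the result of
 * kvm_apic_match_dest() and 'pending_eoi' for kvm_apic_pending_eoi().
 */
bool demo_vector_is_handled(bool level_triggered, bool dest_matches,
			    bool pending_eoi)
{
	return level_triggered && (dest_matches || pending_eoi);
}

/* Usage: the middle case is the one the old condition would have missed. */
void demo_irq_selftest(void)
{
	assert(demo_vector_is_handled(true, true, false));
	assert(demo_vector_is_handled(true, false, true));
	assert(!demo_vector_is_handled(false, true, true));
}
```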