Commit 1e17a6f8 authored by Sean Christopherson's avatar Sean Christopherson Committed by Paolo Bonzini
Browse files

KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events



Don't snapshot pending INIT/SIPI events prior to checking nested events,
architecturally there's nothing wrong with KVM processing (dropping) a
SIPI that is received immediately after synthesizing a VM-Exit.  Taking
and consuming the snapshot makes the flow way more subtle than it needs
to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI),
and so at first glance it appears that KVM is double-dipping on pending
INITs and SIPIs.  But that's not the case because INIT is blocked
unconditionally in VMX root mode the CPU cannot be in wait-for_SIPI after
VM-Exit, i.e. the paths that truly consume the snapshot are unreachable
if apic->pending_events is modified by kvm_check_nested_events().

nSVM is a similar story as GIF is cleared by the CPU on VM-Exit; INIT is
blocked regardless of whether or not it was pending prior to VM-Exit.

Drop the snapshot logic so that a future fix doesn't create weirdness
when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to
vcpu_block().  In that case, kvm_check_nested_events() will be called
immediately before kvm_apic_accept_events(), which raises the obvious
question of why that change doesn't break the snapshot logic.

Note, there is a subtle functional change.  Previously, KVM would clear
pending SIPIs if and only SIPI was pending prior to VM-Exit, whereas now
KVM clears pending SIPI unconditionally if INIT+SIPI are blocked.  The
latter is architecturally allowed, as SIPI is ignored if the CPU is not
in wait-for-SIPI mode (arguably, KVM should be even more aggressive in
dropping SIPIs).  It is software's responsibility to ensure the SIPI is
delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until
it knows with 100% certaining that the target CPU isn't in VMX root mode.

Furthermore, the existing code is extra weird as SIPIs that arrive after
VM-Exit _are_ dropped if there also happened to be a pending SIPI before
VM-Exit.

Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
Message-Id: <20220921003201.1441511-10-seanjc@google.com>
Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
parent ea2f00c6
Loading
Loading
Loading
Loading
+10 −26
Original line number Diff line number Diff line
@@ -3025,17 +3025,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
	struct kvm_lapic *apic = vcpu->arch.apic;
	u8 sipi_vector;
	int r;
	unsigned long pe;

	if (!lapic_in_kernel(vcpu))
		return 0;

	/*
	 * Read pending events before calling the check_events
	 * callback.
	 */
	pe = smp_load_acquire(&apic->pending_events);
	if (!pe)
	if (!kvm_apic_has_pending_init_or_sipi(vcpu))
		return 0;

	if (is_guest_mode(vcpu)) {
@@ -3043,38 +3034,31 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
		if (r < 0)
			return r == -EBUSY ? 0 : r;
		/*
		 * If an event has happened and caused a vmexit,
		 * we know INITs are latched and therefore
		 * we will not incorrectly deliver an APIC
		 * event instead of a vmexit.
		 * Continue processing INIT/SIPI even if a nested VM-Exit
		 * occurred, e.g. pending SIPIs should be dropped if INIT+SIPI
		 * are blocked as a result of transitioning to VMX root mode.
		 */
	}

	/*
	 * INITs are blocked while CPU is in specific states
	 * (SMM, VMX root mode, SVM with GIF=0).
	 * Because a CPU cannot be in these states immediately
	 * after it has processed an INIT signal (and thus in
	 * KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
	 * and leave the INIT pending.
	 * INITs are blocked while CPU is in specific states (SMM, VMX root
	 * mode, SVM with GIF=0), while SIPIs are dropped if the CPU isn't in
	 * wait-for-SIPI (WFS).
	 */
	if (!kvm_apic_init_sipi_allowed(vcpu)) {
		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
		if (test_bit(KVM_APIC_SIPI, &pe))
		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
		return 0;
	}

	if (test_bit(KVM_APIC_INIT, &pe)) {
		clear_bit(KVM_APIC_INIT, &apic->pending_events);
	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
		kvm_vcpu_reset(vcpu, true);
		if (kvm_vcpu_is_bsp(apic->vcpu))
			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
		else
			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
	}
	if (test_bit(KVM_APIC_SIPI, &pe)) {
		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) {
		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
			/* evaluate pending_events before reading the vector */
			smp_rmb();