  1. Jan 07, 2022
    • Merge tag 'kvmarm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 7fd55a02
      Paolo Bonzini authored
      KVM/arm64 updates for Linux 5.17
      
      - Simplification of the 'vcpu first run' by integrating it into
        KVM's 'pid change' flow
      
      - Refactoring of the FP and SVE state tracking, also leading to
        a simpler state and less shared data between EL1 and EL2 in
        the nVHE case
      
      - Tidy up the header file usage for the nvhe hyp object
      
      - New HYP unsharing mechanism, finally allowing pages to be
        unmapped from the Stage-1 EL2 page-tables
      
      - Various pKVM cleanups around refcounting and sharing
      
      - A couple of vgic fixes for bugs that would trigger once
        the vcpu xarray rework is merged, but not sooner
      
      - Add minimal support for ARMv8.7's PMU extension
      
      - Rework kvm_pgtable initialisation ahead of the NV work
      
      - New selftest for IRQ injection
      
      - Teach selftests about the lack of default IPA space and
        page sizes
      
      - Expand sysreg selftest to deal with Pointer Authentication
      
      - The usual bunch of cleanups and doc update
      7fd55a02
  2. Jan 05, 2022
  3. Jan 04, 2022
    • Merge branch kvm-arm64/selftest/irq-injection into kvmarm-master/next · ad7937dc
      Marc Zyngier authored
      
      
      * kvm-arm64/selftest/irq-injection:
        : .
        : New tests from Ricardo Koller:
        : "This series adds a new test, aarch64/vgic-irq, that validates the injection of
        : different types of IRQs from userspace using various methods and configurations"
        : .
        KVM: selftests: aarch64: Add test for restoring active IRQs
        KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irq
        KVM: selftests: aarch64: Add tests for IRQFD in vgic_irq
        KVM: selftests: Add IRQ GSI routing library functions
        KVM: selftests: aarch64: Add test_inject_fail to vgic_irq
        KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irq
        KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irq
        KVM: selftests: aarch64: Add preemption tests in vgic_irq
        KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irq
        KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq test
        KVM: selftests: aarch64: Abstract the injection functions in vgic_irq
        KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injection
        KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ state
        KVM: selftests: Add kvm_irq_line library function
        KVM: selftests: aarch64: Add GICv3 register accessor library functions
        KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist registers
        KVM: selftests: aarch64: Move gic_v3.h to shared headers
      
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      ad7937dc
    • Merge branch kvm-arm64/selftest/ipa into kvmarm-master/next · 089606c0
      Marc Zyngier authored
      
      
      * kvm-arm64/selftest/ipa:
        : .
        : Expand the KVM/arm64 selftest infrastructure to discover
        : supported page sizes at runtime, support 16kB pages, and
        : find out about the original M1 stupidly small IPA space.
        : .
        KVM: selftests: arm64: Add support for various modes with 16kB page size
        KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K}
        KVM: selftests: arm64: Rework TCR_EL1 configuration
        KVM: selftests: arm64: Check for supported page sizes
        KVM: selftests: arm64: Introduce a variable default IPA size
        KVM: selftests: arm64: Initialise default guest mode at test startup time
      
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      089606c0
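      One plausible way to discover page sizes and the IPA limit at runtime is
      to decode the guest-visible ID_AA64MMFR0_EL1 value. The sketch below
      assumes that approach; the toy_* names are hypothetical and are not the
      selftest's helpers. The field positions and encodings (TGran4, TGran64,
      TGran16, PARange) follow the Arm architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct toy_page_support {
	bool ps4k;
	bool ps16k;
	bool ps64k;
};

/* Extract a 4-bit ID register field starting at bit 'shift'. */
static unsigned int toy_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Decode the stage-1 granule support fields of ID_AA64MMFR0_EL1. */
static struct toy_page_support toy_decode_tgran(uint64_t mmfr0)
{
	struct toy_page_support s;

	s.ps4k  = toy_field(mmfr0, 28) != 0xf;	/* TGran4: 0xf = not supported */
	s.ps64k = toy_field(mmfr0, 24) == 0x0;	/* TGran64: 0x0 = supported */
	s.ps16k = toy_field(mmfr0, 20) != 0x0;	/* TGran16: 0x0 = not supported */
	return s;
}

/* Map PARange (bits [3:0]) to a physical address width, 0 if unknown. */
static unsigned int toy_pa_bits(uint64_t mmfr0)
{
	static const unsigned int pa_bits[] = { 32, 36, 40, 42, 44, 48, 52 };
	unsigned int parange = toy_field(mmfr0, 0);

	if (parange >= sizeof(pa_bits) / sizeof(pa_bits[0]))
		return 0;
	return pa_bits[parange];
}
```

      A PARange of 1 decodes to 36 bits, which is the case the
      VM_MODE_P36V48_{4K,64K} modes listed above are meant to cover.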
    • KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve() · e938eddb
      Zenghui Yu authored
      kvm_arm_init_arch_resources() was renamed to kvm_arm_init_sve() in
      commit a3be836d ("KVM: arm/arm64: Demote kvm_arm_init_arch_resources()
      to just set up SVE"). Fix the function name in the comment of
      kvm_vcpu_finalize_sve().
      
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211230141535.1389-1-yuzenghui@huawei.com
      e938eddb
    • KVM: arm64: selftests: get-reg-list: Add pauth configuration · f15dcf1b
      Marc Zyngier authored
      
      
      The get-reg-list test ignores the Pointer Authentication features,
      which is a shame now that we have relatively common HW with this feature.
      
      Define two new configurations (with and without PMU) that exercise the
      KVM capabilities.
      
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Link: https://lore.kernel.org/r/20211228121414.1013250-1-maz@kernel.org
      f15dcf1b
  4. Dec 29, 2021
  5. Dec 28, 2021
  6. Dec 22, 2021
    • Merge tag 'kvm-s390-next-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD · 5e4e84f1
      Paolo Bonzini authored
      
      KVM: s390: Fix and cleanup
      
      - fix sigp sense/start/stop/inconsistency
      - cleanups
      5e4e84f1
    • Merge remote-tracking branch 'kvm/master' into HEAD · 855fb038
      Paolo Bonzini authored
      Pick commit fdba608f ("KVM: VMX: Wake vCPU when delivering posted
      IRQ even if vCPU == this vCPU").  In addition to fixing a bug, it
      also aligns the non-nested and nested usage of triggering posted
      interrupts, allowing for additional cleanups.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      855fb038
    • KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU · fdba608f
      Sean Christopherson authored
      Drop a check that guards triggering a posted interrupt on the currently
      running vCPU, and more importantly guards waking the target vCPU if
      triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
      If a vIRQ is delivered from asynchronous context, the target vCPU can be
      the currently running vCPU and can also be blocking, in which case
      skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to
      be a wake event for the vCPU.
      
      The "do nothing" logic when "vcpu == running_vcpu" mostly works only
      because the majority of calls to ->deliver_posted_interrupt(), especially
      when using posted interrupts, come from synchronous KVM context.  But if
      a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
      and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
      use posted interrupts, wake events from the device will be delivered to
      KVM from IRQ context, e.g.
      
        vfio_msihandler()
        |
        |-> eventfd_signal()
            |
            |-> ...
                |
                |-> irqfd_wakeup()
                    |
                    |-> kvm_arch_set_irq_inatomic()
                        |
                        |-> kvm_irq_delivery_to_apic_fast()
                            |
                            |-> kvm_apic_set_irq()
      
      This also aligns the non-nested and nested usage of triggering posted
      interrupts, and will allow for additional cleanups.
      
      Fixes: 379a3c8e ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
      Cc: stable@vger.kernel.org
      Reported-by: Longpeng (Mike) <longpeng2@huawei.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211208015236.1616697-18-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fdba608f
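      The dropped wake event described in fdba608f can be illustrated with a
      small userspace model. This is a sketch only: the toy_* types and
      functions are hypothetical stand-ins, not KVM's data structures, and the
      "old" variant only mirrors the "vcpu == running_vcpu" check as the
      commit message describes it.

```c
#include <stdbool.h>

enum toy_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

/* Hypothetical, heavily simplified vCPU. */
struct toy_vcpu {
	enum toy_mode mode;
	bool blocking;	/* vCPU is waiting for a wake event */
	bool woken;	/* set once a wake event has been delivered */
};

/* A posted interrupt can only be triggered while the vCPU is in the guest. */
static bool toy_trigger_posted_interrupt(const struct toy_vcpu *vcpu)
{
	return vcpu->mode == TOY_IN_GUEST_MODE;
}

static void toy_wake_up(struct toy_vcpu *vcpu)
{
	if (vcpu->blocking) {
		vcpu->blocking = false;
		vcpu->woken = true;
	}
}

/* Old behaviour: do nothing when the target is the running vCPU. */
static void toy_deliver_old(struct toy_vcpu *vcpu,
			    const struct toy_vcpu *running)
{
	if (vcpu != running && !toy_trigger_posted_interrupt(vcpu))
		toy_wake_up(vcpu);
}

/* New behaviour: always fall back to a wake-up if triggering fails. */
static void toy_deliver_new(struct toy_vcpu *vcpu)
{
	if (!toy_trigger_posted_interrupt(vcpu))
		toy_wake_up(vcpu);
}
```

      With a blocking vCPU that is also the "running" vCPU (the IRQ-context
      case), toy_deliver_old() leaves it blocked while toy_deliver_new()
      wakes it.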
  7. Dec 20, 2021
    • KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on() · dda0190d
      Fuad Tabba authored
      The barrier is there for power_off rather than power_state.
      Probably a typo in commit 358b28f0 ("arm/arm64: KVM: Allow a VCPU to
      fully reset itself").
      
      Signed-off-by: Fuad Tabba <tabba@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com
      dda0190d
    • KVM: arm64: Fix comment for kvm_reset_vcpu() · a080e323
      Fuad Tabba authored
      The comment for kvm_reset_vcpu() refers to the sysreg table as
      being the table above, probably because of the code extracted at
      commit f4672752 ("arm64: KVM: virtual CPU reset").
      
      Fix the comment to remove the potentially confusing reference.
      
      Signed-off-by: Fuad Tabba <tabba@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
      a080e323
    • KVM: arm64: Use defined value for SCTLR_ELx_EE · 500ca524
      Fuad Tabba authored
      
      
      Replace the hardcoded value with the existing definition.
      
      No functional change intended.
      
      Signed-off-by: Fuad Tabba <tabba@google.com>
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com
      500ca524
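      The kind of cleanup described in 500ca524 can be sketched as follows.
      The helper names are hypothetical and the macro is a local stand-in for
      the kernel's definition; the only architectural fact relied on is that
      SCTLR_ELx.EE is bit 25.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel definition: SCTLR_ELx.EE is bit 25. */
#define SCTLR_ELx_EE	(UINT64_C(1) << 25)

/* Before: the reader has to know what bit 25 means. */
static bool toy_is_big_endian_hardcoded(uint64_t sctlr)
{
	return (sctlr & (1 << 25)) != 0;
}

/* After: the named constant documents the intent, same behaviour. */
static bool toy_is_big_endian(uint64_t sctlr)
{
	return (sctlr & SCTLR_ELx_EE) != 0;
}
```

      Both helpers return the same result for any input, which is what "no
      functional change intended" amounts to.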
    • KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state · ab1ef344
      Sean Christopherson authored
      Add a selftest to attempt to enter L2 with invalid guest state by
      exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set
      invalid guest state (marking TR unusable is arbitrarily chosen for its
      relative simplicity).

      This is a regression test for a bug introduced by commit c8607e4a
      ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if
      !from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid
      guest state and ultimately triggered a WARN due to nested_vmx_vmexit()
      seeing vmx->fail==true while attempting to synthesize a nested VM-Exit.

      This is also a functional test to verify that KVM synthesizes TRIPLE_FAULT
      for L2, which is somewhat arbitrary behavior, instead of emulating L2.
      KVM should never emulate L2 due to invalid guest state, as it's
      architecturally impossible for L1 to run an L2 guest with invalid state
      as nested VM-Enter should always fail, i.e. L1 needs to do the emulation.
      Stuffing state via KVM ioctl() is a non-architectural, out-of-band case,
      hence the TRIPLE_FAULT being rather arbitrary.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-5-seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ab1ef344
    • KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state · 0ff29701
      Sean Christopherson authored
      Update the documentation for kvm-intel's emulate_invalid_guest_state to
      rectify the description of KVM's default behavior, and to document that
      the behavior and thus parameter only applies to L1.
      
      Fixes: a27685c3 ("KVM: VMX: Emulate invalid guest state by default")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-4-seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0ff29701
    • KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required · cd0e615c
      Sean Christopherson authored
      
      
      Synthesize a triple fault if L2 guest state is invalid at the time of
      VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
      guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
      invalid guest state, since from L1's perspective, it's architecturally
      impossible for L2 to have invalid state while L2 is running in hardware.
      E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
      or #GP.
      
      Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
      can trigger this scenario, as nested VM-Enter correctly rejects any
      attempt to enter L2 with invalid state.
      
      RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
      behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
      via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
      Following AMD's SMRAM layout is important as AMD's layout saves/restores
      the descriptor cache information, including CS.RPL and SS.RPL, and also
      defines all the fields relevant to invalid guest state as read-only, i.e.
      so long as the vCPU had valid state before the SMI, which is guaranteed
      for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
      layout saves/restores only the selector, which means that scenarios where
      the selector and cached RPL don't match, e.g. conforming code segments,
      would yield invalid guest state.  Intel CPUs fudge around this issue by
      stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
      Treatment of RSM", paraphrasing for brevity:
      
        IF internal storage indicates that the [CPU was post-VMXON]
        THEN
           enter VMX operation (root or non-root);
           restore VMX-critical state as defined in Section 34.14.1;
           set to their fixed values any bits in CR0 and CR4 whose values must
           be fixed in VMX operation [unless coming from an unrestricted guest];
           IF RFLAGS.VM = 0 AND (in VMX root operation OR the
              “unrestricted guest” VM-execution control is 0)
           THEN
             CS.RPL := SS.DPL;
             SS.RPL := SS.DPL;
           FI;
           restore current VMCS pointer;
        FI;
      
      Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
      will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
      as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
      only way for CR0 and/or CR4 to have illegal values is if they were
      modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
      section states "modifying these registers will result in unpredictable
      behavior".
      
      KVM's ioctl() behavior is less straightforward.  Because KVM allows
      ioctls() to be executed in any order, rejecting an ioctl() if it would
      result in invalid L2 guest state is not an option as KVM cannot know if
      a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
      drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
      reject KVM_RUN if L2 contained invalid guest state, but that carries the
      risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
      exited to userspace.  Setting a flag/request to detect such a scenario is
      undesirable because (a) it's extremely unlikely to add value to KVM as a
      whole, and (b) KVM would need to consider ioctl() interactions with such
      a flag, e.g. if userspace migrated the vCPU while the flag were set.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-3-seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cd0e615c
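      The policy described in cd0e615c reduces to a small decision: invalid
      guest state is emulated for L1 but turned into a synthesized
      TRIPLE_FAULT for L2. A minimal sketch of that decision, using
      hypothetical toy_* names rather than KVM's actual code paths:

```c
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only. */
enum toy_action {
	TOY_RUN_GUEST,			/* guest state is valid, enter the guest */
	TOY_EMULATE,			/* L1 with invalid state: emulate */
	TOY_SYNTH_TRIPLE_FAULT,		/* L2 with invalid state: triple fault */
};

struct toy_state {
	bool in_l2;			/* vCPU is currently running L2 */
	bool emulation_required;	/* guest state is invalid for VM-Enter */
};

static enum toy_action toy_next_action(const struct toy_state *s)
{
	if (!s->emulation_required)
		return TOY_RUN_GUEST;

	/* KVM should never emulate L2 due to invalid guest state. */
	return s->in_l2 ? TOY_SYNTH_TRIPLE_FAULT : TOY_EMULATE;
}
```

      The model deliberately ignores how the invalid state arose (RSM with
      modified SMRAM or an ioctl()), since the commit treats both paths the
      same way.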
    • KVM: VMX: Always clear vmx->fail on emulation_required · a80dfc02
      Sean Christopherson authored
      Revert a relatively recent change that set vmx->fail if the vCPU is in L2
      and emulation_required is true, as that behavior is completely bogus.
      Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong:
      
        (a) it's impossible to have both a VM-Fail and VM-Exit
        (b) vmcs.EXIT_REASON is not modified on VM-Fail
        (c) emulation_required refers to guest state and guest state checks are
            always VM-Exits, not VM-Fails.
      
      For KVM specifically, emulation_required is handled before nested exits
      in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect,
      i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored.
      Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit()
      firing when tearing down the VM as KVM never expects vmx->fail to be set
      when L2 is active; KVM always reflects those errors into L1.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548
                                      nested_vmx_vmexit+0x16bd/0x17e0
                                      arch/x86/kvm/vmx/nested.c:4547
        Modules linked in:
        CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547
        Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80
        Call Trace:
         vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline]
         nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330
         vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799
         kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989
         kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441
         kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline]
         kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline]
         kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220
         kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489
         __fput+0x3fc/0x870 fs/file_table.c:280
         task_work_run+0x146/0x1c0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0x705/0x24f0 kernel/exit.c:832
         do_group_exit+0x168/0x2d0 kernel/exit.c:929
         get_signal+0x1740/0x2120 kernel/signal.c:2852
         arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
         do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: c8607e4a ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry")
      Reported-by: <syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211207193006.120997-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a80dfc02
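      The revert in a80dfc02 can be modelled in a few lines. This is a sketch
      under stated assumptions: the toy_* names are hypothetical, and the
      teardown check is a simplification of the WARN in nested_vmx_vmexit()
      (it fires in this model whenever fail is set while L2 is active).

```c
#include <stdbool.h>

/* Hypothetical, simplified per-vCPU VMX state. */
struct toy_vmx {
	bool in_l2;
	bool emulation_required;
	bool fail;
};

/* Reverted behaviour: flag a VM-Fail when L2 needs emulation. */
static void toy_prepare_run_old(struct toy_vmx *vmx)
{
	if (vmx->emulation_required)
		vmx->fail = vmx->in_l2;
}

/* Restored behaviour: invalid guest state is never a VM-Fail. */
static void toy_prepare_run_new(struct toy_vmx *vmx)
{
	if (vmx->emulation_required)
		vmx->fail = false;
}

/* Simplified teardown check: fail must not be set while L2 is active. */
static bool toy_teardown_would_warn(const struct toy_vmx *vmx)
{
	return vmx->in_l2 && vmx->fail;
}
```

      In this model only the old variant leaves the state in a shape that
      trips the teardown check for an L2 vCPU with invalid guest state.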