Commit 3e105853 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull KVM updates from Paolo Bonzini:
 "x86:

   - Support for userspace to emulate Xen hypercalls

   - Raise the maximum number of user memslots

   - Scalability improvements for the new MMU.

     Instead of the complex "fast page fault" logic that is used in
     mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
     but the code that can run against page faults is limited. Right now
     only page faults take the lock for reading; in the future this will
     be extended to some cases of page table destruction. I hope to
     switch the default MMU around 5.12-rc3 (some testing was delayed
     due to Chinese New Year).

   - Cleanups for MAXPHYADDR checks

   - Use static calls for vendor-specific callbacks

   - On AMD, use VMLOAD/VMSAVE to save and restore host state

   - Stop using deprecated jump label APIs

   - Workaround for AMD erratum that made nested virtualization
     unreliable

   - Support for LBR emulation in the guest

   - Support for communicating bus lock vmexits to userspace

   - Add support for SEV attestation command

   - Miscellaneous cleanups

  PPC:

   - Support for second data watchpoint on POWER10

   - Remove some complex workarounds for buggy early versions of POWER9

   - Guest entry/exit fixes

  ARM64:

   - Make the nVHE EL2 object relocatable

   - Cleanups for concurrent translation faults hitting the same page

   - Support for the standard TRNG hypervisor call

   - A bunch of small PMU/Debug fixes

   - Simplification of the early init hypercall handling

  Non-KVM changes (with acks):

   - Detection of contended rwlocks (implemented only for qrwlocks,
     because KVM only needs it for x86)

   - Allow __DISABLE_EXPORTS from assembly code

   - Provide a saner follow_pfn replacements for modules"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
  KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
  KVM: selftests: Don't bother mapping GVA for Xen shinfo test
  KVM: selftests: Fix hex vs. decimal snafu in Xen test
  KVM: selftests: Fix size of memslots created by Xen tests
  KVM: selftests: Ignore recently added Xen tests' build output
  KVM: selftests: Add missing header file needed by xAPIC IPI tests
  KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
  KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
  locking/arch: Move qrwlock.h include after qspinlock.h
  KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
  KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
  KVM: PPC: remove unneeded semicolon
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
  ...
parents 9c5b80b7 8c6e67be
Loading
Loading
Loading
Loading
+21 −0
Original line number Diff line number Diff line
@@ -263,6 +263,27 @@ Returns: 0 on success, -negative on error
                __u32 trans_len;
        };

10. KVM_SEV_GET_ATTESTATION_REPORT
----------------------------------

The KVM_SEV_GET_ATTESTATION_REPORT command can be used by the hypervisor to query the attestation
report containing the SHA-256 digest of the guest memory and VMSA passed through the KVM_SEV_LAUNCH
commands and signed with the PEK. The digest returned by the command should match the digest
used by the guest owner with the KVM_SEV_LAUNCH_MEASURE.

Parameters (in): struct kvm_sev_attestation

Returns: 0 on success, -negative on error

::

        struct kvm_sev_attestation_report {
                __u8 mnonce[16];        /* A random mnonce that will be placed in the report */

                __u64 uaddr;            /* userspace address where the report should be copied */
                __u32 len;
        };

References
==========

+220 −8
Original line number Diff line number Diff line
@@ -960,6 +960,14 @@ memory.
	__u8 pad2[30];
  };

If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the
KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl.
This requests KVM to generate the contents of the hypercall page
automatically; hypercalls will be intercepted and passed to userspace
through KVM_EXIT_XEN.  In this case, all of the blob size and address
fields must be zero.

No other flags are currently valid in the struct kvm_xen_hvm_config.

4.29 KVM_GET_CLOCK
------------------
@@ -2268,6 +2276,8 @@ registers, find a list below:
  PPC     KVM_REG_PPC_PSSCR               64
  PPC     KVM_REG_PPC_DEC_EXPIRY          64
  PPC     KVM_REG_PPC_PTCR                64
  PPC     KVM_REG_PPC_DAWR1               64
  PPC     KVM_REG_PPC_DAWRX1              64
  PPC     KVM_REG_PPC_TM_GPR0             64
  ...
  PPC     KVM_REG_PPC_TM_GPR31            64
@@ -4831,6 +4841,101 @@ into user space.
If a vCPU is in running state while this ioctl is invoked, the vCPU may
experience inconsistent filtering behavior on MSR accesses.

4.127 KVM_XEN_HVM_SET_ATTR
--------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
:Architectures: x86
:Type: vm ioctl
:Parameters: struct kvm_xen_hvm_attr
:Returns: 0 on success, < 0 on error

::

  struct kvm_xen_hvm_attr {
	__u16 type;
	__u16 pad[3];
	union {
		__u8 long_mode;
		__u8 vector;
		struct {
			__u64 gfn;
		} shared_info;
		__u64 pad[4];
	} u;
  };

type values:

KVM_XEN_ATTR_TYPE_LONG_MODE
  Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
  determines the layout of the shared info pages exposed to the VM.

KVM_XEN_ATTR_TYPE_SHARED_INFO
  Sets the guest physical frame number at which the Xen "shared info"
  page resides. Note that although Xen places vcpu_info for the first
  32 vCPUs in the shared_info page, KVM does not automatically do so
  and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
  explicitly even when the vcpu_info for a given vCPU resides at the
  "default" location in the shared_info page. This is because KVM is
  not aware of the Xen CPU id which is used as the index into the
  vcpu_info[] array, so cannot know the correct default location.

KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
  Sets the exception vector used to deliver Xen event channel upcalls.

4.128 KVM_XEN_HVM_GET_ATTR
--------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
:Architectures: x86
:Type: vm ioctl
:Parameters: struct kvm_xen_hvm_attr
:Returns: 0 on success, < 0 on error

Allows Xen VM attributes to be read. For the structure and types,
see KVM_XEN_HVM_SET_ATTR above.

4.129 KVM_XEN_VCPU_SET_ATTR
---------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_xen_vcpu_attr
:Returns: 0 on success, < 0 on error

::

  struct kvm_xen_vcpu_attr {
	__u16 type;
	__u16 pad[3];
	union {
		__u64 gpa;
		__u64 pad[4];
	} u;
  };

type values:

KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
  Sets the guest physical address of the vcpu_info for a given vCPU.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
  Sets the guest physical address of an additional pvclock structure
  for a given vCPU. This is typically used for guest vsyscall support.

4.130 KVM_XEN_VCPU_GET_ATTR
---------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_xen_vcpu_attr
:Returns: 0 on success, < 0 on error

Allows Xen vCPU attributes to be read. For the structure and types,
see KVM_XEN_VCPU_SET_ATTR above.

5. The kvm_run structure
========================
@@ -4893,9 +4998,11 @@ local APIC is not used.
	__u16 flags;

More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior.  The only currently defined flag is
KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
VCPU is in system management mode.
affect the device's behavior. Current defined flags:
  /* x86, set if the VCPU is in system management mode */
  #define KVM_RUN_X86_SMM     (1 << 0)
  /* x86, set if bus lock detected in VM */
  #define KVM_RUN_BUS_LOCK    (1 << 1)

::

@@ -4996,13 +5103,18 @@ to the byte array.

.. note::

      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
      operations are complete (and guest state is consistent) only after userspace
      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
      incomplete operations and then check for pending signals.  Userspace
      can re-enter the guest with an unmasked signal pending to complete
      pending operations.
      incomplete operations and then check for pending signals.

      The pending state of the operation is not preserved in state which is
      visible to userspace, thus userspace should ensure that the operation is
      completed before performing a live migration.  Userspace can re-enter the
      guest with an unmasked signal pending or with the immediate_exit field set
      to complete pending operations without allowing any further instructions
      to be executed.

::

@@ -5327,6 +5439,34 @@ wants to write. Once finished processing the event, user space must continue
vCPU execution. If the MSR write was unsuccessful, user space also sets the
"error" field to "1".

::


		struct kvm_xen_exit {
  #define KVM_EXIT_XEN_HCALL          1
			__u32 type;
			union {
				struct {
					__u32 longmode;
					__u32 cpl;
					__u64 input;
					__u64 result;
					__u64 params[6];
				} hcall;
			} u;
		};
		/* KVM_EXIT_XEN */
                struct kvm_hyperv_exit xen;

Indicates that the VCPU exits into userspace to process some tasks
related to Xen emulation.

Valid values for 'type' are:

  - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall.
    Userspace is expected to place the hypercall result into the appropriate
    field before invoking KVM_RUN again.

::

		/* Fix the size of the union. */
@@ -6038,6 +6178,53 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
can then handle to implement model specific MSR handling and/or user notifications
to inform a user that an MSR was not handled.

7.22 KVM_CAP_X86_BUS_LOCK_EXIT
-------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] defines the policy used when bus locks detected in guest
:Returns: 0 on success, -EINVAL when args[0] contains invalid bits

Valid bits in args[0] are::

  #define KVM_BUS_LOCK_DETECTION_OFF      (1 << 0)
  #define KVM_BUS_LOCK_DETECTION_EXIT     (1 << 1)

Enabling this capability on a VM provides userspace with a way to select
a policy to handle the bus locks detected in guest. Userspace can obtain
the supported modes from the result of KVM_CHECK_EXTENSION and define it
through the KVM_ENABLE_CAP.

KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported
currently and mutually exclusive with each other. More bits can be added in
the future.

With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits
so that no additional actions are needed. This is the default mode.

With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected
in VM. KVM just exits to userspace when handling them. Userspace can enforce
its own throttling or other policy based mitigations.

This capability is aimed to address the thread that VM can exploit bus locks to
degree the performance of the whole system. Once the userspace enable this
capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the
KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning
the bus lock vm exit can be preempted by a higher priority VM exit, the exit
notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons.
KVM_RUN_BUS_LOCK flag is used to distinguish between them.

7.22 KVM_CAP_PPC_DAWR1
----------------------

:Architectures: ppc
:Parameters: none
:Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR

This capability can be used to check / enable 2nd DAWR feature provided
by POWER10 processor.

8. Other capabilities.
======================

@@ -6415,7 +6602,6 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.


8.29 KVM_CAP_DIRTY_LOG_RING
---------------------------

@@ -6502,3 +6688,29 @@ KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
machine will switch to ring-buffer dirty page tracking and further
KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.

8.30 KVM_CAP_XEN_HVM
--------------------

:Architectures: x86

This capability indicates the features that Xen supports for hosting Xen
PVHVM guests. Valid flags are::

  #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
  #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
  #define KVM_XEN_HVM_CONFIG_SHARED_INFO	(1 << 2)

The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
ioctl is available, for the guest to set its hypercall page.

If KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL is also set, the same flag may also be
provided in the flags to KVM_XEN_HVM_CONFIG, without providing hypercall page
contents, to request that KVM generate hypercall page content automatically
and also enable interception of guest hypercalls with KVM_EXIT_XEN.

The KVM_XEN_HVM_CONFIG_SHARED_INFO flag indicates the availability of the
KVM_XEN_HVM_SET_ATTR, KVM_XEN_HVM_GET_ATTR, KVM_XEN_VCPU_SET_ATTR and
KVM_XEN_VCPU_GET_ATTR ioctls, as well as the delivery of exception vectors
for event channel upcalls when the evtchn_upcall_pending field of a vcpu's
vcpu_info is set.
+8 −1
Original line number Diff line number Diff line
@@ -16,7 +16,14 @@ The acquisition orders for mutexes are as follows:
- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
  them together is quite rare.

On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock.
On x86:

- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock

- kvm->arch.mmu_lock is an rwlock.  kvm->arch.tdp_mmu_pages_lock is
  taken inside kvm->arch.mmu_lock, and cannot be taken without already
  holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
  there's no need to take kvm->arch.tdp_mmu_pages_lock at all).

Everything else is a leaf: no other lock is taken inside the critical
sections.
+27 −2
Original line number Diff line number Diff line
@@ -7,6 +7,9 @@
#ifndef __ARM64_HYP_IMAGE_H__
#define __ARM64_HYP_IMAGE_H__

#define __HYP_CONCAT(a, b)	a ## b
#define HYP_CONCAT(a, b)	__HYP_CONCAT(a, b)

/*
 * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_,
 * to separate it from the kernel proper.
@@ -21,9 +24,31 @@
 */
#define HYP_SECTION_NAME(NAME)	.hyp##NAME

/* Symbol defined at the beginning of each hyp section. */
#define HYP_SECTION_SYMBOL_NAME(NAME) \
	HYP_CONCAT(__hyp_section_, HYP_SECTION_NAME(NAME))

/*
 * Helper to generate linker script statements starting a hyp section.
 *
 * A symbol with a well-known name is defined at the first byte. This
 * is used as a base for hyp relocations (see gen-hyprel.c). It must
 * be defined inside the section so the linker of `vmlinux` cannot
 * separate it from the section data.
 */
#define BEGIN_HYP_SECTION(NAME)				\
	HYP_SECTION_NAME(NAME) : {			\
		HYP_SECTION_SYMBOL_NAME(NAME) = .;

/* Helper to generate linker script statements ending a hyp section. */
#define END_HYP_SECTION					\
	}

/* Defines an ELF hyp section from input section @NAME and its subsections. */
#define HYP_SECTION(NAME)			\
	HYP_SECTION_NAME(NAME) : { *(NAME NAME##.*) }
	BEGIN_HYP_SECTION(NAME)			\
		*(NAME NAME##.*)		\
	END_HYP_SECTION

/*
 * Defines a linker script alias of a kernel-proper symbol referenced by
+0 −20
Original line number Diff line number Diff line
@@ -199,26 +199,6 @@ extern void __vgic_v3_init_lrs(void);

extern u32 __kvm_get_mdcr_el2(void);

/*
 * Obtain the PC-relative address of a kernel symbol
 * s: symbol
 *
 * The goal of this macro is to return a symbol's address based on a
 * PC-relative computation, as opposed to a loading the VA from a
 * constant pool or something similar. This works well for HYP, as an
 * absolute VA is guaranteed to be wrong. Only use this if trying to
 * obtain the address of a symbol (i.e. not something you obtained by
 * following a pointer).
 */
#define hyp_symbol_addr(s)						\
	({								\
		typeof(s) *addr;					\
		asm("adrp	%0, %1\n"				\
		    "add	%0, %0, :lo12:%1\n"			\
		    : "=r" (addr) : "S" (&s));				\
		addr;							\
	})

#define __KVM_EXTABLE(from, to)						\
	"	.pushsection	__kvm_ex_table, \"a\"\n"		\
	"	.align		3\n"					\
Loading