Commit eb561891 authored by Paolo Bonzini's avatar Paolo Bonzini
Browse files
KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  no-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, and resulting in
  interesting conflicts
parents 1e79a9e3 753d734f
Loading
Loading
Loading
Loading
+31 −10
Original line number Diff line number Diff line
@@ -7418,8 +7418,9 @@ hibernation of the host; however the VMM needs to manually save/restore the
tags as appropriate if the VM is migrated.

When this capability is enabled all memory in memslots must be mapped as
not-shareable (no MAP_SHARED), attempts to create a memslot with a
MAP_SHARED mmap will result in an -EINVAL return.
``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
attempts to create a memslot with an invalid mmap will result in an
-EINVAL return.

When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
perform a bulk copy of tags to/from the guest.
@@ -7954,7 +7955,7 @@ regardless of what has actually been exposed through the CPUID leaf.
8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------

:Architectures: x86
:Architectures: x86, arm64
:Parameters: args[0] - size of the dirty log ring

KVM is capable of tracking dirty memory using ring buffers that are
@@ -8036,13 +8037,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
vmexit ensures that all dirty GFNs are flushed to the dirty rings.

NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
machine will switch to ring-buffer dirty page tracking and further
KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.

NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
should be exposed by weakly ordered architecture, in order to indicate
the additional memory ordering requirements imposed on userspace when
@@ -8051,6 +8045,33 @@ Architecture with TSO-like ordering (such as x86) are allowed to
expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
to userspace.

After enabling the dirty rings, the userspace needs to detect the
capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the
ring structures can be backed by per-slot bitmaps. With this capability
advertised, it means the architecture can dirty guest pages without
vcpu/ring context, so that some of the dirty information will still be
maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
hasn't been enabled, or any memslot has been existing.

Note that the bitmap here is only a backup of the ring structure. The
use of the ring and bitmap combination is only beneficial if there is
only a very small amount of memory that is dirtied out of vcpu/ring
context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
be considered.

To collect dirty bits in the backup bitmap, userspace can use the same
KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
the generation of the dirty bits is done in a single pass. Collecting
the dirty bitmap should be the very last thing that the VMM does before
considering the state as complete. VMM needs to ensure that the dirty
state is final and avoid missing dirty pages from another ioctl ordered
after the bitmap collection.

NOTE: One example of using the backup bitmap is saving arm64 vgic/its
tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
KVM device "kvm-arm-vgic-its" when dirty ring is enabled.

8.30 KVM_CAP_XEN_HVM
--------------------

+8 −6
Original line number Diff line number Diff line
@@ -23,21 +23,23 @@ the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
ARCH_FEATURES mechanism before calling it.

PV_TIME_FEATURES
    ============= ========    ==========

    ============= ========    =================================================
    Function ID:  (uint32)    0xC5000020
    PV_call_id:   (uint32)    The function to query for support.
                              Currently only PV_TIME_ST is supported.
    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
                              PV-time feature is supported by the hypervisor.
    ============= ========    ==========
    ============= ========    =================================================

PV_TIME_ST
    ============= ========    ==========

    ============= ========    ==============================================
    Function ID:  (uint32)    0xC5000021
    Return value: (int64)     IPA of the stolen time data structure for this
                              VCPU. On failure:
                              NOT_SUPPORTED (-1)
    ============= ========    ==========
    ============= ========    ==============================================

The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
with inner and outer write back caching attributes, in the inner shareable
@@ -76,5 +78,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of
these structures and not used for other purposes, this enables the guest to map
the region using 64k pages and avoids conflicting attributes with other memory.

For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
For the user space interface see
:ref:`Documentation/virt/kvm/devices/vcpu.rst <kvm_arm_vcpu_pvtime_ctrl>`.
 No newline at end of file
+4 −1
Original line number Diff line number Diff line
@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL

    KVM_DEV_ARM_ITS_SAVE_TABLES
      save the ITS table data into guest RAM, at the location provisioned
      by the guest in corresponding registers/table entries.
      by the guest in corresponding registers/table entries. Should userspace
      require a form of dirty tracking to identify which pages are modified
      by the saving process, it should use a bitmap even if using another
      mechanism to track the memory dirtied by the vCPUs.

      The layout of the tables in guest memory defines an ABI. The entries
      are laid out in little endian format as described in the last paragraph.
+2 −0
Original line number Diff line number Diff line
@@ -171,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.

.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

+1 −0
Original line number Diff line number Diff line
@@ -1965,6 +1965,7 @@ config ARM64_MTE
	depends on ARM64_PAN
	select ARCH_HAS_SUBPAGE_FAULTS
	select ARCH_USES_HIGH_VMA_FLAGS
	select ARCH_USES_PG_ARCH_X
	help
	  Memory Tagging (part of the ARMv8.5 Extensions) provides
	  architectural support for run-time, always-on detection of
Loading