Unverified Commit c8ab1a53 authored by openeuler-ci-bot's avatar openeuler-ci-bot Committed by Gitee
Browse files

!5545 backport dirty-ring feature

Merge Pull Request from: @wenzhiwei11 
 
The new dirty ring interface is another way to collect dirty pages for
the virtual machines. It is different from the existing dirty logging
interface in a few ways, majorly:

  - Data format: The dirty data was in a ring format rather than a
    bitmap format, so dirty bits to sync for dirty logging does not
    depend on the size of guest memory any more, but speed of
    dirtying.  Also, the dirty ring is per-vcpu, while the dirty
    bitmap is per-vm.

  - Data copy: The sync of dirty pages does not need data copy any more,
    but instead the ring is shared between the userspace and kernel by
    page sharings (mmap() on vcpu fd)

  - Interface: Instead of using the old KVM_GET_DIRTY_LOG,
    KVM_CLEAR_DIRTY_LOG interfaces, the new ring uses the new
    KVM_RESET_DIRTY_RINGS ioctl when we want to reset the collected
    dirty pages to protected mode again (works like
    KVM_CLEAR_DIRTY_LOG, but ring based).  To collecting dirty bits,
    we only need to read the ring data, no ioctl is needed. 
 
Link:https://gitee.com/openeuler/kernel/pulls/5545

 

Reviewed-by: default avatarKevin Zhu <zhukeqian1@huawei.com>
Signed-off-by: default avatarJialin Zhang <zhangjialin11@huawei.com>
parents 2306cf63 a1dc28f6
Loading
Loading
Loading
Loading
+98 −0
Original line number Diff line number Diff line
@@ -265,6 +265,17 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared
memory region.  This ioctl returns the size of that region.  See the
KVM_RUN documentation for details.

Besides the size of the KVM_RUN communication region, other areas of
the VCPU file descriptor can be mmap-ed, including:

- if KVM_CAP_COALESCED_MMIO is available, a page at
  KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons,
  this page is included in the result of KVM_GET_VCPU_MMAP_SIZE.
  KVM_CAP_COALESCED_MMIO is not documented yet.

- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
  KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE.  For more information on
  KVM_CAP_DIRTY_LOG_RING, see section 8.3.

4.6 KVM_SET_MEMORY_REGION
-------------------------
@@ -6763,6 +6774,93 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.

8.29 KVM_CAP_DIRTY_LOG_RING
---------------------------

:Architectures: x86
:Parameters: args[0] - size of the dirty log ring

KVM is capable of tracking dirty memory using ring buffers that are
mmaped into userspace; there is one dirty ring per vcpu.

The dirty ring is available to userspace as an array of
``struct kvm_dirty_gfn``.  Each dirty entry it's defined as::

  struct kvm_dirty_gfn {
          __u32 flags;
          __u32 slot; /* as_id | slot_id */
          __u64 offset;
  };

The following values are defined for the flags field to define the
current state of the entry::

  #define KVM_DIRTY_GFN_F_DIRTY           BIT(0)
  #define KVM_DIRTY_GFN_F_RESET           BIT(1)
  #define KVM_DIRTY_GFN_F_MASK            0x3

Userspace should call KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM
ioctl to enable this capability for the new guest and set the size of
the rings.  Enabling the capability is only allowed before creating any
vCPU, and the size of the ring must be a power of two.  The larger the
ring buffer, the less likely the ring is full and the VM is forced to
exit to userspace. The optimal size depends on the workload, but it is
recommended that it be at least 64 KiB (4096 entries).

Just like for dirty page bitmaps, the buffer tracks writes to
all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was
set in KVM_SET_USER_MEMORY_REGION.  Once a memory region is registered
with the flag set, userspace can start harvesting dirty pages from the
ring buffer.

An entry in the ring buffer can be unused (flag bits ``00``),
dirty (flag bits ``01``) or harvested (flag bits ``1X``).  The
state machine for the entry is as follows::

          dirtied         harvested        reset
     00 -----------> 01 -------------> 1X -------+
      ^                                          |
      |                                          |
      +------------------------------------------+

To harvest the dirty pages, userspace accesses the mmaped ring buffer
to read the dirty GFNs.  If the flags has the DIRTY bit set (at this stage
the RESET bit must be cleared), then it means this GFN is a dirty GFN.
The userspace should harvest this GFN and mark the flags from state
``01b`` to ``1Xb`` (bit 0 will be ignored by KVM, but bit 1 must be set
to show that this GFN is harvested and waiting for a reset), and move
on to the next GFN.  The userspace should continue to do this until the
flags of a GFN have the DIRTY bit cleared, meaning that it has harvested
all the dirty GFNs that were available.

It's not necessary for userspace to harvest the all dirty GFNs at once.
However it must collect the dirty GFNs in sequence, i.e., the userspace
program cannot skip one dirty GFN to collect the one next to it.

After processing one or more entries in the ring buffer, userspace
calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about
it, so that the kernel will reprotect those collected GFNs.
Therefore, the ioctl must be called *before* reading the content of
the dirty pages.

The dirty ring can get full.  When it happens, the KVM_RUN of the
vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL.

The dirty ring interface has a major difference comparing to the
KVM_GET_DIRTY_LOG interface in that, when reading the dirty ring from
userspace, it's still possible that the kernel has not yet flushed the
processor's dirty page buffers into the kernel buffer (with dirty bitmaps, the
flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
vmexit ensures that all dirty GFNs are flushed to the dirty rings.

NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
machine will switch to ring-buffer dirty page tracking and further
KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.

8.31 KVM_CAP_PTP_KVM
--------------------

+5 −1
Original line number Diff line number Diff line
@@ -1340,6 +1340,7 @@ struct kvm_x86_ops {
	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
					   struct kvm_memory_slot *slot,
					   gfn_t offset, unsigned long mask);
	int (*cpu_dirty_log_size)(void);

	/* pmu operations of sub-arch */
	const struct kvm_pmu_ops *pmu_ops;
@@ -1808,7 +1809,8 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);

int kvm_is_in_guest(void);

int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
				     u32 size);
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);

@@ -1855,5 +1857,7 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
#define GET_SMSTATE(type, buf, offset)		\
	(*(type *)((buf) + (offset) - 0x7e00))

int kvm_cpu_dirty_log_size(void);

int alloc_all_memslots_rmaps(struct kvm *kvm);
#endif /* _ASM_X86_KVM_HOST_H */
+1 −0
Original line number Diff line number Diff line
@@ -12,6 +12,7 @@

#define KVM_PIO_PAGE_OFFSET 1
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
#define KVM_DIRTY_LOG_PAGE_OFFSET 64

#define DE_VECTOR 0
#define DB_VECTOR 1
+2 −1
Original line number Diff line number Diff line
@@ -10,7 +10,8 @@ endif
KVM := ../../../virt/kvm

kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
				$(KVM)/dirty_ring.o
kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o

kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
+9 −1
Original line number Diff line number Diff line
@@ -808,7 +808,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
		return NULL;
	if (no_dirty_log && slot->dirty_bitmap)
	if (no_dirty_log && kvm_slot_dirty_track_enabled(slot))
		return NULL;

	return slot;
@@ -1287,6 +1287,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}

int kvm_cpu_dirty_log_size(void)
{
	if (kvm_x86_ops.cpu_dirty_log_size)
		return kvm_x86_ops.cpu_dirty_log_size();
	
	return 0;
}

bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
				    struct kvm_memory_slot *slot, u64 gfn)
{
Loading