Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170228a' into staging (251501a3)

docs/migration.txt

+71 −0

Original line number	Diff line number	Diff line
		@@ -161,6 +161,11 @@ include/hw/hw.h.

		=== More about versions ===

		Version numbers are intended for major incompatible changes to the
		migration of a device, and using them breaks backwards-migration
		compatibility; in general most changes can be made by adding Subsections
		(see below) or _TEST macros (see below) which won't break compatibility.

		You can see that there are several version fields:

		- version_id: the maximum version_id supported by VMState for that device.
		@@ -175,6 +180,9 @@ version_id. And the function load_state_old() (if present) is able to
		load state from minimum_version_id_old to minimum_version_id. This
		function is deprecated and will be removed when no more users are left.

		Saving state will always create a section with the 'version_id' value
		and thus can't be loaded by any older QEMU.

		=== Massaging functions ===

		Sometimes, it is not enough to be able to save the state directly
		@@ -292,6 +300,56 @@ save/send this state when we are in the middle of a pio operation
		not enabled, the values on that fields are garbage and don't need to
		be sent.

		Using a condition function that checks a 'property' to determine whether
		to send a subsection allows backwards migration compatibility when
		new subsections are added.

		For example:
		a) Add a new property using DEFINE_PROP_BOOL - e.g. support-foo and
		default it to true.
		b) Add an entry to the HW_COMPAT_ for the previous version
		that sets the property to false.
		c) Add a static bool support_foo function that tests the property.
		d) Add a subsection with a .needed set to the support_foo function
		e) (potentially) Add a pre_load that sets up a default value for 'foo'
		to be used if the subsection isn't loaded.

		Now that subsection will not be generated when using an older
		machine type and the migration stream will be accepted by older
		QEMU versions. pre-load functions can be used to initialise state
		on the newer version so that they default to suitable values
		when loading streams created by older QEMU versions that do not
		generate the subsection.

		In some cases subsections are added for data that had been accidentally
		omitted by earlier versions; if the missing data causes the migration
		process to succeed but the guest to behave badly then it may be better
		to send the subsection and cause the migration to explicitly fail
		with the unknown subsection error. If the bad behaviour only happens
		with certain data values, making the subsection conditional on
		the data value (rather than the machine type) allows migrations to succeed
		in most cases. In general the preference is to tie the subsection to
		the machine type, and allow reliable migrations, unless the behaviour
		from omission of the subsection is really bad.
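Steps a) to e) above can be sketched in a few lines of C. This is a minimal, self-contained mock-up, not QEMU code: `FooDevice`, `MockSubsection`, `mock_save_subsections` and the subsection name are hypothetical stand-ins for the real VMState and qdev property machinery. Only the shape of the pattern (a `.needed` callback that tests a property) is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state; not a real QEMU type. */
typedef struct FooDevice {
    bool support_foo; /* a) property, defaults to true; b) a compat entry
                       *    for the previous machine type would set it false */
    uint32_t foo;     /* state carried only by the new subsection */
} FooDevice;

/* Hypothetical stand-in for a subsection description. */
typedef struct MockSubsection {
    const char *name;
    bool (*needed)(void *opaque);
} MockSubsection;

/* c) condition function that tests the property */
static bool support_foo(void *opaque)
{
    FooDevice *d = opaque;
    return d->support_foo;
}

/* d) subsection whose .needed is the condition function */
static const MockSubsection foo_subsection = {
    .name = "foodevice/foo",
    .needed = support_foo,
};

/* e) pre_load-style hook: default used if the subsection never arrives */
static int foo_pre_load(void *opaque)
{
    FooDevice *d = opaque;
    d->foo = 0;
    return 0;
}

/* Mock sender: returns how many subsections would go into the stream. */
static int mock_save_subsections(FooDevice *d)
{
    assert(d != NULL);
    return foo_subsection.needed(d) ? 1 : 0;
}
```

With `support_foo` left at its default the mock sender emits the subsection; with the property forced to false, as an older machine type would do, nothing extra is emitted and the stream keeps the layout older versions expect.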

		= Not sending existing elements =

		Sometimes members of the VMState are no longer needed.
		Removing them will break migration compatibility;
		making them version dependent and bumping the version will break backwards
		migration compatibility.

		The best way is to:
		a) Add a new property/compatibility/function in the same way as for
		subsections above.
		b) replace the VMSTATE macro with the _TEST version of the macro, e.g.:
		VMSTATE_UINT32(foo, barstruct)
		becomes
		VMSTATE_UINT32_TEST(foo, barstruct, pre_version_baz)

		Sometime in the future when we no longer care about the ancient
		versions these can be killed off.
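The effect of switching a field to its _TEST variant can be illustrated with a small mock. The sketch below is self-contained and does not use the real VMState macros: `BarStruct`, `send_foo` and `mock_save_bar` are hypothetical, and the test function's `(void *opaque, int version_id)` shape is assumed to match what the _TEST macros expect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; not a real QEMU type. */
typedef struct BarStruct {
    bool send_foo; /* hypothetical compat property: true on old machine types */
    uint32_t foo;  /* field that is no longer needed */
    uint32_t bar;  /* field that is always sent */
} BarStruct;

/* Test function: decides whether 'foo' still goes into the stream. */
static bool pre_version_baz(void *opaque, int version_id)
{
    BarStruct *s = opaque;
    (void)version_id; /* unused in this sketch */
    return s->send_foo;
}

/* Mock writer: copies the fields into buf and returns the bytes written. */
static size_t mock_save_bar(BarStruct *s, uint8_t *buf)
{
    size_t off = 0;

    assert(s != NULL && buf != NULL);
    if (pre_version_baz(s, 1)) {
        memcpy(buf + off, &s->foo, sizeof(s->foo));
        off += sizeof(s->foo);
    }
    memcpy(buf + off, &s->bar, sizeof(s->bar));
    off += sizeof(s->bar);
    return off;
}
```

An old machine type keeps writing both fields, so its stream layout is unchanged, while a new machine type writes only `bar`.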

		= Return path =

		In most migration scenarios there is only a single data path that runs
		@@ -482,3 +540,16 @@ request for a page that has already been sent is ignored. Duplicate requests
		such as this can happen as a page is sent at about the same time the
		destination accesses it.

		=== Postcopy with hugepages ===

		Postcopy now works with hugetlbfs backed memory:
		a) The linux kernel on the destination must support userfault on hugepages.
		b) The huge-page configuration on the source and destination VMs must be
		identical; i.e. RAMBlocks on both sides must use the same page size.
		c) Note that -mem-path /dev/hugepages will fall back to allocating normal
		RAM if it doesn't have enough hugepages, triggering (b) to fail.
		Using -mem-prealloc enforces the allocation using hugepages.
		d) Care should be taken with the size of hugepage used; postcopy with 2MB
		hugepages works well, however 1GB hugepages are likely to be problematic
		since it takes ~1 second to transfer a 1GB hugepage across a 10Gbps link,
		and until the full page is transferred the destination thread is blocked.
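The estimate in (d) is simple arithmetic: page size in bits divided by link speed. A minimal sketch, assuming an idealised link with no protocol overhead or latency (`page_transfer_seconds` is a hypothetical helper, not part of QEMU):

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to push one page over a link, ignoring overhead. */
static double page_transfer_seconds(uint64_t page_bytes,
                                    double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0.0);
    return (double)page_bytes * 8.0 / link_bits_per_sec;
}
```

For a 1GB (2^30 byte) page at 10Gbps this gives roughly 0.86 seconds, in line with the ~1 second quoted above; a 2MB page takes about 1.7 milliseconds.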

exec.c

+83 −0

		@@ -45,6 +45,12 @@
		#include "exec/address-spaces.h"
		#include "sysemu/xen-mapcache.h"
		#include "trace-root.h"

		#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
		#include <fcntl.h>
		#include <linux/falloc.h>
		#endif

		#endif
		#include "exec/cpu-all.h"
		#include "qemu/rcu_queue.h"
		@@ -1518,6 +1524,19 @@ size_t qemu_ram_pagesize(RAMBlock *rb)
		    return rb->page_size;
		}

		/* Returns the largest size of page in use */
		size_t qemu_ram_pagesize_largest(void)
		{
		    RAMBlock *block;
		    size_t largest = 0;

		    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
		        largest = MAX(largest, qemu_ram_pagesize(block));
		    }

		    return largest;
		}

		static int memory_try_enable_merging(void *addr, size_t len)
		{
		    if (!machine_mem_merge(current_machine)) {
		@@ -3294,4 +3313,68 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
		    rcu_read_unlock();
		    return ret;
		}

		/*
		 * Unmap pages of memory from start to start+length such that
		 * they a) read as 0, b) Trigger whatever fault mechanism
		 * the OS provides for postcopy.
		 * The pages must be unmapped by the end of the function.
		 * Returns: 0 on success, non-0 on failure
		 *
		 */
		int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
		{
		    int ret = -1;

		    uint8_t *host_startaddr = rb->host + start;

		    if ((uintptr_t)host_startaddr & (rb->page_size - 1)) {
		        error_report("ram_block_discard_range: Unaligned start address: %p",
		                     host_startaddr);
		        goto err;
		    }

		    if ((start + length) <= rb->used_length) {
		        uint8_t *host_endaddr = host_startaddr + length;
		        if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
		            error_report("ram_block_discard_range: Unaligned end address: %p",
		                         host_endaddr);
		            goto err;
		        }

		        errno = ENOTSUP; /* If we are missing MADVISE etc */

		        if (rb->page_size == qemu_host_page_size) {
		#if defined(CONFIG_MADVISE)
		            /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
		             * freeing the page.
		             */
		            ret = madvise(host_startaddr, length, MADV_DONTNEED);
		#endif
		        } else {
		            /* Huge page case - unfortunately it can't do DONTNEED, but
		             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
		             * huge page file.
		             */
		#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
		            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		                            start, length);
		#endif
		        }
		        if (ret) {
		            ret = -errno;
		            error_report("ram_block_discard_range: Failed to discard range "
		                         "%s:%" PRIx64 " +%zx (%d)",
		                         rb->idstr, start, length, ret);
		        }
		    } else {
		        error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
		                     "/%zx/" RAM_ADDR_FMT")",
		                     rb->idstr, start, length, rb->used_length);
		    }

		err:
		    return ret;
		}

		#endif
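The checks that ram_block_discard_range performs before discarding can be tried in isolation. The sketch below is a hypothetical, self-contained helper (`discard_range_is_valid` is not a QEMU function) that mirrors the three tests: aligned start, range within the used length, aligned end. It assumes the page size is a power of two, which the mask `page_size - 1` relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the range could be discarded, -1 otherwise. */
static int discard_range_is_valid(uintptr_t host_base, uint64_t start,
                                  size_t length, size_t page_size,
                                  uint64_t used_length)
{
    uintptr_t host_start = host_base + (uintptr_t)start;

    /* The mask trick below only works for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (host_start & (page_size - 1)) {
        return -1; /* unaligned start address */
    }
    if (start + length > used_length) {
        return -1; /* range overruns the block */
    }
    if ((host_start + length) & (page_size - 1)) {
        return -1; /* unaligned end address */
    }
    return 0;
}
```

With a 2MB page size, a request covering exactly one page passes, while a request that starts 4KB into a page, ends part-way through a page, or runs past the used length is rejected.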

hw/core/qdev.c

+7 −0

		@@ -37,6 +37,7 @@
		#include "hw/boards.h"
		#include "hw/sysbus.h"
		#include "qapi-event.h"
		#include "migration/migration.h"

		int qdev_hotplug = 0;
		static bool qdev_hot_added = false;
		@@ -903,6 +904,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
		    Error *local_err = NULL;
		    bool unattached_parent = false;
		    static int unattached_count;
		    int ret;

		    if (dev->hotplugged && !dc->hotpluggable) {
		        error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
		@@ -910,6 +912,11 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
		    }

		    if (value && !dev->realized) {
		        ret = check_migratable(obj, &local_err);
		        if (ret < 0) {
		            goto fail;
		        }

		        if (!obj->parent) {
		            gchar *name = g_strdup_printf("device[%d]", unattached_count++);

hw/usb/bus.c

+0 −19

		@@ -8,7 +8,6 @@
		#include "monitor/monitor.h"
		#include "trace.h"
		#include "qemu/cutils.h"
		#include "migration/migration.h"

		static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);

		@@ -688,8 +687,6 @@ USBDevice *usbdevice_create(const char *cmdline)
		    const char *params;
		    int len;
		    USBDevice *dev;
		    ObjectClass *klass;
		    DeviceClass *dc;

		    params = strchr(cmdline,':');
		    if (params) {
		@@ -724,22 +721,6 @@ USBDevice *usbdevice_create(const char *cmdline)
		        return NULL;
		    }

		    klass = object_class_by_name(f->name);
		    if (klass == NULL) {
		        error_report("Device '%s' not found", f->name);
		        return NULL;
		    }

		    dc = DEVICE_CLASS(klass);

		    if (only_migratable) {
		        if (dc->vmsd->unmigratable) {
		            error_report("Device %s is not migratable, but --only-migratable "
		                         "was specified", f->name);
		            return NULL;
		        }
		    }

		    if (f->usbdevice_init) {
		        dev = f->usbdevice_init(bus, params);
		    } else {

include/exec/cpu-common.h

+2 −0

		@@ -64,6 +64,7 @@ void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
		void qemu_ram_unset_idstr(RAMBlock *block);
		const char *qemu_ram_get_idstr(RAMBlock *rb);
		size_t qemu_ram_pagesize(RAMBlock *block);
		size_t qemu_ram_pagesize_largest(void);

		void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
		                            int len, int is_write);
		@@ -105,6 +106,7 @@ typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
		    ram_addr_t offset, ram_addr_t length, void *opaque);

		int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
		int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);

		#endif