Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151110' into staging (a77067f6)

balloon.c

+11 −0

Original line number	Diff line number	Diff line
		@@ -36,6 +36,17 @@
		static QEMUBalloonEvent *balloon_event_fn;
		static QEMUBalloonStatus *balloon_stat_fn;
		static void *balloon_opaque;
		static bool balloon_inhibited;

		bool qemu_balloon_is_inhibited(void)
		{
		return balloon_inhibited;
		}

		void qemu_balloon_inhibit(bool state)
		{
		balloon_inhibited = state;
		}

		static bool have_balloon(Error **errp)
		{

docs/migration.txt

+191 −0

		@@ -291,3 +291,194 @@ save/send this state when we are in the middle of a pio operation
		(that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
		not enabled, the values in those fields are garbage and don't need to
		be sent.

		= Return path =

		In most migration scenarios there is only a single data path that runs
		from the source VM to the destination, typically along a single fd (although
		possibly with another fd or similar for some fast way of throwing pages across).

		However, some uses need two way communication; in particular the Postcopy
		destination needs to be able to request pages on demand from the source.

		For these scenarios there is a 'return path' from the destination to the source;
		qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
		path.

		Source side
		Forward path - written by migration thread
		Return path - opened by main thread, read by return-path thread

		Destination side
		Forward path - read by main thread
		Return path - opened by main thread, written by main thread AND postcopy
		thread (protected by rp_mutex)

		= Postcopy =
		'Postcopy' migration is a way to deal with migrations that refuse to converge
		(or take too long to converge). Its plus side is that there is an upper bound on
		the amount of migration traffic and the time it takes; the down side is that during
		the postcopy phase, a failure of either side or the network connection causes
		the guest to be lost.

		In postcopy the destination CPUs are started before all the memory has been
		transferred, and accesses to pages that are yet to be transferred cause
		a fault that's translated by QEMU into a request to the source QEMU.

		Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
		doesn't finish in a given time the switch is made to postcopy.

		=== Enabling postcopy ===

		To enable postcopy, issue this command on the monitor prior to the
		start of migration:

		migrate_set_capability x-postcopy-ram on

		The normal commands are then used to start a migration, which is still
		started in precopy mode. Issuing:

		migrate_start_postcopy

		will now cause the transition from precopy to postcopy.
		It can be issued immediately after migration is started or any
		time later on. Issuing it after the end of a migration is harmless.

		Note: During the postcopy phase, the bandwidth limits set using
		migrate_set_speed are ignored (to avoid delaying requested pages that
		the destination is waiting for).

		=== Postcopy device transfer ===

		Loading of device data may cause the device emulation to access guest RAM,
		which may trigger faults that have to be resolved by the source. The migration
		stream therefore has to be able to respond with page data during the
		device load, so the device data has to be read from the stream completely
		before the device load begins, freeing the stream up. This is achieved by
		'packaging' the device data into a blob that's read in one go.

		Source behaviour

		Until postcopy is entered the migration stream is identical to normal
		precopy, except for the addition of a 'postcopy advise' command at
		the beginning, to tell the destination that postcopy might happen.
		When postcopy starts the source sends the page discard data and then
		forms the 'package' containing:

		Command: 'postcopy listen'
		The device state
		   A series of sections, identical to the precopy stream's device state stream,
		   containing everything except postcopiable devices (i.e. RAM)
		Command: 'postcopy run'

		The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
		contents are formatted in the same way as the main migration stream.

		During postcopy the source scans the list of dirty pages and sends them
		to the destination without being requested (in much the same way as precopy).
		However, when a page request is received from the destination, the dirty page
		scanning restarts from the requested location. This causes requested pages
		to be sent quickly, and also causes pages directly after the requested page
		to be sent quickly, in the hope that those pages are likely to be used
		by the destination soon.
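
The scan-restart behaviour described above can be illustrated with a small
sketch. This is not QEMU's implementation: the structure, the function names
and the fixed page count are all hypothetical, and the dirty state is kept as
one flag per page rather than a real bitmap.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16  /* hypothetical guest size, in pages */

/* Hypothetical source-side state: one dirty flag per page plus a scan cursor. */
struct scan_state {
    bool dirty[NPAGES];  /* true: page still needs sending */
    size_t cursor;       /* where the background scan resumes */
};

/* A page request from the destination restarts the scan at that page. */
static void handle_page_request(struct scan_state *s, size_t page)
{
    if (page < NPAGES) {
        s->cursor = page;
    }
}

/*
 * Pick the next dirty page at or after the cursor, wrapping around once.
 * Clears the page's dirty flag and returns its index, or -1 if no dirty
 * page is left.
 */
static long next_page_to_send(struct scan_state *s)
{
    for (size_t i = 0; i < NPAGES; i++) {
        size_t page = (s->cursor + i) % NPAGES;
        if (s->dirty[page]) {
            s->dirty[page] = false;
            s->cursor = (page + 1) % NPAGES;
            return (long)page;
        }
    }
    return -1;
}
```

Because the cursor stays just past the page that was sent, the pages that
follow a requested page are the next ones picked up by the background scan.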

		Destination behaviour

		Initially the destination looks the same as precopy, with a single thread
		reading the migration stream; the 'postcopy advise' and 'discard' commands
		are processed to change the way RAM is managed, but don't affect the stream
		processing.

		------------------------------------------------------------------------------
		                        1      2   3     4 5                      6   7
		main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
		thread                             |       |
		                                   |     (page request)
		                                   |        \___
		                                   v            \
		listen thread:                     --- page -- page -- page -- page -- page --

		                                   a   b        c
		------------------------------------------------------------------------------

		On receipt of CMD_PACKAGED (1)
		   All the data associated with the package - the ( ... ) section in the
		   diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
		   recurses into qemu_loadvm_state_main to process the contents of the package (2)
		   which contains commands (3,6) and devices (4...)

		On receipt of 'postcopy listen' (3), i.e. the 1st command in the package,
		a new thread (a) is started that takes over servicing the migration stream,
		while the main thread carries on loading the package. The listen thread loads
		normal background page data (b), but if a fault happens during a device load (5),
		the returned page (c) is loaded by the listen thread, allowing the main thread's
		device load to carry on.

		The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
		CPUs start running.
		At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
		and is no longer used by migration, while the listen thread carries
		on servicing page data until the end of migration.

		=== Postcopy states ===

		Postcopy moves through a series of states (see postcopy_state) from
		ADVISE->DISCARD->LISTEN->RUNNING->END

		Advise: Set at the start of migration if postcopy is enabled, even
		if it hasn't had the start command; here the destination
		checks that its OS has the support needed for postcopy, and performs
		setup to ensure the RAM mappings are suitable for later postcopy.
		The destination will fail early in migration at this point if the
		required OS support is not present.
		(Triggered by reception of POSTCOPY_ADVISE command)

		Discard: Entered on receipt of the first 'discard' command; prior to
		the first Discard being performed, hugepages are switched off
		(using madvise) to ensure that no new huge pages are created
		during the postcopy phase, and to cause any huge pages that
		have discards on them to be broken.

		Listen: The first command in the package, POSTCOPY_LISTEN, switches
		the destination state to Listen, and starts a new thread
		(the 'listen thread') which takes over the job of receiving
		pages off the migration stream, while the main thread carries
		on processing the blob. With this thread able to process page
		reception, the destination now 'sensitises' the RAM to detect
		any access to missing pages (on Linux using the 'userfault'
		system).

		Running: POSTCOPY_RUN causes the destination to synchronise all
		state and start the CPUs and IO devices running. The main
		thread now finishes processing the migration package and
		now carries on as it would for normal precopy migration
		(although it can't do the cleanup it would do as it
		finishes a normal migration).

		End: The listen thread can now quit and perform the cleanup of migration
		state; the migration is now complete.
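
As a rough illustration of the state sequence listed above, the sketch below
encodes the five states and checks that a transition follows the documented
order. The enum and function names are hypothetical and do not mirror QEMU's
own postcopy_state definitions; the check is deliberately strict and only
allows a move to the next state in the ADVISE->DISCARD->LISTEN->RUNNING->END
chain.

```c
#include <stdbool.h>

/* Hypothetical mirror of the states named in the text, in documented order. */
enum pc_state {
    PC_ADVISE,
    PC_DISCARD,
    PC_LISTEN,
    PC_RUNNING,
    PC_END
};

/* True only when 'to' is the state that directly follows 'from'. */
static bool pc_transition_ok(enum pc_state from, enum pc_state to)
{
    return from != PC_END && (int)to == (int)from + 1;
}

/* Human-readable name, e.g. for trace output. */
static const char *pc_state_name(enum pc_state s)
{
    switch (s) {
    case PC_ADVISE:  return "advise";
    case PC_DISCARD: return "discard";
    case PC_LISTEN:  return "listen";
    case PC_RUNNING: return "running";
    case PC_END:     return "end";
    }
    return "unknown";
}
```

A command handler could call pc_transition_ok() before updating its state and
fail the migration on an out-of-order command.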

		=== Source side page maps ===

		The source side keeps two bitmaps during postcopy: the 'migration bitmap'
		and the 'unsent map'. The 'migration bitmap' is basically the same as in
		the precopy case, and holds one bit per page to indicate that the page is
		'dirty', i.e. needs sending. During the precopy phase this is updated as the CPU
		dirties pages; during postcopy, however, the source CPUs are stopped and nothing
		should dirty anything any more.

		The 'unsent map' is used for the transition to postcopy. It is a bitmap that
		has a bit cleared whenever a page is sent to the destination. During
		the transition to postcopy mode it is combined with the migration bitmap
		to form a set of pages that:
		   a) Have been sent but then redirtied (which must be discarded)
		   b) Have not yet been sent - which also must be discarded to cause any
		      transparent huge pages built during precopy to be broken.

		Note that the contents of the unsentmap are sacrificed during the calculation
		of the discard set and thus aren't valid once in postcopy. The dirtymap
		is still valid and is used to ensure that no page is sent more than once. Any
		request for a page that has already been sent is ignored. Duplicate requests
		such as this can happen as a page is sent at about the same time the
		destination accesses it.
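
The bitmap handling described above can be sketched as follows. This is an
illustration under stated assumptions, not QEMU's code: the function names are
hypothetical, bitmaps are plain arrays of 64-bit words, and a set bit in the
unsent map means "not yet sent". Under those assumptions, cases (a) and (b)
together are the bitwise OR of the two maps, and overwriting the unsent map
with the result matches the note that its contents are sacrificed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the discard set in place.
 *   unsent: bit set = page not yet sent (cleared when a page is sent)
 *   dirty:  bit set = page needs sending
 * Afterwards 'unsent' holds the discard set and no longer describes
 * which pages are unsent.
 */
static void build_discard_set(uint64_t *unsent, const uint64_t *dirty,
                              size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        unsent[i] |= dirty[i];
    }
}

/*
 * Returns true if the page still needed sending and clears its dirty bit,
 * so a second request for the same page returns false and can be ignored.
 */
static bool test_and_clear_dirty(uint64_t *dirty, size_t page)
{
    uint64_t mask = UINT64_C(1) << (page % 64);
    bool was_dirty = (dirty[page / 64] & mask) != 0;

    dirty[page / 64] &= ~mask;
    return was_dirty;
}
```

The second helper shows how the dirty map alone is enough to drop a duplicate
request: once a page's bit has been cleared, a later request for it finds
nothing to send.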

exec.c

+79 −13

		@@ -1377,6 +1377,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
		return NULL;
		}

		const char *qemu_ram_get_idstr(RAMBlock *rb)
		{
		return rb->idstr;
		}

		/* Called with iothread lock held. */
		void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
		{
		@@ -1447,7 +1452,7 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)

		assert(block);

		newsize = TARGET_PAGE_ALIGN(newsize);
		newsize = HOST_PAGE_ALIGN(newsize);

		if (block->used_length == newsize) {
		return 0;
		@@ -1591,7 +1596,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
		return -1;
		}

		size = TARGET_PAGE_ALIGN(size);
		size = HOST_PAGE_ALIGN(size);
		new_block = g_malloc0(sizeof(*new_block));
		new_block->mr = mr;
		new_block->used_length = size;
		@@ -1627,8 +1632,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
		ram_addr_t addr;
		Error *local_err = NULL;

		size = TARGET_PAGE_ALIGN(size);
		max_size = TARGET_PAGE_ALIGN(max_size);
		size = HOST_PAGE_ALIGN(size);
		max_size = HOST_PAGE_ALIGN(max_size);
		new_block = g_malloc0(sizeof(*new_block));
		new_block->mr = mr;
		new_block->resized = resized;
		@@ -1877,8 +1882,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
		}
		}

		/* Some of the softmmu routines need to translate from a host pointer
		* (typically a TLB entry) back to a ram offset.
		/*
		* Translates a host ptr back to a RAMBlock, a ram_addr and an offset
		* in that RAMBlock.
		*
		* ptr: Host pointer to look up
		* round_offset: If true round the result offset down to a page boundary
		* *ram_addr: set to result ram_addr
		* *offset: set to result offset within the RAMBlock
		*
		* Returns: RAMBlock (or NULL if not found)
		*
		* By the time this function returns, the returned pointer is not protected
		* by RCU anymore. If the caller is not within an RCU critical section and
		@@ -1886,18 +1899,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
		* pointer, such as a reference to the region that includes the incoming
		* ram_addr_t.
		*/
		MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
		RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
		ram_addr_t *ram_addr,
		ram_addr_t *offset)
		{
		RAMBlock *block;
		uint8_t *host = ptr;
		MemoryRegion *mr;

		if (xen_enabled()) {
		rcu_read_lock();
		*ram_addr = xen_ram_addr_from_mapcache(ptr);
		mr = qemu_get_ram_block(*ram_addr)->mr;
		block = qemu_get_ram_block(*ram_addr);
		if (block) {
		*offset = (host - block->host);
		}
		rcu_read_unlock();
		return mr;
		return block;
		}

		rcu_read_lock();
		@@ -1920,10 +1937,49 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
		return NULL;

		found:
		*ram_addr = block->offset + (host - block->host);
		mr = block->mr;
		*offset = (host - block->host);
		if (round_offset) {
		*offset &= TARGET_PAGE_MASK;
		}
		*ram_addr = block->offset + *offset;
		rcu_read_unlock();
		return mr;
		return block;
		}

		/*
		* Finds the named RAMBlock
		*
		* name: The name of RAMBlock to find
		*
		* Returns: RAMBlock (or NULL if not found)
		*/
		RAMBlock *qemu_ram_block_by_name(const char *name)
		{
		RAMBlock *block;

		QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
		if (!strcmp(name, block->idstr)) {
		return block;
		}
		}

		return NULL;
		}

		/* Some of the softmmu routines need to translate from a host pointer
		(typically a TLB entry) back to a ram offset. */
		MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
		{
		RAMBlock *block;
		ram_addr_t offset; /* Not used */

		block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);

		if (!block) {
		return NULL;
		}

		return block->mr;
		}

		static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
		@@ -3502,6 +3558,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
		}
		return 0;
		}

		/*
		* Allows code that needs to deal with migration bitmaps etc to still be built
		* target independent.
		*/
		size_t qemu_target_page_bits(void)
		{
		return TARGET_PAGE_BITS;
		}

		#endif

		/*
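
For readers unfamiliar with the lookup that qemu_ram_block_from_host performs
in the hunks above, here is a standalone sketch of the same idea: find the
block whose host mapping contains a pointer, compute the offset within it,
optionally round that offset down to a page boundary, and derive the flat RAM
address. Everything in it is hypothetical (the struct, the names, the 4096-byte
page size and the array-based block list); it omits RCU and Xen handling
entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical stand-in for a RAM block. */
struct sketch_block {
    uint8_t *host;    /* start of the block's host mapping */
    size_t   length;  /* size of the mapping in bytes */
    size_t   base;    /* block's start in the flat RAM address space */
};

/*
 * Returns the block containing 'ptr' (or NULL), setting *offset to the
 * offset within the block and *ram_addr to base + offset. When
 * round_offset is true the offset is first rounded down to a page boundary.
 */
static const struct sketch_block *
sketch_block_from_host(const struct sketch_block *blocks, size_t nblocks,
                       const void *ptr, bool round_offset,
                       size_t *ram_addr, size_t *offset)
{
    uintptr_t p = (uintptr_t)ptr;

    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t start = (uintptr_t)blocks[i].host;

        if (p >= start && p - start < blocks[i].length) {
            *offset = (size_t)(p - start);
            if (round_offset) {
                *offset &= ~(size_t)(SKETCH_PAGE_SIZE - 1);
            }
            *ram_addr = blocks[i].base + *offset;
            return &blocks[i];
        }
    }
    return NULL;
}
```

Returning the block together with a block-relative offset, rather than only a
flat address, lets a caller identify a page by block and offset.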

hmp-commands.hx

+15 −0

		@@ -1005,6 +1005,21 @@ STEXI
		@item migrate_set_parameter @var{parameter} @var{value}
		@findex migrate_set_parameter
		Set the parameter @var{parameter} for migration.
		ETEXI

		{
		.name = "migrate_start_postcopy",
		.args_type = "",
		.params = "",
		.help = "Switch migration to postcopy mode",
		.mhandler.cmd = hmp_migrate_start_postcopy,
		},

		STEXI
		@item migrate_start_postcopy
		@findex migrate_start_postcopy
		Switch in-progress migration to postcopy mode. Ignored after the end of
		migration (or once already in postcopy).
		ETEXI

		{

hmp.c

+7 −0

		@@ -1293,6 +1293,13 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
		hmp_handle_error(mon, &err);
		}

		void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
		{
		Error *err = NULL;
		qmp_migrate_start_postcopy(&err);
		hmp_handle_error(mon, &err);
		}

		void hmp_set_password(Monitor *mon, const QDict *qdict)
		{
		const char *protocol = qdict_get_str(qdict, "protocol");