Commit a77067f6 authored by Peter Maydell's avatar Peter Maydell
Browse files

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151110' into staging



migration/next for 20151110

# gpg: Signature made Tue 10 Nov 2015 14:23:26 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151110: (57 commits)
  migration: qemu_savevm_state_cleanup becomes mandatory operation
  Inhibit ballooning during postcopy
  Disable mlock around incoming postcopy
  End of migration for postcopy
  Postcopy: Mark nohugepage before discard
  postcopy: Wire up loadvm_postcopy_handle_ commands
  Start up a postcopy/listener thread ready for incoming page data
  Postcopy; Handle userfault requests
  Round up RAMBlock sizes to host page sizes
  Host page!=target page: Cleanup bitmaps
  Don't iterate on precopy-only devices during postcopy
  Don't sync dirty bitmaps in postcopy
  postcopy: Check order of received target pages
  Postcopy: Use helpers to map pages during migration
  postcopy_ram.c: place_page and helpers
  Page request: Consume pages off the post-copy queue
  Page request: Process incoming page request
  Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  Postcopy: End of iteration
  Postcopy: Postcopy startup in migration thread
  ...

Signed-off-by: default avatarPeter Maydell <peter.maydell@linaro.org>
parents a1a88589 15b3b8ea
Loading
Loading
Loading
Loading
+11 −0
Original line number Diff line number Diff line
@@ -36,6 +36,17 @@
static QEMUBalloonEvent *balloon_event_fn;
static QEMUBalloonStatus *balloon_stat_fn;
static void *balloon_opaque;
static bool balloon_inhibited;

bool qemu_balloon_is_inhibited(void)
{
    return balloon_inhibited;
}

void qemu_balloon_inhibit(bool state)
{
    balloon_inhibited = state;
}

static bool have_balloon(Error **errp)
{
+191 −0
Original line number Diff line number Diff line
@@ -291,3 +291,194 @@ save/send this state when we are in the middle of a pio operation
(that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
not enabled, the values on that fields are garbage and don't need to
be sent.

= Return path =

In most migration scenarios there is only a single data path that runs
from the source VM to the destination, typically along a single fd (although
possibly with another fd or similar for some fast way of throwing pages across).

However, some uses need two way communication; in particular the Postcopy
destination needs to be able to request pages on demand from the source.

For these scenarios there is a 'return path' from the destination to the source;
qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
path.

  Source side
     Forward path - written by migration thread
     Return path  - opened by main thread, read by return-path thread

  Destination side
     Forward path - read by main thread
     Return path  - opened by main thread, written by main thread AND postcopy
                    thread (protected by rp_mutex)

= Postcopy =
'Postcopy' migration is a way to deal with migrations that refuse to converge
(or take too long to converge) its plus side is that there is an upper bound on
the amount of migration traffic and time it takes, the down side is that during
the postcopy phase, a failure of *either* side or the network connection causes
the guest to be lost.

In postcopy the destination CPUs are started before all the memory has been
transferred, and accesses to pages that are yet to be transferred cause
a fault that's translated by QEMU into a request to the source QEMU.

Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
doesn't finish in a given time the switch is made to postcopy.

=== Enabling postcopy ===

To enable postcopy, issue this command on the monitor prior to the
start of migration:

migrate_set_capability x-postcopy-ram on

The normal commands are then used to start a migration, which is still
started in precopy mode.  Issuing:

migrate_start_postcopy

will now cause the transition from precopy to postcopy.
It can be issued immediately after migration is started or any
time later on.  Issuing it after the end of a migration is harmless.

Note: During the postcopy phase, the bandwidth limits set using
migrate_set_speed is ignored (to avoid delaying requested pages that
the destination is waiting for).

=== Postcopy device transfer ===

Loading of device data may cause the device emulation to access guest RAM
that may trigger faults that have to be resolved by the source, as such
the migration stream has to be able to respond with page data *during* the
device load, and hence the device data has to be read from the stream completely
before the device load begins to free the stream up.  This is achieved by
'packaging' the device data into a blob that's read in one go.

Source behaviour

Until postcopy is entered the migration stream is identical to normal
precopy, except for the addition of a 'postcopy advise' command at
the beginning, to tell the destination that postcopy might happen.
When postcopy starts the source sends the page discard data and then
forms the 'package' containing:

   Command: 'postcopy listen'
   The device state
      A series of sections, identical to the precopy streams device state stream
      containing everything except postcopiable devices (i.e. RAM)
   Command: 'postcopy run'

The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
contents are formatted in the same way as the main migration stream.

During postcopy the source scans the list of dirty pages and sends them
to the destination without being requested (in much the same way as precopy),
however when a page request is received from the destination, the dirty page
scanning restarts from the requested location.  This causes requested pages
to be sent quickly, and also causes pages directly after the requested page
to be sent quickly in the hope that those pages are likely to be used
by the destination soon.

Destination behaviour

Initially the destination looks the same as precopy, with a single thread
reading the migration stream; the 'postcopy advise' and 'discard' commands
are processed to change the way RAM is managed, but don't affect the stream
processing.

------------------------------------------------------------------------------
                        1      2   3     4 5                      6   7
main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
thread                             |       |
                                   |     (page request)
                                   |        \___
                                   v            \
listen thread:                     --- page -- page -- page -- page -- page --

                                   a   b        c
------------------------------------------------------------------------------

On receipt of CMD_PACKAGED (1)
   All the data associated with the package - the ( ... ) section in the
diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
recurses into qemu_loadvm_state_main to process the contents of the package (2)
which contains commands (3,6) and devices (4...)

On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
a new thread (a) is started that takes over servicing the migration stream,
while the main thread carries on loading the package.   It loads normal
background page data (b) but if during a device load a fault happens (5) the
returned page (c) is loaded by the listen thread allowing the main threads
device load to carry on.

The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
CPUs start running.
At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
and is no longer used by migration, while the listen thread carries
on servicing page data until the end of migration.

=== Postcopy states ===

Postcopy moves through a series of states (see postcopy_state) from
ADVISE->DISCARD->LISTEN->RUNNING->END

  Advise:  Set at the start of migration if postcopy is enabled, even
           if it hasn't had the start command; here the destination
           checks that its OS has the support needed for postcopy, and performs
           setup to ensure the RAM mappings are suitable for later postcopy.
           The destination will fail early in migration at this point if the
           required OS support is not present.
           (Triggered by reception of POSTCOPY_ADVISE command)

  Discard: Entered on receipt of the first 'discard' command; prior to
           the first Discard being performed, hugepages are switched off
           (using madvise) to ensure that no new huge pages are created
           during the postcopy phase, and to cause any huge pages that
           have discards on them to be broken.

  Listen:  The first command in the package, POSTCOPY_LISTEN, switches
           the destination state to Listen, and starts a new thread
           (the 'listen thread') which takes over the job of receiving
           pages off the migration stream, while the main thread carries
           on processing the blob.  With this thread able to process page
           reception, the destination now 'sensitises' the RAM to detect
           any access to missing pages (on Linux using the 'userfault'
           system).

  Running: POSTCOPY_RUN causes the destination to synchronise all
           state and start the CPUs and IO devices running.  The main
           thread now finishes processing the migration package and
           now carries on as it would for normal precopy migration
           (although it can't do the cleanup it would do as it
           finishes a normal migration).

  End:     The listen thread can now quit, and perform the cleanup of migration
           state, the migration is now complete.

=== Source side page maps ===

The source side keeps two bitmaps during postcopy; 'the migration bitmap'
and 'unsent map'.  The 'migration bitmap' is basically the same as in
the precopy case, and holds a bit to indicate that page is 'dirty' -
i.e. needs sending.  During the precopy phase this is updated as the CPU
dirties pages, however during postcopy the CPUs are stopped and nothing
should dirty anything any more.

The 'unsent map' is used for the transition to postcopy. It is a bitmap that
has a bit cleared whenever a page is sent to the destination, however during
the transition to postcopy mode it is combined with the migration bitmap
to form a set of pages that:
   a) Have been sent but then redirtied (which must be discarded)
   b) Have not yet been sent - which also must be discarded to cause any
      transparent huge pages built during precopy to be broken.

Note that the contents of the unsentmap are sacrificed during the calculation
of the discard set and thus aren't valid once in postcopy.  The dirtymap
is still valid and is used to ensure that no page is sent more than once.  Any
request for a page that has already been sent is ignored.  Duplicate requests
such as this can happen as a page is sent at about the same time the
destination accesses it.
+79 −13
Original line number Diff line number Diff line
@@ -1377,6 +1377,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
    return NULL;
}

const char *qemu_ram_get_idstr(RAMBlock *rb)
{
    return rb->idstr;
}

/* Called with iothread lock held.  */
void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
{
@@ -1447,7 +1452,7 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)

    assert(block);

    newsize = TARGET_PAGE_ALIGN(newsize);
    newsize = HOST_PAGE_ALIGN(newsize);

    if (block->used_length == newsize) {
        return 0;
@@ -1591,7 +1596,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
        return -1;
    }

    size = TARGET_PAGE_ALIGN(size);
    size = HOST_PAGE_ALIGN(size);
    new_block = g_malloc0(sizeof(*new_block));
    new_block->mr = mr;
    new_block->used_length = size;
@@ -1627,8 +1632,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
    ram_addr_t addr;
    Error *local_err = NULL;

    size = TARGET_PAGE_ALIGN(size);
    max_size = TARGET_PAGE_ALIGN(max_size);
    size = HOST_PAGE_ALIGN(size);
    max_size = HOST_PAGE_ALIGN(max_size);
    new_block = g_malloc0(sizeof(*new_block));
    new_block->mr = mr;
    new_block->resized = resized;
@@ -1877,8 +1882,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
    }
}

/* Some of the softmmu routines need to translate from a host pointer
 * (typically a TLB entry) back to a ram offset.
/*
 * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
 * in that RAMBlock.
 *
 * ptr: Host pointer to look up
 * round_offset: If true round the result offset down to a page boundary
 * *ram_addr: set to result ram_addr
 * *offset: set to result offset within the RAMBlock
 *
 * Returns: RAMBlock (or NULL if not found)
 *
 * By the time this function returns, the returned pointer is not protected
 * by RCU anymore.  If the caller is not within an RCU critical section and
@@ -1886,18 +1899,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
 * pointer, such as a reference to the region that includes the incoming
 * ram_addr_t.
 */
MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
                                   ram_addr_t *ram_addr,
                                   ram_addr_t *offset)
{
    RAMBlock *block;
    uint8_t *host = ptr;
    MemoryRegion *mr;

    if (xen_enabled()) {
        rcu_read_lock();
        *ram_addr = xen_ram_addr_from_mapcache(ptr);
        mr = qemu_get_ram_block(*ram_addr)->mr;
        block = qemu_get_ram_block(*ram_addr);
        if (block) {
            *offset = (host - block->host);
        }
        rcu_read_unlock();
        return mr;
        return block;
    }

    rcu_read_lock();
@@ -1920,10 +1937,49 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
    return NULL;

found:
    *ram_addr = block->offset + (host - block->host);
    mr = block->mr;
    *offset = (host - block->host);
    if (round_offset) {
        *offset &= TARGET_PAGE_MASK;
    }
    *ram_addr = block->offset + *offset;
    rcu_read_unlock();
    return mr;
    return block;
}

/*
 * Finds the named RAMBlock
 *
 * name: The name of RAMBlock to find
 *
 * Returns: RAMBlock (or NULL if not found)
 */
RAMBlock *qemu_ram_block_by_name(const char *name)
{
    RAMBlock *block;

    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        if (!strcmp(name, block->idstr)) {
            return block;
        }
    }

    return NULL;
}

/* Some of the softmmu routines need to translate from a host pointer
   (typically a TLB entry) back to a ram offset.  */
MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
{
    RAMBlock *block;
    ram_addr_t offset; /* Not used */

    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);

    if (!block) {
        return NULL;
    }

    return block->mr;
}

static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
@@ -3502,6 +3558,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
    }
    return 0;
}

/*
 * Allows code that needs to deal with migration bitmaps etc to still be built
 * target independent.
 */
size_t qemu_target_page_bits(void)
{
    return TARGET_PAGE_BITS;
}

#endif

/*
+15 −0
Original line number Diff line number Diff line
@@ -1005,6 +1005,21 @@ STEXI
@item migrate_set_parameter @var{parameter} @var{value}
@findex migrate_set_parameter
Set the parameter @var{parameter} for migration.
ETEXI

    {
        .name       = "migrate_start_postcopy",
        .args_type  = "",
        .params     = "",
        .help       = "Switch migration to postcopy mode",
        .mhandler.cmd = hmp_migrate_start_postcopy,
    },

STEXI
@item migrate_start_postcopy
@findex migrate_start_postcopy
Switch in-progress migration to postcopy mode. Ignored after the end of
migration (or once already in postcopy).
ETEXI

    {
+7 −0
Original line number Diff line number Diff line
@@ -1293,6 +1293,13 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
    hmp_handle_error(mon, &err);
}

void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
{
    Error *err = NULL;
    qmp_migrate_start_postcopy(&err);
    hmp_handle_error(mon, &err);
}

void hmp_set_password(Monitor *mon, const QDict *qdict)
{
    const char *protocol  = qdict_get_str(qdict, "protocol");
Loading