xenbus: Avoid deadlock during suspend due to open transactions
During a suspend/resume, the xenwatch thread waits for all outstanding xenstore requests and transactions to complete. This does not work correctly for transactions started by userspace because it waits for them to complete after freezing userspace threads which means the transactions have no way of completing, resulting in a deadlock. This is trivial to reproduce by running this script and then suspending the VM: import pyxs, time c = pyxs.client.Client(xen_bus_path="/dev/xen/xenbus") c.connect() c.transaction() time.sleep(3600) Even if this deadlock were resolved, misbehaving userspace should not prevent a VM from being migrated. So, instead of waiting for these transactions to complete before suspending, store the current generation id for each transaction when it is started. The global generation id is incremented during resume. If the caller commits the transaction and the generation id does not match the current generation id, return EAGAIN so that they try again. If the transaction was instead discarded, return OK since no changes were made anyway. This only affects users of the xenbus file interface. In-kernel users of xenbus are assumed to be well-behaved and complete all transactions before freezing. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Please register or sign in to comment