Commit cac85e46 authored by Linus Torvalds

Merge tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Remove redundant resource check in vfio-platform (Angus Chen)

 - Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing
   removal of arbitrary kernel limits in favor of cgroup control (Yishai
   Hadas)

 - mdev tidy-ups, including removing the module-only build restriction
   for sample drivers, Kconfig changes to select mdev support,
   documentation movement to keep sample driver usage instructions with
   sample drivers rather than with API docs, remove references to
   out-of-tree drivers in docs (Christoph Hellwig)

 - Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann)

 - Make mlx5 migration support match device support, improve source and
   target flows to improve pre-copy support and reduce downtime (Yishai
   Hadas)

 - Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu)

 - Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye
   Xingchen)

 - Avoid propagating missing reset error in vfio-platform if reset
   requirement is relaxed by module option (Tomasz Duszynski)

 - Range size fixes in mlx5 variant driver for missed last byte and
   stricter range calculation (Yishai Hadas)

 - Fixes to suspended vaddr support and locked_vm accounting, excluding
   mdev configurations from the former due to potential to indefinitely
   block kernel threads, fix underflow and restore locked_vm on new mm
   (Steve Sistare)

 - Update outdated vfio documentation due to new IOMMUFD interfaces in
   recent kernels (Yi Liu)

 - Resolve deadlock between group_lock and kvm_lock, finally (Matthew
   Rosato)

 - Fix NULL pointer in group initialization error path with IOMMUFD (Yan
   Zhao)

* tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits)
  vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd
  docs: vfio: Update vfio.rst per latest interfaces
  vfio: Update the kdoc for vfio_device_ops
  vfio/mlx5: Fix range size calculation upon tracker creation
  vfio: no need to pass kvm pointer during device open
  vfio: fix deadlock between group lock and kvm lock
  vfio: revert "iommu driver notify callback"
  vfio/type1: revert "implement notify callback"
  vfio/type1: revert "block on invalid vaddr"
  vfio/type1: restore locked_vm
  vfio/type1: track locked_vm per dma
  vfio/type1: prevent underflow of locked_vm via exec()
  vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
  vfio: platform: ignore missing reset if disabled at module init
  vfio/mlx5: Improve the target side flow to reduce downtime
  vfio/mlx5: Improve the source side flow upon pre_copy
  vfio/mlx5: Check whether VF is migratable
  samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS
  vfio/mdev: Use sysfs_emit() to instead of sprintf()
  vfio-mdev: add back CONFIG_VFIO dependency
  ...
parents 84cc6674 d649c34c
+1 −107
@@ -60,7 +60,7 @@ devices as examples, as these devices are the first devices to use this module::
     |   mdev.ko     |
     | +-----------+ |  mdev_register_parent() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
@@ -69,12 +69,6 @@ devices as examples, as these devices are the first devices to use this module::
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_parent() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+

@@ -270,106 +264,6 @@ these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU backend modules, such as PPC64 sPAPR module, they need to provide
these two callback functions.

Using the Sample Code
=====================

mtty.c in samples/vfio-mdev/ directory is a sample driver program to
demonstrate how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/

   Files in this device directory in sysfs are similar to the following::

     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step::

     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >	\
              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm::

     -device vfio-pci,\
      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as  follows::

     # lspci -s 00:05.0 -xxvv
     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
             Subsystem: Device 4348:3253
             Physical Slot: 5
             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
     Stepping- SERR- FastB2B- DisINTx-
             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
     <TAbort- <MAbort- >SERR- <PERR- INTx-
             Interrupt: pin A routed to IRQ 10
             Region 0: I/O ports at c150 [size=8]
             Region 1: I/O ports at c158 [size=8]
             Kernel driver in use: serial
     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

     In the Linux guest VM, dmesg output for the device is as follows:

     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A


5. In the Linux guest VM, check the serial ports::

     # setserial -g /dev/ttyS*
     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   Data is loop backed from hosts mtty driver.

8. Destroy the mediated device that you created::

     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove

References
==========

+60 −22
@@ -249,19 +249,21 @@ VFIO bus driver API

VFIO bus drivers, such as vfio-pci make use of only a few interfaces
into VFIO core.  When devices are bound and unbound to the driver,
the driver should call vfio_register_group_dev() and
vfio_unregister_group_dev() respectively::
Following interfaces are called when devices are bound to and
unbound from the driver::

	void vfio_init_group_dev(struct vfio_device *device,
				struct device *dev,
				const struct vfio_device_ops *ops);
	void vfio_uninit_group_dev(struct vfio_device *device);
	int vfio_register_group_dev(struct vfio_device *device);
	int vfio_register_emulated_iommu_dev(struct vfio_device *device);
	void vfio_unregister_group_dev(struct vfio_device *device);

The driver should embed the vfio_device in its own structure and call
vfio_init_group_dev() to pre-configure it before going to registration
and call vfio_uninit_group_dev() after completing the un-registration.
The driver should embed the vfio_device in its own structure and use
vfio_alloc_device() to allocate the structure, and can register
@init/@release callbacks to manage any private state wrapping the
vfio_device::

	vfio_alloc_device(dev_struct, member, dev, ops);
	void vfio_put_device(struct vfio_device *device);

vfio_register_group_dev() indicates to the core to begin tracking the
iommu_group of the specified dev and register the dev as owned by a VFIO bus
driver. Once vfio_register_group_dev() returns it is possible for userspace to
@@ -270,28 +272,64 @@ ready before calling it. The driver provides an ops structure for callbacks
similar to a file operations structure::

	struct vfio_device_ops {
		int	(*open)(struct vfio_device *vdev);
		char	*name;
		int	(*init)(struct vfio_device *vdev);
		void	(*release)(struct vfio_device *vdev);
		int	(*bind_iommufd)(struct vfio_device *vdev,
					struct iommufd_ctx *ictx, u32 *out_device_id);
		void	(*unbind_iommufd)(struct vfio_device *vdev);
		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
		int	(*open_device)(struct vfio_device *vdev);
		void	(*close_device)(struct vfio_device *vdev);
		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
				size_t count, loff_t *ppos);
		ssize_t	(*write)(struct vfio_device *vdev,
				 const char __user *buf,
				 size_t size, loff_t *ppos);
		ssize_t	(*write)(struct vfio_device *vdev, const char __user *buf,
			 size_t count, loff_t *size);
		long	(*ioctl)(struct vfio_device *vdev, unsigned int cmd,
				 unsigned long arg);
		int	(*mmap)(struct vfio_device *vdev,
				struct vm_area_struct *vma);
		int	(*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma);
		void	(*request)(struct vfio_device *vdev, unsigned int count);
		int	(*match)(struct vfio_device *vdev, char *buf);
		void	(*dma_unmap)(struct vfio_device *vdev, u64 iova, u64 length);
		int	(*device_feature)(struct vfio_device *device, u32 flags,
					  void __user *arg, size_t argsz);
	};

Each function is passed the vdev that was originally registered
in the vfio_register_group_dev() call above.  This allows the bus driver
to obtain its private data using container_of().  The open/release
callbacks are issued when a new file descriptor is created for a
device (via VFIO_GROUP_GET_DEVICE_FD).  The ioctl interface provides
a direct pass through for VFIO_DEVICE_* ioctls.  The read/write/mmap
interfaces implement the device region access defined by the device's
own VFIO_DEVICE_GET_REGION_INFO ioctl.
in the vfio_register_group_dev() or vfio_register_emulated_iommu_dev()
call above. This allows the bus driver to obtain its private data using
container_of().

::

	- The init/release callbacks are issued when vfio_device is initialized
	  and released.

	- The open/close device callbacks are issued when the first
	  instance of a file descriptor for the device is created (eg.
	  via VFIO_GROUP_GET_DEVICE_FD) for a user session.

	- The ioctl callback provides a direct pass through for some VFIO_DEVICE_*
	  ioctls.

	- The [un]bind_iommufd callbacks are issued when the device is bound to
	  and unbound from iommufd.

	- The attach_ioas callback is issued when the device is attached to an
	  IOAS managed by the bound iommufd. The attached IOAS is automatically
	  detached when the device is unbound from iommufd.

	- The read/write/mmap callbacks implement the device region access defined
	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.

	- The request callback is issued when device is going to be unregistered,
	  such as when trying to unbind the device from the vfio bus driver.

	- The dma_unmap callback is issued when a range of iovas are unmapped
	  in the container or IOAS attached by the device. Drivers which make
	  use of the vfio page pinning interface must implement this callback in
	  order to unpin pages within the dma_unmap range. Drivers must tolerate
	  this callback even before calls to open_device().
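
As a rough illustration of the container_of() pattern described above, the
following is a userspace sketch and not kernel code. It assumes a mocked
struct vfio_device with a single field and a locally defined container_of()
macro; struct my_vfio_dev, my_open_device(), my_close_device() and run_demo()
are hypothetical names that do not appear in this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro (assumption). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock of struct vfio_device: only what this sketch needs (assumption). */
struct vfio_device {
	const char *name;
};

/* Hypothetical driver state that embeds the vfio_device. */
struct my_vfio_dev {
	int open_count;
	struct vfio_device vdev;
};

/* Shaped like the open_device callback: it receives only the embedded vdev. */
int my_open_device(struct vfio_device *vdev)
{
	struct my_vfio_dev *mine = container_of(vdev, struct my_vfio_dev, vdev);

	mine->open_count++;
	return 0;
}

/* Shaped like the close_device callback. */
void my_close_device(struct vfio_device *vdev)
{
	struct my_vfio_dev *mine = container_of(vdev, struct my_vfio_dev, vdev);

	assert(mine->open_count > 0);
	mine->open_count--;
}

/* Opens and closes one mock device; returns the final open count. */
int run_demo(void)
{
	struct my_vfio_dev dev = { .open_count = 0, .vdev = { .name = "demo" } };

	if (my_open_device(&dev.vdev) != 0)
		return -1;
	if (dev.open_count != 1)
		return -1;
	my_close_device(&dev.vdev);
	return dev.open_count;
}
```

Because each callback is handed only the embedded vfio_device, the wrapper
structure is recovered by subtracting the member offset, so no separate
private-data pointer is needed in this sketch.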

PPC64 sPAPR implementation note
-------------------------------
+0 −1
@@ -553,7 +553,6 @@ These are the steps:
   * ZCRYPT
   * S390_AP_IOMMU
   * VFIO
   * VFIO_MDEV
   * KVM

   If using make menuconfig select the following to build the vfio_ap module::
+0 −1
@@ -21882,7 +21882,6 @@ F: tools/testing/selftests/filesystems/fat/
VFIO DRIVER
M:	Alex Williamson <alex.williamson@redhat.com>
R:	Cornelia Huck <cohuck@redhat.com>
L:	kvm@vger.kernel.org
S:	Maintained
T:	git https://github.com/awilliam/linux-vfio.git
+6 −2
@@ -714,7 +714,9 @@ config EADM_SCH
config VFIO_CCW
	def_tristate n
	prompt "Support for VFIO-CCW subchannels"
	depends on S390_CCW_IOMMU && VFIO_MDEV
	depends on S390_CCW_IOMMU
	depends on VFIO
	select VFIO_MDEV
	help
	  This driver allows usage of I/O subchannels via VFIO-CCW.

@@ -724,8 +726,10 @@ config VFIO_CCW
config VFIO_AP
	def_tristate n
	prompt "VFIO support for AP devices"
	depends on S390_AP_IOMMU && VFIO_MDEV && KVM
	depends on S390_AP_IOMMU && KVM
	depends on VFIO
	depends on ZCRYPT
	select VFIO_MDEV
	help
	  This driver grants access to Adjunct Processor (AP) devices
	  via the VFIO mediated device interface.