Commit ec0e2dc8 authored by Linus Torvalds

Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - VFIO direct character device (cdev) interface support. This extracts
   the vfio device fd from the container and group model, and is
   intended to be the native uAPI for use with IOMMUFD (Yi Liu)

 - Enhancements to the PCI hot reset interface in support of cdev usage
   (Yi Liu)

 - Fix a potential race between registering and unregistering vfio files
   in the kvm-vfio interface and extend use of a lock to avoid extra
   drop and acquires (Dmitry Torokhov)

 - A new vfio-pci variant driver for the AMD/Pensando Distributed
   Services Card (PDS) Ethernet device, supporting live migration (Brett
   Creeley)

 - Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
   and simplify driver init/exit in fsl code (Li Zetao)

 - Fix uninitialized hole in data structure and pad capability
   structures for alignment (Stefan Hajnoczi)

* tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits)
  vfio/pds: Send type for SUSPEND_STATUS command
  vfio/pds: fix return value in pds_vfio_get_lm_file()
  pds_core: Fix function header descriptions
  vfio: align capability structures
  vfio/type1: fix cap_migration information leak
  vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code
  vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver
  vfio/pds: Add Kconfig and documentation
  vfio/pds: Add support for firmware recovery
  vfio/pds: Add support for dirty page tracking
  vfio/pds: Add VFIO live migration support
  vfio/pds: register with the pds_core PF
  pds_core: Require callers of register/unregister to pass PF drvdata
  vfio/pds: Initial support for pds VFIO driver
  vfio: Commonize combine_ranges for use in other VFIO drivers
  kvm/vfio: avoid bouncing the mutex when adding and deleting groups
  kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()
  docs: vfio: Add vfio device cdev description
  vfio: Compile vfio_group infrastructure optionally
  vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev()
  ...
parents b6f6167e 642265e2
+144 −3
@@ -239,6 +239,137 @@ group and can access them as follows::
	/* Gratuitous device reset and go... */
	ioctl(device, VFIO_DEVICE_RESET);

IOMMUFD and vfio_iommu_type1
----------------------------

IOMMUFD is the new user API to manage I/O page tables from userspace.
It is intended to be the portal for delivering advanced userspace DMA
features (nested translation [5]_, PASID [6]_, etc.) while also providing
a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
cases.  Eventually the vfio_iommu_type1 driver and the legacy vfio
container and group model are intended to be deprecated.

The IOMMUFD backwards compatibility interface can be enabled in two ways.
In the first method, the kernel can be configured with
CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
transparently provides the entire infrastructure for the VFIO
container and IOMMU backend interfaces.  In the second method, the
compatibility mode can be accessed if the VFIO container interface,
i.e. /dev/vfio/vfio, is simply symlinked to /dev/iommu.  Note that at the
time of writing, the compatibility mode is not entirely feature complete
relative to VFIO_TYPE1v2_IOMMU (e.g. DMA mapping MMIO) and does not attempt
to provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
it is not generally advisable at this time to switch from native VFIO
implementations to the IOMMUFD compatibility interfaces.

Long term, VFIO users should migrate to device access through the cdev
interface described below, and native access through the IOMMUFD
provided interfaces.

VFIO Device cdev
----------------

Traditionally, the user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
in a VFIO group.

With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
The cdev interface does not support noiommu devices, so the user should
use the legacy group interface if noiommu is wanted.

The cdev only works with IOMMUFD.  Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
actually use the device.  Once BIND succeeds then a VFIO device can
be fully accessed by the user.

VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
Hence those modules can be fully compiled out in an environment
where no legacy VFIO application exists.

SPAPR does not yet support IOMMUFD, so it cannot support device cdev
either.

VFIO device cdev access is still bound by IOMMU group semantics, i.e. there
can be only one DMA owner for the group.  Devices belonging to the same
group cannot be bound to multiple iommufd_ctx or shared between a native
kernel driver and a vfio bus driver or another driver supporting the
driver_managed_dma flag.  A violation of this ownership requirement will
fail at the VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access.

Device cdev Example
-------------------

Assume the user wants to access PCI device 0000:6a:01.0::

	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
	vfio0

This device is therefore represented as vfio0.  The user can verify
its existence::

	$ ls -l /dev/vfio/devices/vfio0
	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
	511:0
	$ ls -l /dev/char/511\:0
	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
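
The same consistency check can be done from a program.  The sketch below
is illustrative only and uses hypothetical helper names: it parses the
``major:minor`` string read from the sysfs ``dev`` attribute and compares
it with the device numbers of a node under /dev.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: parse a sysfs "dev" attribute such as "511:0\n". */
static int parse_dev_attr(const char *s, unsigned int *maj, unsigned int *min)
{
	char trailing;

	/* Exactly two numbers; anything but trailing whitespace is an error. */
	return sscanf(s, "%u:%u %c", maj, min, &trailing) == 2 ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if node_path is a character device
 * with the given major:minor, 0 if it is not, -1 if stat() fails.
 */
static int cdev_matches(const char *node_path, unsigned int maj,
			unsigned int min)
{
	struct stat st;

	if (stat(node_path, &st) < 0)
		return -1;
	if (!S_ISCHR(st.st_mode))
		return 0;
	return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

With the values shown above, parsing ``511:0`` and then calling
``cdev_matches("/dev/vfio/devices/vfio0", 511, 0)`` would be expected to
return 1 on the example system.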

Then provide the user with access to the device if unprivileged
operation is desired::

	$ chown user:user /dev/vfio/devices/vfio0

Finally, the user can obtain the cdev fd with::

	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);

An opened cdev_fd does not grant the user any access to the device
other than binding the cdev_fd to an iommufd.  Once bound, the device
is fully accessible, including attaching it to an IOMMUFD IOAS/HWPT to
enable userspace DMA::

	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = 0,
	};
	struct iommu_ioas_alloc alloc_data  = {
		.size = sizeof(alloc_data),
		.flags = 0,
	};
	struct vfio_device_attach_iommufd_pt attach_data = {
		.argsz = sizeof(attach_data),
		.flags = 0,
	};
	struct iommu_ioas_map map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE |
			 IOMMU_IOAS_MAP_FIXED_IOVA,
		.__reserved = 0,
	};

	iommufd = open("/dev/iommu", O_RDWR);

	bind.iommufd = iommufd;
	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);

	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
	attach_data.pt_id = alloc_data.out_ioas_id;
	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);

	/* Allocate some space and setup a DMA mapping */
	map.user_va = (uintptr_t)mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
				      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	map.iova = 0; /* 1MB starting at 0x0 from device view */
	map.length = 1024 * 1024;
	map.ioas_id = alloc_data.out_ioas_id;

	ioctl(iommufd, IOMMU_IOAS_MAP, &map);

	/* Other device operations as stated in "VFIO Usage Example" */

VFIO User API
-------------------------------------------------------------------------------

@@ -279,6 +410,7 @@ similar to a file operations structure::
					struct iommufd_ctx *ictx, u32 *out_device_id);
		void	(*unbind_iommufd)(struct vfio_device *vdev);
		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
		void	(*detach_ioas)(struct vfio_device *vdev);
		int	(*open_device)(struct vfio_device *vdev);
		void	(*close_device)(struct vfio_device *vdev);
		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -315,9 +447,10 @@ container_of().
	- The [un]bind_iommufd callbacks are issued when the device is bound to
	  and unbound from iommufd.

	- The attach_ioas callback is issued when the device is attached to an
	  IOAS managed by the bound iommufd. The attached IOAS is automatically
	  detached when the device is unbound from iommufd.
	- The [de]attach_ioas callback is issued when the device is attached to
	  and detached from an IOAS managed by the bound iommufd. However, the
	  attached IOAS can also be automatically detached when the device is
	  unbound from iommufd.

	- The read/write/mmap callbacks implement the device region access defined
	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
@@ -564,3 +697,11 @@ This implementation has some specifics:
				\-0d.1

	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)

.. [5] Nested translation is an IOMMU feature which supports two stage
   address translations.  This improves the address translation efficiency
   in IOMMU virtualization.

.. [6] PASID stands for Process Address Space ID, introduced by PCI
   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
   and Scalable I/O Virtualization (Scalable IOV).
+79 −0
.. SPDX-License-Identifier: GPL-2.0+
.. note: can be edited and viewed with /usr/bin/formiko-vim

==========================================================
PCI VFIO driver for the AMD/Pensando(R) DSC adapter family
==========================================================

AMD/Pensando Linux VFIO PCI Device Driver
Copyright(c) 2023 Advanced Micro Devices, Inc.

Overview
========

The ``pds-vfio-pci`` module is a PCI driver that supports Live Migration
capable Virtual Function (VF) devices in the DSC hardware.

Using the device
================

The pds-vfio-pci device is enabled via multiple configuration steps and
depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
Function devices.

Shown below are the steps to bind the driver to a VF and also to the
associated auxiliary device created by the ``pds_core`` driver. This
example assumes the pds_core and pds-vfio-pci modules are already
loaded.

.. code-block:: bash
  :name: example-setup-script

  #!/bin/bash

  PF_BUS="0000:60"
  PF_BDF="0000:60:00.0"
  VF_BDF="0000:60:00.1"

  # Prevent non-vfio VF driver from probing the VF device
  echo 0 > /sys/class/pci_bus/$PF_BUS/device/$PF_BDF/sriov_drivers_autoprobe

  # Create single VF for Live Migration via pds_core
  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs

  # Allow the VF to be bound to the pds-vfio-pci driver
  echo "pds-vfio-pci" > /sys/class/pci_bus/$PF_BUS/device/$VF_BDF/driver_override

  # Bind the VF to the pds-vfio-pci driver
  echo "$VF_BDF" > /sys/bus/pci/drivers/pds-vfio-pci/bind

After performing the steps above, a file in /dev/vfio/<iommu_group>
should have been created.


Enabling the driver
===================

The driver is enabled via the standard kernel configuration system,
using the make command::

  make oldconfig/menuconfig/etc.

The driver is located in the menu structure at:

  -> Device Drivers
    -> VFIO Non-Privileged userspace driver framework
      -> VFIO support for PDS PCI devices

Support
=======

For general Linux networking support, please use the netdev mailing
list, which is monitored by Pensando personnel::

  netdev@vger.kernel.org

For more specific support needs, please use the Pensando driver support
email::

  drivers@pensando.io
+1 −0
@@ -16,6 +16,7 @@ Contents:
   altera/altera_tse
   amd/pds_core
   amd/pds_vdpa
   amd/pds_vfio_pci
   aquantia/atlantic
   chelsio/cxgb
   cirrus/cs89x0
+31 −16
@@ -9,22 +9,34 @@ Device types supported:
  - KVM_DEV_TYPE_VFIO

Only one VFIO instance may be created per VM.  The created device
tracks VFIO groups in use by the VM and features of those groups
important to the correctness and acceleration of the VM.  As groups
are enabled and disabled for use by the VM, KVM should be updated
about their presence.  When registered with KVM, a reference to the
VFIO-group is held by KVM.
tracks VFIO files (group or device) in use by the VM and features
of those groups/devices important to the correctness and acceleration
of the VM.  As groups/devices are enabled and disabled for use by the
VM, KVM should be updated about their presence.  When registered with
KVM, a reference to the VFIO file is held by KVM.

Groups:
  KVM_DEV_VFIO_GROUP

KVM_DEV_VFIO_GROUP attributes:
  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
	kvm_device_attr.addr points to an int32_t file descriptor
	for the VFIO group.
  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
	kvm_device_attr.addr points to an int32_t file descriptor
	for the VFIO group.
  KVM_DEV_VFIO_FILE
	alias: KVM_DEV_VFIO_GROUP

KVM_DEV_VFIO_FILE attributes:
  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
	tracking

	kvm_device_attr.addr points to an int32_t file descriptor for the
	VFIO file.

  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
	device tracking

	kvm_device_attr.addr points to an int32_t file descriptor for the
	VFIO file.

KVM_DEV_VFIO_GROUP (legacy kvm device group restricted to the handling of VFIO group fd):
  KVM_DEV_VFIO_GROUP_ADD: same as KVM_DEV_VFIO_FILE_ADD for group fd only

  KVM_DEV_VFIO_GROUP_DEL: same as KVM_DEV_VFIO_FILE_DEL for group fd only

  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
	allocated by sPAPR KVM.
	kvm_device_attr.addr points to a struct::
@@ -40,7 +52,10 @@ KVM_DEV_VFIO_GROUP attributes:
	- @tablefd is a file descriptor for a TCE table allocated via
	  KVM_CREATE_SPAPR_TCE.

The GROUP_ADD operation above should be invoked prior to accessing the
The FILE/GROUP_ADD operation above should be invoked prior to accessing the
device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
drivers which require a kvm pointer to be set in their .open_device()
callback.
callback.  The same applies to a device file descriptor obtained by opening
the character device, which gains device access via VFIO_DEVICE_BIND_IOMMUFD.
For such file descriptors, FILE_ADD should be invoked before
VFIO_DEVICE_BIND_IOMMUFD to support those same drivers.
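
The ordering requirement above can be sketched as a small helper that is
called before VFIO_DEVICE_BIND_IOMMUFD (or before VFIO_GROUP_GET_DEVICE_FD
for a group fd).  This is a minimal illustration under stated assumptions:
the function name ``kvm_vfio_file_add`` is hypothetical, ``kvm_vfio_fd`` is
assumed to be the fd of an already created KVM_DEV_TYPE_VFIO device, and
the fallback defines assume the FILE names are plain aliases of the GROUP
names on older uapi headers.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: older headers only provide the GROUP names (aliases). */
#ifndef KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_FILE	KVM_DEV_VFIO_GROUP
#endif
#ifndef KVM_DEV_VFIO_FILE_ADD
#define KVM_DEV_VFIO_FILE_ADD	KVM_DEV_VFIO_GROUP_ADD
#endif

/*
 * Hypothetical helper: register a VFIO file (group fd or device cdev fd)
 * with the KVM VFIO device.  Returns 0 on success, -1 with errno set on
 * failure, as reported by ioctl().
 */
static int kvm_vfio_file_add(int kvm_vfio_fd, int vfio_fd)
{
	int32_t fd = vfio_fd;	/* addr must point to an int32_t fd */
	struct kvm_device_attr attr = {
		.flags = 0,
		.group = KVM_DEV_VFIO_FILE,
		.attr = KVM_DEV_VFIO_FILE_ADD,
		.addr = (uint64_t)(uintptr_t)&fd,
	};

	return ioctl(kvm_vfio_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

A caller using the cdev path would invoke this helper with the cdev fd
first and only then issue VFIO_DEVICE_BIND_IOMMUFD on that same fd.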
+7 −0
@@ -22482,6 +22482,13 @@ S: Maintained
P:	Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
F:	drivers/vfio/pci/*/
VFIO PDS PCI DRIVER
M:	Brett Creeley <brett.creeley@amd.com>
L:	kvm@vger.kernel.org
S:	Maintained
F:	Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst
F:	drivers/vfio/pci/pds/
VFIO PLATFORM DRIVER
M:	Eric Auger <eric.auger@redhat.com>
L:	kvm@vger.kernel.org