Commit 4ab3abdd authored by openeuler-ci-bot, committed by Gitee

!223 SPR: IDXD driver (on top of OLK-5.10) - DSA/IAA incremental backporting patches until upstream 6.1

Merge Pull Request from: @xiaochenshen 
 
 **IDXD kernel driver:** 
The IDXD driver is the common driver framework for the Intel Data Streaming Accelerator (DSA) and the Intel In-memory Analytics Accelerator (IAA). This patch set covers the incremental kernel backport patches up to upstream 6.1. It fixes the following issues:
1. https://gitee.com/openeuler/intel-kernel/issues/I596WO 
2. https://gitee.com/openeuler/intel-kernel/issues/I590PB

 **DSA – Intel Data Streaming Accelerator:** 
Intel DSA is a high-performance data copy and transformation accelerator that is integrated in Intel Sapphire Rapids (SPR) processors, targeted for optimizing streaming data movement and transformation operations common with applications for high-performance storage, networking, persistent memory, and various data processing applications. See more details in DSA spec:
https://software.intel.com/content/www/us/en/develop/articles/intel-data-streaming-accelerator-architecture-specification.html

 **IAA - Intel In-memory Analytics Accelerator:** 
Intel In-memory Analytics Accelerator is the integrated accelerator that accelerates analytics primitives (scan, filter, etc.), CRC calculations, compression, decompression, and more on Intel Sapphire Rapids (SPR) processors. See more details in IAA spec:
https://cdrdv2.intel.com/v1/dl/getContent/721858

 **There are 173 patches in total in this patch set. It covers:** 
1. IDXD driver incremental patches between 5.10 LTS and upstream 6.1 (Shared WQ, SVM, IAA, driver refactoring and bug fixes).
2. ENQCMD and PASID re-enabling patches (as dependencies of the IDXD driver).
3. Other dependencies in the IOMMU driver.
4. kABI fixes for openEuler.
5. Enable necessary kernel configs in openeuler_defconfig.

 **Passed tests:** 
1. Unit tests: passed
- accel-config test
- accel-config/test dsa_user_test_runner.sh
- accel-config/test iaa_user_test_runner.sh
- Kernel dmatest test (SVA disabled: "modprobe idxd sva=0")
- Intel internal DSA config test suite (dsa_config_bat_tests, dsa_config_func_tests)
- Intel internal IAX config test suite (iax_config_bat_tests, iax_config_func_tests)
2. Build: passed.
3. Boot test: passed.

 **Kernel config changes against default:**
```
@@ -6381,7 +6381,11 @@ CONFIG_DMA_VIRTUAL_CHANNELS=y
 CONFIG_DMA_ACPI=y
 # CONFIG_ALTERA_MSGDMA is not set
 CONFIG_INTEL_IDMA64=m
+CONFIG_INTEL_IDXD_BUS=m
 CONFIG_INTEL_IDXD=m
+# CONFIG_INTEL_IDXD_COMPAT is not set
+CONFIG_INTEL_IDXD_SVM=y
+CONFIG_INTEL_IDXD_PERFMON=y
 CONFIG_INTEL_IOATDMA=m
 # CONFIG_PLX_DMA is not set
 # CONFIG_QCOM_HIDMA_MGMT is not set
@@ -6632,11 +6636,12 @@ CONFIG_IOMMU_SUPPORT=y
 # CONFIG_IOMMU_DEBUGFS is not set
 CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y
 CONFIG_IOMMU_DMA=y
+CONFIG_IOMMU_SVA=y
 CONFIG_AMD_IOMMU=y
 CONFIG_AMD_IOMMU_V2=m
 CONFIG_DMAR_TABLE=y
 CONFIG_INTEL_IOMMU=y
-# CONFIG_INTEL_IOMMU_SVM is not set
+CONFIG_INTEL_IOMMU_SVM=y
 # CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
 CONFIG_INTEL_IOMMU_FLOPPY_WA=y
 # CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
```
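
The options enabled in the diff above can be checked against a kernel config. The following is a minimal sketch, assuming the config is available as plain text (for example from a file such as /boot/config-<version>, which is an assumption about the distribution). `parse_config` and `missing_options` are hypothetical helper names, not part of any tool in this patch set.

```python
# Options this patch set turns on, copied from the config diff above.
REQUIRED = {
    "CONFIG_INTEL_IDXD_BUS": "m",
    "CONFIG_INTEL_IDXD": "m",
    "CONFIG_INTEL_IDXD_SVM": "y",
    "CONFIG_INTEL_IDXD_PERFMON": "y",
    "CONFIG_IOMMU_SVA": "y",
    "CONFIG_INTEL_IOMMU_SVM": "y",
}


def parse_config(text):
    """Return {option: value} for each 'CONFIG_X=value' line.

    Lines of the form '# CONFIG_X is not set' are skipped, so such
    options are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the sorted names of required options that are unset or differ."""
    options = parse_config(text)
    return sorted(name for name, want in required.items()
                  if options.get(name) != want)
```

Calling `missing_options(open(path).read())` on a config built from this branch should return an empty list; any name it returns is an option that still differs from the diff above.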

 **Kernel command line to enable Intel IOMMU scalable mode (in grub.cfg):**
```
intel_iommu=on,sm_on
``` 
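
Whether the running kernel was booted with this setting can be checked from its command line. The sketch below only inspects the command-line string (on Linux it can be read from /proc/cmdline); it does not confirm that the hardware actually entered scalable mode. `scalable_mode_requested` is a hypothetical helper name.

```python
def scalable_mode_requested(cmdline):
    """Return True if intel_iommu= carries both 'on' and 'sm_on'.

    Flags from every intel_iommu= token are collected, since the
    parameter may appear more than once on a command line.
    """
    flags = set()
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            flags.update(token.split("=", 1)[1].split(","))
    return {"on", "sm_on"} <= flags
```

For example, `scalable_mode_requested(open("/proc/cmdline").read())` evaluates the currently booted kernel.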
 
Link: https://gitee.com/openeuler/kernel/pulls/223

 
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Chen Wei <chenwei@xfusion.com>
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Reviewed-by: Jun Tian <jun.j.tian@intel.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
parents c5a37a37 92762229
+118 −11
@@ -22,6 +22,7 @@ Date: Oct 25, 2019
KernelVersion:  5.6.0
Contact:        dmaengine@vger.kernel.org
Description:    The largest number of work descriptors in a batch.
                It's not visible when the device does not support batch.

What:           /sys/bus/dsa/devices/dsa<m>/max_work_queues_size
Date:           Oct 25, 2019
@@ -41,14 +42,16 @@ KernelVersion: 5.6.0
Contact:        dmaengine@vger.kernel.org
Description:    The maximum number of groups can be created under this device.

-What:           /sys/bus/dsa/devices/dsa<m>/max_tokens
-Date:           Oct 25, 2019
-KernelVersion:  5.6.0
+What:           /sys/bus/dsa/devices/dsa<m>/max_read_buffers
+Date:           Dec 10, 2021
+KernelVersion:  5.17.0
Contact:        dmaengine@vger.kernel.org
-Description:    The total number of bandwidth tokens supported by this device.
-		The bandwidth tokens represent resources within the DSA
+Description:    The total number of read buffers supported by this device.
+		The read buffers represent resources within the DSA
		implementation, and these resources are allocated by engines to
-		support operations.
+		support operations. See DSA spec v1.2 9.2.4 Total Read Buffers.
+		It's not visible when the device does not support Read Buffer
+		allocation control.

What:           /sys/bus/dsa/devices/dsa<m>/max_transfer_size
Date:           Oct 25, 2019
@@ -77,6 +80,13 @@ Contact: dmaengine@vger.kernel.org
Description:    The operation capability bit mask specify the operation types
		supported by the this device.

What:		/sys/bus/dsa/devices/dsa<m>/pasid_enabled
Date:		Oct 27, 2020
KernelVersion:	5.11.0
Contact:	dmaengine@vger.kernel.org
Description:	To indicate if PASID (process address space identifier) is
		enabled or not for this device.

What:           /sys/bus/dsa/devices/dsa<m>/state
Date:           Oct 25, 2019
KernelVersion:  5.6.0
@@ -108,19 +118,30 @@ KernelVersion: 5.6.0
Contact:        dmaengine@vger.kernel.org
Description:    To indicate if this device is configurable or not.

-What:           /sys/bus/dsa/devices/dsa<m>/token_limit
-Date:           Oct 25, 2019
-KernelVersion:  5.6.0
+What:           /sys/bus/dsa/devices/dsa<m>/read_buffer_limit
+Date:           Dec 10, 2021
+KernelVersion:  5.17.0
Contact:        dmaengine@vger.kernel.org
-Description:    The maximum number of bandwidth tokens that may be in use at
+Description:    The maximum number of read buffers that may be in use at
		one time by operations that access low bandwidth memory in the
-		device.
+		device. See DSA spec v1.2 9.2.8 GENCFG on Global Read Buffer Limit.
+		It's not visible when the device does not support Read Buffer
+		allocation control.

What:		/sys/bus/dsa/devices/dsa<m>/cmd_status
Date:		Aug 28, 2020
KernelVersion:	5.10.0
Contact:	dmaengine@vger.kernel.org
Description:	The last executed device administrative command's status/error.
		Also last configuration error overloaded.
		Writing to it will clear the status.

What:		/sys/bus/dsa/devices/wq<m>.<n>/block_on_fault
Date:		Oct 27, 2020
KernelVersion:	5.11.0
Contact:	dmaengine@vger.kernel.org
Description:	To indicate block on fault is allowed or not for the work queue
		to support on demand paging.

What:           /sys/bus/dsa/devices/wq<m>.<n>/group_id
Date:           Oct 25, 2019
@@ -189,9 +210,95 @@ KernelVersion: 5.10.0
Contact:	dmaengine@vger.kernel.org
Description:	The max batch size for this workqueue. Cannot exceed device
		max batch size. Configurable parameter.
		It's not visible when the device does not support batch.

What:		/sys/bus/dsa/devices/wq<m>.<n>/ats_disable
Date:		Nov 13, 2020
KernelVersion:	5.11.0
Contact:	dmaengine@vger.kernel.org
Description:	Indicate whether ATS disable is turned on for the workqueue.
		0 indicates ATS is on, and 1 indicates ATS is off for the workqueue.

What:		/sys/bus/dsa/devices/wq<m>.<n>/occupancy
Date		May 25, 2021
KernelVersion:	5.14.0
Contact:	dmaengine@vger.kernel.org
Description:	Show the current number of entries in this WQ if WQ Occupancy
		Support bit WQ capabilities is 1.

What:		/sys/bus/dsa/devices/wq<m>.<n>/enqcmds_retries
Date		Oct 29, 2021
KernelVersion:	5.17.0
Contact:	dmaengine@vger.kernel.org
Description:	Indicate the number of retires for an enqcmds submission on a sharedwq.
		A max value to set attribute is capped at 64.

What:		/sys/bus/dsa/devices/wq<m>.<n>/op_config
Date:		Sept 14, 2022
KernelVersion:	6.0.0
Contact:	dmaengine@vger.kernel.org
Description:	Shows the operation capability bits displayed in bitmap format
		presented by %*pb printk() output format specifier.
		The attribute can be configured when the WQ is disabled in
		order to configure the WQ to accept specific bits that
		correlates to the operations allowed. It's visible only
		on platforms that support the capability.

What:           /sys/bus/dsa/devices/engine<m>.<n>/group_id
Date:           Oct 25, 2019
KernelVersion:  5.6.0
Contact:        dmaengine@vger.kernel.org
Description:    The group that this engine belongs to.

What:		/sys/bus/dsa/devices/group<m>.<n>/use_read_buffer_limit
Date:		Dec 10, 2021
KernelVersion:	5.17.0
Contact:	dmaengine@vger.kernel.org
Description:	Enable the use of global read buffer limit for the group. See DSA
		spec v1.2 9.2.18 GRPCFG Use Global Read Buffer Limit.
		It's not visible when the device does not support Read Buffer
		allocation control.

What:		/sys/bus/dsa/devices/group<m>.<n>/read_buffers_allowed
Date:		Dec 10, 2021
KernelVersion:	5.17.0
Contact:	dmaengine@vger.kernel.org
Description:	Indicates max number of read buffers that may be in use at one time
		by all engines in the group. See DSA spec v1.2 9.2.18 GRPCFG Read
		Buffers Allowed.
		It's not visible when the device does not support Read Buffer
		allocation control.

What:		/sys/bus/dsa/devices/group<m>.<n>/read_buffers_reserved
Date:		Dec 10, 2021
KernelVersion:	5.17.0
Contact:	dmaengine@vger.kernel.org
Description:	Indicates the number of Read Buffers reserved for the use of
		engines in the group. See DSA spec v1.2 9.2.18 GRPCFG Read Buffers
		Reserved.
		It's not visible when the device does not support Read Buffer
		allocation control.

What:		/sys/bus/dsa/devices/group<m>.<n>/desc_progress_limit
Date:		Sept 14, 2022
KernelVersion:	6.0.0
Contact:	dmaengine@vger.kernel.org
Description:	Allows control of the number of work descriptors that can be
		concurrently processed by an engine in the group as a fraction
		of the Maximum Work Descriptors in Progress value specified in
		the ENGCAP register. The acceptable values are 0 (default),
		1 (1/2 of max value), 2 (1/4 of the max value), and 3 (1/8 of
		the max value). It's visible only on platforms that support
		the capability.

What:		/sys/bus/dsa/devices/group<m>.<n>/batch_progress_limit
Date:		Sept 14, 2022
KernelVersion:	6.0.0
Contact:	dmaengine@vger.kernel.org
Description:	Allows control of the number of batch descriptors that can be
		concurrently processed by an engine in the group as a fraction
		of the Maximum Batch Descriptors in Progress value specified in
		the ENGCAP register. The acceptable values are 0 (default),
		1 (1/2 of max value), 2 (1/4 of the max value), and 3 (1/8 of
		the max value). It's visible only on platforms that support
		the capability.
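
Several of the attributes above are hidden when the device lacks the matching capability, and the two progress-limit attributes use a small encoded value. The sketch below shows one way user space might handle both. It assumes the attribute reads back as a decimal number and that 0 means the full ENGCAP value is used; the text only calls 0 the default, so both points are assumptions. `read_attr` and `progress_fraction` are hypothetical helper names.

```python
from pathlib import Path


def read_attr(device_dir, name):
    """Return the attribute text, or None when the attribute is not visible."""
    try:
        return (Path(device_dir) / name).read_text().strip()
    except FileNotFoundError:
        # The kernel hides attributes the platform does not support.
        return None


def progress_fraction(value):
    """Map a desc_progress_limit/batch_progress_limit value to a fraction.

    1 -> 1/2, 2 -> 1/4, 3 -> 1/8 as listed above; 0 -> 1 is an assumption.
    """
    if value not in (0, 1, 2, 3):
        raise ValueError("acceptable values are 0, 1, 2 and 3")
    return 1 / (1 << value)
```

A caller would pass a group directory such as /sys/bus/dsa/devices/group<m>.<n> and treat a `None` result as "capability not supported" rather than as an error.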
+30 −0
What:		/sys/bus/event_source/devices/dsa*/format
Date:		April 2021
KernelVersion:  5.13
Contact:	Tom Zanussi <tom.zanussi@linux.intel.com>
Description:	Read-only.  Attribute group to describe the magic bits
		that go into perf_event_attr.config or
		perf_event_attr.config1 for the IDXD DSA pmu.  (See also
		ABI/testing/sysfs-bus-event_source-devices-format).

		Each attribute in this group defines a bit range in
		perf_event_attr.config or perf_event_attr.config1.
		All supported attributes are listed below (See the
		IDXD DSA Spec for possible attribute values)::

		    event_category = "config:0-3"    - event category
		    event          = "config:4-31"   - event ID

		    filter_wq      = "config1:0-31"  - workqueue filter
		    filter_tc      = "config1:32-39" - traffic class filter
		    filter_pgsz    = "config1:40-43" - page size filter
		    filter_sz      = "config1:44-51" - transfer size filter
		    filter_eng     = "config1:52-59" - engine filter

What:		/sys/bus/event_source/devices/dsa*/cpumask
Date:		April 2021
KernelVersion:  5.13
Contact:	Tom Zanussi <tom.zanussi@linux.intel.com>
Description:    Read-only.  This file always returns the cpu to which the
                IDXD DSA pmu is bound for access to all dsa pmu
		performance monitoring events.
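
The bit ranges in the format group can be turned into perf_event_attr.config and perf_event_attr.config1 values with plain shifts. The following is a minimal sketch of that packing; the table is copied from the format attributes above, while `encode` and the field values passed to it are hypothetical and not taken from the DSA spec.

```python
# name -> (target field, lowest bit, highest bit), from the format group above.
FORMAT = {
    "event_category": ("config", 0, 3),
    "event": ("config", 4, 31),
    "filter_wq": ("config1", 0, 31),
    "filter_tc": ("config1", 32, 39),
    "filter_pgsz": ("config1", 40, 43),
    "filter_sz": ("config1", 44, 51),
    "filter_eng": ("config1", 52, 59),
}


def encode(**fields):
    """Pack named fields into {'config': int, 'config1': int}."""
    packed = {"config": 0, "config1": 0}
    for name, value in fields.items():
        target, low, high = FORMAT[name]
        width = high - low + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        packed[target] |= value << low
    return packed
```

For instance, `encode(event_category=0x1, event=0x2)` places the category in bits 0-3 and the event ID in bits 4-31 of `config`, leaving `config1` at zero.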
+11 −0
@@ -1747,6 +1747,17 @@
			In such case C2/C3 won't be used again.
			idle=nomwait: Disable mwait for CPU C-states

	idxd.sva=	[HW]
			Format: <bool>
			Allow force disabling of Shared Virtual Memory (SVA)
			support for the idxd driver. By default it is set to
			true (1).

	idxd.tc_override= [HW]
			Format: <bool>
			Allow override of default traffic class configuration
			for the device. By default it is set to false (0).

	ieee754=	[MIPS] Select IEEE Std 754 conformance mode
			Format: { strict | legacy | 2008 | relaxed }
			Default: strict
+41 −12
@@ -104,18 +104,47 @@ The MSR must be configured on each logical CPU before any application
thread can interact with a device. Threads that belong to the same
process share the same page tables, thus the same MSR value.

-PASID is cleared when a process is created. The PASID allocation and MSR
-programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
-with multiple devices since they all share the same address space.
-
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
-
-New threads inherit the MSR value from the parent.
+PASID Life Cycle Management
+===========================
+
+PASID is initialized as INVALID_IOASID (-1) when a process is created.
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. This allocation happens when a process opens/binds an SVA-capable
+device but finds no PASID for this process. Subsequent binds of the same, or
+other devices will share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It's loaded to the
+IA32_PASID MSR lazily when a thread tries to submit a work descriptor
+to a device using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the PASID value might not be allocated yet or
+the kernel does not know whether this thread is going to access the device
+and the cleared IA32_PASID MSR reduces context switch overhead by xstate
+init optimization. Since #GP faults have to be handled on any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in
+all threads in unbind, free the PASID lazily only on mm exit.
+
+If a process does a close(2) of the device file descriptor and munmap(2)
+of the device MMIO portal, then the driver will unbind the device. The
+PASID is still marked VALID in the PASID_MSR for any threads in the
+process that accessed the device. But this is harmless as without the
+MMIO portal they cannot submit new work to the device.
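
The lazy loading described above can be pictured with a toy user-space model. This is only an illustrative sketch, not kernel code: `Process`, `Thread`, `bind_device` and `enqcmd` are hypothetical names, the MSR is modelled as a plain attribute, and raising an error when no PASID exists is an assumption about the case the text does not spell out.

```python
INVALID_IOASID = -1  # initial PASID value named in the text above


class Process:
    def __init__(self):
        self.pasid = INVALID_IOASID

    def bind_device(self, allocate):
        """The first bind allocates a PASID; later binds share it."""
        if self.pasid == INVALID_IOASID:
            self.pasid = allocate()
        return self.pasid

    def clone(self):
        """A new thread shares the address space but starts with an empty MSR."""
        return Thread(self)


class Thread:
    def __init__(self, process):
        self.process = process
        self.msr = None       # models an uninitialized IA32_PASID MSR
        self.gp_faults = 0

    def enqcmd(self):
        """Submit work; the first call per thread goes through the #GP fix-up."""
        if self.msr is None:
            self.gp_faults += 1
            if self.process.pasid == INVALID_IOASID:
                # Assumption: without an allocated PASID the fault is not fixed up.
                raise RuntimeError("#GP: no PASID allocated for this process")
            self.msr = self.process.pasid  # handler loads the MSR, ENQCMD re-runs
        return self.msr
```

In this model a thread created before or after the bind behaves the same way: its first `enqcmd()` takes one fault, and every later call uses the loaded value directly.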

Relationships
=============
+2 −1
@@ -8949,7 +8949,8 @@ S: Supported
Q:	https://patchwork.kernel.org/project/linux-dmaengine/list/
F:	drivers/dma/ioat*
-INTEL IADX DRIVER
+INTEL IDXD DRIVER
+M:	Fenghua Yu <fenghua.yu@intel.com>
M:	Dave Jiang <dave.jiang@intel.com>
L:	dmaengine@vger.kernel.org
S:	Supported