Commit 0227a749 authored by Zhang Zekun's avatar Zhang Zekun

iommu/iova: increase the iova_rcache depot max size to 128

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ASVH


CVE: NA

---------------------------------------

In a fio test with iodepth=256 and the allowed CPUs set to 0-255, we
observe a severe performance decrease. The cache hit rates are
relatively low. Here are some statistics about the iova_cpu_rcache of
all CPUs:

iova alloc order		0	1	2	3	4	5
----------------------------------------------------------------------
average cpu_rcache hit rate	0.9941	0.7408	0.8109	0.8854	0.9082	0.8887

Jobs: 12 (f=12): [R(12)][20.0%][r=1091MiB/s][r=279k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][22.2%][r=1426MiB/s][r=365k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][25.0%][r=1607MiB/s][r=411k IOPS][eta 00m:27s]
Jobs: 12 (f=12): [R(12)][27.8%][r=1501MiB/s][r=384k IOPS][eta 00m:26s]
Jobs: 12 (f=12): [R(12)][30.6%][r=1486MiB/s][r=380k IOPS][eta 00m:25s]
Jobs: 12 (f=12): [R(12)][33.3%][r=1393MiB/s][r=357k IOPS][eta 00m:24s]
Jobs: 12 (f=12): [R(12)][36.1%][r=1550MiB/s][r=397k IOPS][eta 00m:23s]
Jobs: 12 (f=12): [R(12)][38.9%][r=1485MiB/s][r=380k IOPS][eta 00m:22s]

The underlying hisi_sas driver has 16 threaded IRQs to free IOVAs, but
these IRQ callback functions will only free IOVAs on 16 particular
cpus (cpu{0,16,32,...,240}). For example, the threaded IRQ whose SMP
affinity is 0-15 will only free IOVAs on cpu 0. However, the driver
allocates IOVAs on all cpus (cpu{0-255}), so cpus without free IOVAs in
their local cpu_rcache need to get free IOVAs from
iova_rcache->depot. The current max size of iova_rcache->depot is 32,
which seems too small for 256 users (16 cpus put IOVAs into
iova_rcache->depot and 240 cpus try to get IOVAs from it). Raising the
iova_rcache->depot max size to 128 fixes the performance issue, and
performance returns to normal.

iova alloc order		0	1	2	3	4	5
----------------------------------------------------------------------
average cpu_rcache hit rate	0.9925	0.9736	0.9789	0.9867	0.9889	0.9906

Jobs: 12 (f=12): [R(12)][12.9%][r=7526MiB/s][r=1927k IOPS][eta 04m:30s]
Jobs: 12 (f=12): [R(12)][13.2%][r=7527MiB/s][r=1927k IOPS][eta 04m:29s]
Jobs: 12 (f=12): [R(12)][13.5%][r=7529MiB/s][r=1927k IOPS][eta 04m:28s]
Jobs: 12 (f=12): [R(12)][13.9%][r=7531MiB/s][r=1928k IOPS][eta 04m:27s]
Jobs: 12 (f=12): [R(12)][14.2%][r=7529MiB/s][r=1928k IOPS][eta 04m:26s]
Jobs: 12 (f=12): [R(12)][14.5%][r=7528MiB/s][r=1927k IOPS][eta 04m:25s]
Jobs: 12 (f=12): [R(12)][14.8%][r=7527MiB/s][r=1927k IOPS][eta 04m:24s]
Jobs: 12 (f=12): [R(12)][15.2%][r=7525MiB/s][r=1926k IOPS][eta 04m:23s]
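The depot contention described above can be illustrated with a minimal,
hypothetical single-threaded model (this is not the kernel's actual
iova.c; the `struct depot`, `depot_put()` and `depot_get()` names are
invented for the sketch): freeing CPUs hand full magazines to a bounded
global depot, and allocating CPUs with an empty local rcache try to take
one, falling back to the slow path when the depot is full or empty.

```c
#include <assert.h>

/* Hypothetical simplified model of the iova_rcache global depot:
 * it stores at most max_mags full magazines. When it is full, a
 * freeing CPU must release its IOVAs via the slow path; when it is
 * empty, an allocating CPU must fall back to the rbtree. */
#define IOVA_MAG_SIZE 128

struct depot {
	int mags;      /* full magazines currently stored */
	int max_mags;  /* capacity: the tunable this patch makes configurable */
};

/* A freeing CPU tries to hand a full magazine to the depot.
 * Returns 1 on success, 0 if the depot is full (slow path). */
static int depot_put(struct depot *d)
{
	if (d->mags >= d->max_mags)
		return 0;
	d->mags++;
	return 1;
}

/* An allocating CPU with an empty local rcache tries to take a
 * magazine. Returns 1 on success, 0 if the depot is empty. */
static int depot_get(struct depot *d)
{
	if (d->mags == 0)
		return 0;
	d->mags--;
	return 1;
}
```

With max_mags = 32, only 32 magazines can be parked between the 16
freeing CPUs and the 240 allocating CPUs, so most allocators miss and
hit the slow path; raising the capacity lets more freed magazines be
kept available, which is what the measured hit rates above reflect.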

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
parent 7ac36644
@@ -437,5 +437,15 @@ config SMMU_BYPASS_DEV
 
 	  This feature will be replaced by ACPI IORT RMR node, which will be
 	  upstreamed in mainline.
+config IOVA_MAX_GLOBAL_MAGS
+	int "Set the max iova global magazines in the iova rcache"
+	range 16 2048
+	default "32"
+	help
+	  The iova rcache global magazine is shared among all CPUs. Its size
+	  can be a bottleneck when lots of CPUs are contending for it.
+	  If you are suffering from slow iova allocation with more than
+	  128 CPUs, try tuning this config higher.
+
 
 endif # IOMMU_SUPPORT
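Assuming the option above is applied, a builder hitting the contention
described in the commit message could raise the depot capacity at build
time in the kernel .config, for example:

```
CONFIG_IOVA_MAX_GLOBAL_MAGS=128
```

The default stays at 32 so existing configurations keep the current
behaviour unless they opt in.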
@@ -26,7 +26,7 @@ struct iova_magazine;
 struct iova_cpu_rcache;
 
 #define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */
-#define MAX_GLOBAL_MAGS 32	/* magazines per bin */
+#define MAX_GLOBAL_MAGS CONFIG_IOVA_MAX_GLOBAL_MAGS	/* magazines per bin */
 
 struct iova_rcache {
 	spinlock_t lock;