Commit 13150149 authored by Ard Biesheuvel, committed by Catalin Marinas

arm64: fpsimd: run kernel mode NEON with softirqs disabled



Kernel mode NEON can be used in task or softirq context, but only in
a non-nesting manner, i.e., softirq context is only permitted if the
interrupt was not taken at a point where the kernel was using the NEON
in task context.

This means all users of kernel mode NEON have to be aware of this
limitation, and must either provide scalar fallbacks, which may be much
slower (up to 20x for AES instructions) and potentially less safe, or
use an asynchronous interface that defers processing to a later time
when the NEON is guaranteed to be available.
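As a rough illustration of what the old rule asked of every caller, here is a minimal user-space C model. It assumes a single CPU described by two flags. `model_may_use_neon()` loosely mirrors the kernel's `may_use_simd()` check, and `xor_neon()`/`xor_scalar()` are hypothetical stand-ins for a NEON routine and its scalar fallback. Both compute the same result; only the path taken differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-CPU state; these are not kernel data structures. */
static bool in_softirq_ctx;   /* currently running a softirq handler */
static bool task_neon_busy;   /* the interrupted task was inside a NEON section */

static int neon_calls;        /* how often each path was taken */
static int scalar_calls;

/*
 * Old rule (softirq case only; task context is assumed never to nest
 * its own sections): a softirq may not touch the NEON if it interrupted
 * a task that was using it.
 */
static bool model_may_use_neon(void)
{
	return !(in_softirq_ctx && task_neon_busy);
}

/* Stand-in for a fast NEON routine. */
static void xor_neon(uint8_t *buf, size_t len, uint8_t key)
{
	assert(model_may_use_neon());   /* the caller must have checked */
	neon_calls++;
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/* Stand-in for the slower scalar fallback: same result, different path. */
static void xor_scalar(uint8_t *buf, size_t len, uint8_t key)
{
	scalar_calls++;
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/* Every user has to carry both paths and choose between them at run time. */
static void xor_buf(uint8_t *buf, size_t len, uint8_t key)
{
	if (model_may_use_neon())
		xor_neon(buf, len, key);
	else
		xor_scalar(buf, len, key);
}
```

The burden is visible in `xor_buf()`: the fallback must exist, be maintained, and be correct, even though it only runs in the comparatively rare case where a softirq interrupts a task-context NEON section.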

Given that grabbing and releasing the NEON is cheap, we can relax this
restriction by making kernel mode NEON sections more fine-grained (so
that long-running code yields periodically), and always disabling
softirq processing while the NEON is being used in task context.
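The scheme described above can be sketched as a small user-space C model. This is an illustrative assumption, not the kernel code: the `model_*` functions stand in for `kernel_neon_begin()`, `kernel_neon_end()`, `local_bh_disable()` and `local_bh_enable()`, and softirq handling is reduced to a single pending flag. It shows that a softirq raised during a task-context NEON section is deferred until the section ends, so a softirq handler can use the NEON without ever nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU state for the model. */
static int bh_disable_count;   /* > 0 while softirq processing is blocked */
static bool softirq_pending;   /* set when an interrupt raises a softirq */
static bool neon_in_use;       /* NEON registers currently owned by kernel code */
static int softirqs_run;       /* number of times the softirq handler ran */

static void do_softirq_model(void);

static void model_local_bh_disable(void)
{
	bh_disable_count++;
}

static void model_local_bh_enable(void)
{
	assert(bh_disable_count > 0);
	if (--bh_disable_count == 0 && softirq_pending)
		do_softirq_model();   /* work deferred by the NEON section runs now */
}

static void model_kernel_neon_begin(void)
{
	model_local_bh_disable();
	assert(!neon_in_use);   /* nesting is impossible by construction */
	neon_in_use = true;     /* (the kernel also saves the task's FPSIMD state) */
}

static void model_kernel_neon_end(void)
{
	assert(neon_in_use);
	neon_in_use = false;
	model_local_bh_enable();
}

/* Softirq handlers run with BHs disabled and may use the NEON unconditionally. */
static void do_softirq_model(void)
{
	bh_disable_count++;
	softirq_pending = false;
	softirqs_run++;
	model_kernel_neon_begin();
	model_kernel_neon_end();
	bh_disable_count--;
}

/* An interrupt raises a softirq; it runs immediately unless BHs are disabled. */
static void model_raise_softirq(void)
{
	softirq_pending = true;
	if (bh_disable_count == 0)
		do_softirq_model();
}
```

The trade-off in this design is softirq latency rather than correctness: a softirq waits for the whole NEON section, so long-running NEON loops need to yield periodically.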

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210302090118.30666-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
parent 4c4dcd35
@@ -700,7 +700,7 @@ AES_FUNC_START(aes_mac_update)
 	cbz		w5, .Lmacout
 	encrypt_block	v0, w2, x1, x7, w8
 	st1		{v0.16b}, [x4]			/* return dg */
-	cond_yield	.Lmacout, x7
+	cond_yield	.Lmacout, x7, x8
 	b		.Lmacloop4x
 .Lmac1x:
 	add		w3, w3, #4
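Every hunk in this commit passes `cond_yield` a second scratch register. A plausible reading, which we have not verified against the macro's definition (it is not shown here), is that the yield check must now also test this CPU's pending-softirq word, and forming a per-CPU address needs a base register plus the CPU's offset. The hypothetical C function below models only the decision under that assumption: never yield while already serving a softirq, otherwise yield if a reschedule is due or any softirq is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the yield decision made between blocks.
 * in_softirq:      the NEON code is itself running in a softirq
 * resched_due:     a reschedule is pending and would be honoured
 *                  once the NEON section is left
 * softirq_pending: this CPU's pending-softirq bitmask
 */
static bool model_cond_yield(bool in_softirq, bool resched_due,
			     uint32_t softirq_pending)
{
	/* A softirq cannot be preempted anyway: finish as fast as possible. */
	if (in_softirq)
		return false;
	if (resched_due)
		return true;
	/*
	 * Assumed to be new with this patch: softirqs are blocked while the
	 * task holds the NEON, so give them a chance as soon as one is pending.
	 */
	return softirq_pending != 0;
}
```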
@@ -121,7 +121,7 @@ CPU_LE( rev32 v11.16b, v11.16b )
 	add		dgav.4s, dgav.4s, dg0v.4s

 	cbz		w2, 2f
-	cond_yield	3f, x5
+	cond_yield	3f, x5, x6
 	b		0b

 	/*
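The hunk above shows the loop shape these routines share: after each block, `cbz w2, 2f` finishes when no blocks remain; otherwise `cond_yield` may branch to `3f`, assumed here to be the routine's store-state-and-return path (not shown in this hunk). A minimal C model of how a caller could drive such a routine follows, assuming the routine returns the number of unprocessed blocks. `model_transform()`, `model_update()`, the toy compression function and the fixed yield interval are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64

static const int yield_every = 3;   /* pretend a yield is requested every 3 blocks */
static int sections;                /* number of begin/end pairs executed */

/* Toy compression function: sums the block's bytes into the state. */
static void process_block(uint32_t *state, const uint8_t *blk)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		*state += blk[i];
}

/*
 * Mirrors the asm loop: process a block, stop when none remain,
 * otherwise check whether to yield. On a yield, the state is already
 * saved in *state and the count of remaining blocks is returned.
 */
static int model_transform(uint32_t *state, const uint8_t *src, int blocks)
{
	int done = 0;

	for (;;) {
		process_block(state, src);
		src += BLOCK_SIZE;
		blocks--;
		done++;
		if (blocks == 0)
			return 0;        /* like cbz w2, 2f */
		if (done % yield_every == 0)
			return blocks;   /* like cond_yield 3f: bail out early */
	}
}

/* Caller: re-enter the NEON section until every block is consumed. */
static void model_update(uint32_t *state, const uint8_t *src, int blocks)
{
	while (blocks > 0) {
		int rem;

		sections++;   /* kernel_neon_begin() would go here */
		rem = model_transform(state, src, blocks);
		/* kernel_neon_end() here: deferred softirqs get to run */
		assert(rem >= 0 && rem < blocks);   /* each section makes progress */
		src += (size_t)(blocks - rem) * BLOCK_SIZE;
		blocks = rem;
	}
}
```

Each section does bounded work, so a softirq deferred by the disabled bottom halves waits for at most a few blocks rather than for the whole buffer.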
@@ -129,7 +129,7 @@ CPU_LE( rev32 v19.16b, v19.16b )

 	/* handled all input blocks? */
 	cbz		w2, 2f
-	cond_yield	3f, x5
+	cond_yield	3f, x5, x6
 	b		0b

 	/*
@@ -184,11 +184,11 @@ SYM_FUNC_START(sha3_ce_transform)
 	eor	 v0.16b,  v0.16b, v31.16b

 	cbnz	w8, 3b
-	cond_yield 3f, x8
+	cond_yield 4f, x8, x9
 	cbnz	w2, 0b

 	/* save state */
-3:	st1	{ v0.1d- v3.1d}, [x0], #32
+4:	st1	{ v0.1d- v3.1d}, [x0], #32
 	st1	{ v4.1d- v7.1d}, [x0], #32
 	st1	{ v8.1d-v11.1d}, [x0], #32
 	st1	{v12.1d-v15.1d}, [x0], #32
@@ -195,7 +195,7 @@ CPU_LE( rev64 v19.16b, v19.16b )
 	add		v10.2d, v10.2d, v2.2d
 	add		v11.2d, v11.2d, v3.2d

-	cond_yield	3f, x4
+	cond_yield	3f, x4, x5
 	/* handled all input blocks? */
 	cbnz		w2, 0b
