Commit ade1fa24 authored by Wilco Dijkstra

AArch64: Add optimized Q-register memcpy



Add a new memcpy using 128-bit Q registers.  This is faster on modern
cores and reduces code size.  Similar to the generic memcpy, small cases
include copies up to 32 bytes.  64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies.  Large copies align
the source rather than the destination.
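
As a rough illustration of two ideas mentioned above, here is a minimal
C sketch, not the assembly from this commit.  It assumes that small
copies can be done with two possibly overlapping 16-byte blocks, and
that a large copy can handle its first block unaligned and then step
through the source on 16-byte boundaries.  The names copy16,
copy_16_to_32 and copy_large_src_aligned are hypothetical, buffers are
assumed not to overlap, and memcpy on a 16-byte temporary stands in for
a Q-register load/store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move one 16-byte block, standing in for a
   single Q-register load followed by a store. */
static void copy16(unsigned char *dst, const unsigned char *src)
{
    unsigned char tmp[16];
    memcpy(tmp, src, 16);
    memcpy(dst, tmp, 16);
}

/* Copy n bytes, 16 <= n <= 32, without a loop or per-byte branches:
   one block from the start and one block ending at the last byte.
   The two blocks overlap whenever n < 32. */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    assert(n >= 16 && n <= 32);
    copy16(dst, src);
    copy16(dst + n - 16, src + n - 16);
}

/* Copy n > 32 bytes (threshold chosen for this sketch only) so that
   every load inside the loop is from a 16-byte aligned source address.
   Stores to dst may be unaligned. */
static void copy_large_src_aligned(unsigned char *dst,
                                   const unsigned char *src, size_t n)
{
    assert(n > 32);

    /* First block is copied from the unaligned start. */
    copy16(dst, src);

    /* Offset of the first 16-byte aligned source address past the
       start; it lies in 1..16, so the first block already covers it. */
    size_t off = 16 - ((uintptr_t)src & 15);

    /* Aligned-source loop; stops before the final 16 bytes. */
    while (off + 16 < n) {
        copy16(dst + off, src + off);
        off += 16;
    }

    /* Last block ends exactly at byte n and may overlap the loop's
       final block. */
    copy16(dst + n - 16, src + n - 16);
}
```

Overlapping the first and last blocks avoids a separate tail loop for
sizes that are not a multiple of 16, at the cost of writing a few bytes
twice.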

On bench-memcpy-random, the new memcpy is ~9% faster than memcpy_falkor
on Neoverse N1, so make it the default on N1 (on Centriq it is 15%
faster than memcpy_falkor).

Passes GLIBC regression tests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 4a733bf3)
parent afc53d52