Skip to content
Commit b31bd114 authored by Wilco Dijkstra's avatar Wilco Dijkstra
Browse files

AArch64: Improve A64FX memcpy



v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.

Reviewed-by: default avatarSzabolcs Nagy <szabolcs.nagy@arm.com>
parent b51eb35c
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment