AArch64: Update A64FX memset not to degrade at 16KB
This patch updates the unroll8 code so that memset does not degrade at
16KB, where performance peaks, on both FX1000 and FX700.
The two instructions inserted at the beginning of the unroll8 loop, a
cmp and a branch, are a workaround that was found heuristically.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>