Commit 3c998069 authored Jun 27, 2022 by Danila Kutenin Committed by Szabolcs Nagy Jul 06, 2022

aarch64: Optimize string functions with shrn instruction

We found that string functions were using AND+ADDP
to compute the nibble/syndrome mask, but there is a
simpler option: a single `SHRN dst.8b, src.8h, 4`
(shift each 16-bit element right by 4 and narrow it
to 1 byte), which has the same latency as ADDP on all
SIMD ARMv8 targets. There is also room for improvement
in memcmp, but that is for another patch.
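To show why a single SHRN is enough to build the syndrome, here is a portable C model of the technique. The real glibc code is AArch64 assembly. Everything below is a sketch, and all names (`cmeq16`, `shrn4`, `ctz64`, `first_match16`, `scan16`) are hypothetical. It models CMEQ producing 0xFF/0x00 per byte and SHRN #4 on a little-endian target, so that input byte i ends up in bits [4i, 4i+3] of a 64-bit mask and the first match is at `ctz(mask) / 4`. Copying the tail into a zero-padded buffer is a simplification. Optimized implementations read aligned blocks instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of CMEQ Vd.16B: each byte becomes 0xFF if it equals c, else 0x00. */
static void cmeq16(const uint8_t src[16], uint8_t c, uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (src[i] == c) ? 0xFF : 0x00;
}

/* Model of SHRN Vd.8B, Vn.8H, #4 (little-endian): each 16-bit lane made of
   bytes 2k and 2k+1 is shifted right by 4 and narrowed to its low 8 bits,
   so input byte i contributes the nibble at bits [4i, 4i+3]. */
static uint64_t shrn4(const uint8_t v[16]) {
    uint64_t syndrome = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t h = (uint16_t)(v[2 * lane] | (v[2 * lane + 1] << 8));
        syndrome |= (uint64_t)(uint8_t)(h >> 4) << (8 * lane);
    }
    return syndrome;
}

/* Count trailing zero bits; the caller guarantees x != 0. */
static int ctz64(uint64_t x) {
    int n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Index of the first byte equal to c in a 16-byte chunk, or -1 if none. */
static int first_match16(const uint8_t chunk[16], uint8_t c) {
    uint8_t eq[16];
    cmeq16(chunk, c, eq);
    uint64_t syndrome = shrn4(eq);
    return syndrome ? ctz64(syndrome) / 4 : -1;
}

/* Hypothetical memchr-style scan: 16 bytes per step; in the final partial
   chunk, nibbles for bytes past the end of the buffer are masked off. */
static const uint8_t *scan16(const uint8_t *s, uint8_t c, size_t n) {
    for (size_t i = 0; i < n; i += 16) {
        uint8_t chunk[16] = {0};
        size_t len = (n - i < 16) ? n - i : 16;
        memcpy(chunk, s + i, len);
        uint8_t eq[16];
        cmeq16(chunk, c, eq);
        uint64_t syndrome = shrn4(eq);
        if (len < 16)
            syndrome &= (UINT64_C(1) << (4 * len)) - 1; /* 4 bits per valid byte */
        if (syndrome)
            return s + i + ctz64(syndrome) / 4;
    }
    return NULL;
}
```

Because every byte owns a full 4-bit nibble, the tail mask and the index computation are plain shifts. The padding zeros in the last chunk cannot produce a false match for `c == 0` once their nibbles are masked out.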

We see 10-20% savings for small and mid-size inputs
(<= 128 bytes), which are the primary cases in general
workloads.

parent bd0b5883


@mirror mentioned in commit e213c2205eda08ddd4682fef77cd301cdb846794 · Apr 11, 2024
@mirror mentioned in commit fa9cd4fd3c45fec2211a4603777fec2e0501f32b · Apr 11, 2024
@mirror mentioned in commit c4f4b53eeefe08ec557c8bfe8e0c9f5871a4487a · Apr 11, 2024
@mirror mentioned in commit ea25fe5599068cbadb19e0b2d8640353c32758eb · Apr 11, 2024
