Skip to content
Commit 5777eaed authored by Robin Murphy's avatar Robin Murphy Committed by Will Deacon
Browse files

arm64: Implement optimised checksum routine



Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrisics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).

Reported-by: default avatarLingyan Huang <huanglingyan2@huawei.com>
Tested-by: default avatarLingyan Huang <huanglingyan2@huawei.com>
Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
Signed-off-by: default avatarWill Deacon <will@kernel.org>
parent 46cf053e
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment