Commit 59803e81 authored by Sajan Karumanchi, committed by Florian Weimer

x86: Optimizing memcpy for AMD Zen architecture.

Modify the shareable cache size '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
The existing implementation computes the shareable cache as 'L3 per
thread, L2 per core'. Recomputing it as 'L3 per CCX (Core Complex)'
yields performance gains.
According to the results of the large bench variant, this patch also
addresses the regression on AMD Zen architectures.
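
The difference between the two sizing rules described above can be
sketched as follows. This is an illustration only, not the patched
glibc code: the helper names are hypothetical, and the 3/4 scaling
from shared cache size to non-temporal threshold is an assumption
chosen to show how the shared size feeds into the threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: older sizing, "L3 per thread, L2 per core".
   The L3 is divided among the threads that share it, then the
   per-core L2 is added.  */
static size_t
shared_per_thread (size_t l3_bytes, size_t l2_bytes,
                   unsigned int threads_sharing_l3)
{
  assert (threads_sharing_l3 > 0);
  if (l3_bytes == 0)
    return l2_bytes;            /* No L3: only the L2 is available.  */
  return l3_bytes / threads_sharing_l3 + l2_bytes;
}

/* Hypothetical helper: sizing as "L3 per CCX".  The whole L3 slice
   of one core complex counts as the shareable cache.  */
static size_t
shared_per_ccx (size_t l3_ccx_bytes, size_t l2_bytes)
{
  return l3_ccx_bytes != 0 ? l3_ccx_bytes : l2_bytes;
}

/* Assumption for illustration: the non-temporal threshold is 3/4 of
   the shareable cache size.  */
static size_t
non_temporal_threshold (size_t shared_bytes)
{
  return shared_bytes / 4 * 3;
}
```

With made-up figures of a 16 MiB L3 per CCX, a 512 KiB L2 per core and
16 threads sharing the L3, the per-thread rule gives a shareable size
of 1.5 MiB while the per-CCX rule gives 16 MiB, so under the assumed
scaling memcpy would switch to non-temporal stores at a much larger
copy size.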

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
parent 641a1248