Commit 8813b268 authored Oct 28, 2020 by Sajan Karumanchi Committed by Florian Weimer Oct 30, 2020

x86: Optimizing memcpy for AMD Zen architecture.



Modify the shareable cache size '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX (Core Complex)' has brought performance gains.
According to the large bench variant results, this patch also addresses
the regression problem on AMD Zen architectures.
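
The arithmetic behind the two ways of computing the shareable cache can be
sketched as follows. This is an illustrative sketch only, not the glibc
code: the helper names, the example cache sizes, and the 3/4 factor used to
derive the threshold from the shareable cache are all assumptions made for
the illustration.

```c
#include <assert.h>

/* Hypothetical helper (not a glibc function): the old scheme,
   'L3 per thread, L2 per core'.  Each thread is credited with its
   share of the total L3 plus the L2 of its core.  */
static unsigned long long
shared_per_thread (unsigned long long l3_total, unsigned int threads,
                   unsigned long long l2_per_core)
{
  assert (threads > 0);
  return l3_total / threads + l2_per_core;
}

/* Hypothetical helper (not a glibc function): the new scheme,
   'L3 per CCX'.  The total L3 is divided by the number of core
   complexes rather than by the number of threads.  */
static unsigned long long
shared_per_ccx (unsigned long long l3_total, unsigned int threads,
                unsigned int threads_per_ccx)
{
  assert (threads_per_ccx > 0 && threads >= threads_per_ccx);
  unsigned int ccx_count = threads / threads_per_ccx;
  return l3_total / ccx_count;
}

/* Assumed relation between the shareable cache and the non-temporal
   threshold: a fixed 3/4 fraction.  The factor is an assumption of
   this sketch.  */
static unsigned long long
non_temporal_threshold (unsigned long long shared)
{
  return shared * 3 / 4;
}
```

For a hypothetical part with 64 MiB of L3 in total, 32 threads, 8 threads per
CCX and 512 KiB of L2 per core, shared_per_thread gives 2.5 MiB while
shared_per_ccx gives 16 MiB, so under these assumptions the threshold at
which memcpy would switch to non-temporal stores moves from about 1.9 MiB to
12 MiB.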

Backport of commit 59803e81 upstream, with the fix from cb3a749a
("x86: Restore processing of cache size tunables in init_cacheinfo")
applied.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Co-Authored-by: Florian Weimer <fweimer@redhat.com>

parent e61a8fd8


mirror @mirror mentioned in commit 537fc6aa · May 03, 2022

