Commit 59803e81 authored Oct 28, 2020 by Sajan Karumanchi Committed by Florian Weimer Oct 28, 2020

x86: Optimizing memcpy for AMD Zen architecture.



Modify the shareable cache size '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
The existing implementation computes the shareable cache as 'L3 per
thread, L2 per core'. Recomputing it as 'L3 per CCX (Core Complex)'
brings performance gains.
According to the results of the large bench variant, this patch also
fixes the regression on AMD Zen architectures.
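
The effect of the two cache models on the threshold can be sketched with a
little arithmetic. This is an illustration only, not the code from the patch:
the helper names are hypothetical, the cache sizes used below are example
values, and the 3/4 factor relating the threshold to the shareable cache size
is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the "L3 per thread, L2 per core" model.
   The L3 size is divided among the threads sharing it, then the
   per-core L2 size is added.  */
static size_t
shared_per_thread (size_t l3_bytes, size_t l2_bytes,
                   unsigned int threads_sharing_l3)
{
  assert (threads_sharing_l3 > 0);
  return l3_bytes / threads_sharing_l3 + l2_bytes;
}

/* Hypothetical helper: the "L3 per CCX" model.
   The shareable cache is the whole L3 of one core complex.  */
static size_t
shared_per_ccx (size_t l3_per_ccx_bytes)
{
  return l3_per_ccx_bytes;
}

/* Assumption: the non-temporal threshold is 3/4 of the shareable
   cache size.  */
static size_t
non_temporal_threshold (size_t shared_bytes)
{
  return shared_bytes * 3 / 4;
}
```

With an example CCX of 16 MiB L3 shared by 8 threads and 512 KiB L2 per core,
the per-thread model gives a shareable cache of 2.5 MiB, while the per-CCX
model gives 16 MiB. Under the assumed 3/4 factor, memcpy would then switch to
non-temporal stores at a much larger copy size.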

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>

parent 641a1248


mirror @mirror mentioned in commit 48cf525f4b2ceb84acdc172cdbbee37b88399adc · Jan 27, 2021
mirror @mirror mentioned in commit 8813b2682e4094e43b0cf1634e99619f1b8b2c62 · Jan 27, 2021
