arm64: xchg: Implement cmpxchg_double
The arm64 architecture has the ability to exclusively load and store a pair of registers from an address (ldxp/stxp). Also the SLUB can take advantage of a cmpxchg_double implementation to avoid taking some locks. This patch provides an implementation of cmpxchg_double for 64-bit pairs, and activates the logic required for the SLUB to use these functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE). Also definitions of this_cpu_cmpxchg_8 and this_cpu_cmpxchg_double_8 are wired up to cmpxchg_local and cmpxchg_double_local (rather than the stock implementations that perform non-atomic operations with interrupts disabled) as they are used by the SLUB. On a Juno platform running on only the A57s I get quite a noticeable performance improvement with 5 runs of hackbench on v3.17: Baseline | With Patch -----------------+----------- Mean 119.2312 | 106.1782 StdDev 0.4919 | 0.4494 (times taken to complete `./hackbench 100 process 1000', in seconds) Signed-off-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>