aarch64: Merge __local_multiple_threads offset with memory reference
This also highlights that we'd been loading 64-bits instead of the proper 32-bits. Caught by the linker as a relocation error, since the variable happened to be unaligned for 64-bits.
Loading