aarch64: Use memcpy_simd as the default memcpy
Since __memcpy_simd is the fastest memcpy on almost all cores, replace the generic memcpy with it. (cherry picked from commit e6f3fe36)
Loading
Since __memcpy_simd is the fastest memcpy on almost all cores, replace the generic memcpy with it. (cherry picked from commit e6f3fe36)