x86: Optimize memmove-vec-unaligned-erms.S

No bug.

The optimizations are as follows:

1) Always align entry to 64 bytes. This makes behavior more
   predictable and makes other frontend optimizations easier.

2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have
   significant benefits in the case that:
        0 < (dst - src) < [256, 512]

3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%]
   improvement and for FSRM [-10%, 25%].

In addition to these primary changes there is general cleanup
throughout to optimize the aligning routines and control flow logic.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit a6b7502e)
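
For readers unfamiliar with the two address checks behind items 2 and 3, here is a rough C sketch. It is not taken from the assembly in this commit. The helper names `may_4k_alias` and `bytes_to_align64`, the 512-byte window, and the reduction of the distance modulo 4 KiB are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the "4k aliasing" wording above. */
#define PAGE_SIZE 4096u
/* Assumption: upper end of the [256, 512] range quoted in the message. */
#define ALIAS_WINDOW 512u

/* Hypothetical helper: returns 1 when dst sits a small positive distance
   ahead of src once the distance is reduced modulo 4 KiB, i.e. when a
   forward copy's loads could falsely alias its recent stores. */
static int may_4k_alias(const void *dst, const void *src)
{
    uintptr_t diff = ((uintptr_t)dst - (uintptr_t)src) & (PAGE_SIZE - 1);
    return diff != 0 && diff < ALIAS_WINDOW;
}

/* Hypothetical helper: number of bytes to copy separately so that p
   reaches the next 64-byte boundary (0 if p is already aligned). */
static size_t bytes_to_align64(const void *p)
{
    return (size_t)(-(uintptr_t)p & 63u);
}
```

A copy routine built on this sketch might switch copy direction when `may_4k_alias` returns 1, and copy `bytes_to_align64(dst)` bytes with ordinary moves before handing the rest to `rep movsb`. Whether the assembly uses these exact thresholds is not stated in the message.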