x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S
Slightly faster method of doing TOLOWER that saves an
instruction.
Also replace the hard coded 5-byte no with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.
geometric_mean(N=40) of all benchmarks New / Original: .920
All string/memory tests pass.
Reviewed-by:
H.J. Lu <hjl.tools@gmail.com>
Loading