x86: Optimize strcmp-avx2.S
Optimizations are primarily to the loop logic and to how the page cross
logic interacts with the loop.
The page cross logic is at times more expensive for short strings near
the end of a page that do not actually cross it. The extra cost comes
from retesting the page cross conditions with a non-faulting check and
from improving the logic for entering the loop afterwards. This affects
only particular cases, however, and is generally made up for by more
than 10x improvements on the transition from the page cross -> loop
case.
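
To illustrate the distinction above, here is a minimal C sketch of the
two conditions: whether a full vector load starting at an address would
touch the next page, and whether the string itself spans the page
boundary. It is not the code in strcmp-avx2.S. The 4096-byte page size,
the 32-byte vector width, and both helper names are assumptions made
for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */
#define VEC_SIZE  32u   /* assumed vector width: one AVX2 ymm register */

/* Hypothetical helper: returns 1 if a VEC_SIZE-byte load starting at
   addr would read bytes from the following page, else 0.  */
static int
vec_load_crosses_page (uintptr_t addr)
{
  return (addr & (PAGE_SIZE - 1)) > PAGE_SIZE - VEC_SIZE;
}

/* Hypothetical helper: returns 1 if a string of len_with_nul bytes
   (terminator included) starting at addr extends into the following
   page, else 0.  */
static int
string_crosses_page (uintptr_t addr, size_t len_with_nul)
{
  return (addr & (PAGE_SIZE - 1)) + len_with_nul > PAGE_SIZE;
}
```

A 4-byte string at page offset 4090 is the kind of case described
above: the full vector load would cross the page, so the page cross
path is taken, even though the string itself ends before the boundary.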
The non-page cross cases are improved most for smaller sizes [0, 128]
and are about even for (128, 4096]. The loop page cross logic is
improved, so a more significant speedup is seen there as well.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>