Commit c796418d authored by Noah Goldstein, committed by Sunil K Pandey

x86: Optimize L(less_vec) case in memcmp-evex-movbe.S



No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change control flow so that the L(less_vec) case gets the fall through.
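
As a rough illustration of 1), the sketch below emulates a zero-masking
byte load in portable C. It is not the code from memcmp-evex-movbe.S; the
names masked_load and memcmp_less_vec are hypothetical, and the scalar
loops stand in for what would be single masked vector instructions. The
point it shows is that a length mask selects exactly the bytes that may
be read, so sizes 0 and 1 and buffers ending near a page boundary need
no separate branches.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32

/* Hypothetical helper: emulates a zero-masking byte load. Lanes whose
   mask bit is clear are set to 0 and the source is never read there. */
void masked_load(uint8_t dst[VEC_SIZE], const uint8_t *src, uint32_t mask)
{
    for (unsigned i = 0; i < VEC_SIZE; i++)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0;
}

/* Hypothetical sketch of a compare for len in [0, VEC_SIZE].
   The only length-dependent value is the mask; there is no special
   case for len == 0, len == 1, or a read that would cross a page. */
int memcmp_less_vec(const void *s1, const void *s2, size_t len)
{
    /* Low `len` bits set; 64-bit shift keeps len == 32 well defined. */
    uint32_t mask = (uint32_t)((UINT64_C(1) << len) - 1);
    uint8_t a[VEC_SIZE], b[VEC_SIZE];

    masked_load(a, (const uint8_t *)s1, mask);
    masked_load(b, (const uint8_t *)s2, mask);

    /* First differing byte decides the result, as in memcmp. */
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}
```

With len == 0 the mask is zero, no byte of either buffer is touched, and
the result is 0 without any explicit length check.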

Change 2) helps comparisons in the [0, 32] size range but comes at the
cost of comparisons in the [33, 64] size range.  From profiles of GCC
and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to be the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit abddd61d)
parent f3a99b22