Skip to content
Commit 82a707ae authored by Raghuveer Devulapalli's avatar Raghuveer Devulapalli Committed by Sunil K Pandey
Browse files

x86_64: Add strstr function with 512-bit EVEX



Adding a 512-bit EVEX version of strstr. The algorithm works as follows:

(1) We spend a few cycles at the begining to peek into the needle. We
locate an edge in the needle (first occurance of 2 consequent distinct
characters) and also store the first 64-bytes into a zmm register.

(2) We search for the edge in the haystack by looking into one cache
line of the haystack at a time. This avoids having to read past a page
boundary which can cause a seg fault.

(3) If an edge is found in the haystack we first compare the first
64-bytes of the needle (already stored in a zmm register) before we
proceed with a full string compare performed byte by byte.

Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512)

Geometric mean of all benchmarks: new / old =  0.66

Difficult skiptable(0) : new / old =  0.02
Difficult skiptable(1) : new / old =  0.01
Difficult 2-way : new / old =  0.25
Difficult testing first 2 : new / old =  1.26
Difficult skiptable(0) : new / old =  0.05
Difficult skiptable(1) : new / old =  0.06
Difficult 2-way : new / old =  0.26
Difficult testing first 2 : new / old =  1.05
Difficult skiptable(0) : new / old =  0.42
Difficult skiptable(1) : new / old =  0.24
Difficult 2-way : new / old =  0.21
Difficult testing first 2 : new / old =  1.04
Reviewed-by: default avatarH.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 5082a287)

x86: Remove __mmask intrinsics in strstr-avx512.c

The intrinsics are not available before GCC7 and using standard
operators generates code of equivalent or better quality.

Removed:
    _cvtmask64_u64
    _kshiftri_mask64
    _kand_mask64

Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958

(cherry picked from commit f2698954)
parent f6bc52f0
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment