Skip to content
Commit 84cfc647 authored by Adhemerval Zanella Netto's avatar Adhemerval Zanella Netto Committed by Adhemerval Zanella
Browse files

x86: Add AVX2 optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

AVX2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               922.61
arc4random_buf(16) [single-thread]      1478.70
arc4random_buf(32) [single-thread]      2241.80
arc4random_buf(48) [single-thread]      2681.28
arc4random_buf(64) [single-thread]      2913.43
arc4random_buf(80) [single-thread]      3009.73
arc4random_buf(96) [single-thread]      3141.16
arc4random_buf(112) [single-thread]     3254.46
arc4random_buf(128) [single-thread]     3305.02
-----------------------------------------------

Checked on x86_64-linux-gnu.
parent e169aff0
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment