Commit 6e98983c authored by Paul Zimmermann, committed by Adhemerval Zanella

math: Optimized generic exp10f with wrappers

The implementation is inspired by expf and reuses its tables and
internal functions.  The error checks are inlined and errno is set in
separate tail-called functions, but the wrappers are kept in this patch
to handle the _LIB_VERSION==_SVID_ case.

Double-precision arithmetic is used, which is expected to be faster on
most targets (including soft-float) than single precision, and it makes
a precise result easier to obtain.

Results for x86_64 (i7-4790K CPU @ 4.00GHz):

Before new code:
  "exp10f": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 4.0414e+09,
    "iterations": 1.00128e+08,
    "reciprocal-throughput": 26.6818,
    "latency": 54.043,
    "max-throughput": 3.74787e+07,
    "min-throughput": 1.85038e+07
   }

With new code:
  "exp10f": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 4.11951e+09,
    "iterations": 1.23968e+08,
    "reciprocal-throughput": 21.0581,
    "latency": 45.4028,
    "max-throughput": 4.74876e+07,
    "min-throughput": 2.20251e+07
   }

Results for aarch64 (A72 @ 2GHz):

Before new code:
  "exp10f": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 4.62362e+09,
    "iterations": 3.3376e+07,
    "reciprocal-throughput": 127.698,
    "latency": 149.365,
    "max-throughput": 7.831e+06,
    "min-throughput": 6.69501e+06
   }

With new code:
  "exp10f": {
   "workload-spec2017.wrf (adapted)": {
    "duration": 4.29108e+09,
    "iterations": 6.6752e+07,
    "reciprocal-throughput": 51.2111,
    "latency": 77.3568,
    "max-throughput": 1.9527e+07,
    "min-throughput": 1.29271e+07
   }
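
To read the numbers above as a speedup, one can divide the old
"reciprocal-throughput" by the new one (lower is better for that field).
The helper below is a hypothetical convenience, not part of the
benchmark suite.

```c
/* Speedup factor from two reciprocal-throughput (or latency) figures:
   values above 1.0 mean the new code is faster.  */
double
speedup (double before, double after)
{
  return before / after;
}
```

With the figures reported here, speedup (26.6818, 21.0581) is about 1.27
on x86_64 and speedup (127.698, 51.2111) is about 2.49 on aarch64.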

Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu,
and sparc64-linux-gnu.
parent 2004063f