x86: optimize left-shift-by-1
These can be replaced by adds when acting on a register operand. While for the scalar forms there's no gain in encoding size, ADD generally has higher throughput than SHL. EFLAGS set by ADD are a superset of those set by SHL (AF in particular is undefined there). For the SIMD cases the transformation also reduced code size, by eliminating the 1-byte immediate from the resulting encoding. Note that this transformation is not applied by gcc13 (according to my observations), so would - as of now - even improve compiler generated code.
Loading
Please register or sign in to comment