GCC generates slower code when targeting a more recent SSE version
I have a very simple test program like the one below. It just sums all the uint8 values in an array.
GCC seems to generate significantly slower code when targeting SSE4 or AVX2.
The same code is significantly faster when targeting SSSE3.
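
For readers without the original listing at hand, here is a minimal sketch of the kind of loop being described. It is not the exact test program: the function name `sum_u8`, the 64-bit accumulator, and the `size_t` length parameter are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction: add up every byte of `data`.
 * The accumulator is wider than the element type (an assumption),
 * so the vectorizer has to widen uint8 lanes before adding them. */
uint64_t sum_u8(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

To compare targets, a loop like this could be built once per instruction set, for example with `gcc -O3 -mssse3`, `gcc -O3 -msse4.1`, and `gcc -O3 -mavx2`, and the generated assembly inspected with `-S`. The optimization level is an assumption, since the question does not state the exact flags used.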