What is the most efficient way to pack elements in AVX?

Suppose I have 16 __m256i registers, ix[16], each holding 32 bytes, and I want to pack them as follows:

Data layout:

ix[0] : a0, a1, a2, …, a30, a31

ix[1] : b0, b1, b2, …, b30, b31

…

ix[15] : p0, p1, p2, …, p30, p31

Expected result, in consecutive memory:

a0, b0, c0, …, p0, a1, b1, c1, …, p1, a2, …, a31, …, p31
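The requested layout is a byte transpose. View the 16 registers as a 16 × 32 matrix of bytes, one row per register. Output byte 16·k + r is then byte k of ix[r]. A plain scalar version states this precisely and can serve as a correctness reference for any SIMD variant. This is a minimal sketch that assumes the registers were first stored to a 512-byte buffer (for example with `_mm256_storeu_si256` to `buf + 32*r`). The name `pack_ref` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar reference.
 * in[32*r + k]  = byte k of register r (registers stored one after another)
 * out[16*k + r] = byte k of register r (byte k of every register, then byte k+1, ...) */
void pack_ref(const uint8_t in[512], uint8_t out[512])
{
    for (int k = 0; k < 32; k++)        /* byte position inside each register */
        for (int r = 0; r < 16; r++)    /* register index, i.e. ix[r] */
            out[16 * k + r] = in[32 * r + k];
}
```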

Clearly _mm256_extract_epi8 can do this one byte at a time, but I think there must be a more efficient way. Any ideas would be appreciated. Thanks a lot!
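`_mm256_extract_epi8` moves one byte per call, so this approach needs 512 extractions. A common alternative is the unpack-based byte transpose. The following is a sketch and has not been benchmarked here. Four rounds of `_mm256_unpacklo_*`/`_mm256_unpackhi_*` at 8-, 16-, 32- and 64-bit granularity transpose each 128-bit lane. Because these unpacks work within lanes, bytes 0-15 and bytes 16-31 of every register are transposed in parallel. A final `_mm256_permute2x128_si256` regroups the lanes so that each store writes 32 contiguous output bytes. In total this is 64 unpacks, 16 lane permutes and 16 stores. The function name `pack16x32_avx2` and the use of the GCC/Clang `target("avx2")` attribute, instead of compiling the whole file with `-mavx2`, are assumptions of this sketch.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: writes out[16*k + r] = byte k of ix[r] (512 bytes total).
 * Per 128-bit lane, "col" below means byte position 0..15 within that lane. */
__attribute__((target("avx2")))
void pack16x32_avx2(const __m256i ix[16], uint8_t out[512])
{
    __m256i a[16], b[16], c[16], d[16];

    /* Stage 1 (bytes): a[2i] = rows 2i,2i+1 at cols 0-7; a[2i+1] = same rows at cols 8-15. */
    for (int i = 0; i < 8; i++) {
        a[2 * i]     = _mm256_unpacklo_epi8(ix[2 * i], ix[2 * i + 1]);
        a[2 * i + 1] = _mm256_unpackhi_epi8(ix[2 * i], ix[2 * i + 1]);
    }
    /* Stage 2 (16-bit): b[4g + q] = rows 4g..4g+3 at cols 4q..4q+3. */
    for (int g = 0; g < 4; g++)
        for (int h = 0; h < 2; h++) {
            b[4 * g + 2 * h]     = _mm256_unpacklo_epi16(a[4 * g + h], a[4 * g + 2 + h]);
            b[4 * g + 2 * h + 1] = _mm256_unpackhi_epi16(a[4 * g + h], a[4 * g + 2 + h]);
        }
    /* Stage 3 (32-bit): c[8e + s] = rows 8e..8e+7 at cols 2s and 2s+1. */
    for (int e = 0; e < 2; e++)
        for (int q = 0; q < 4; q++) {
            c[8 * e + 2 * q]     = _mm256_unpacklo_epi32(b[8 * e + q], b[8 * e + 4 + q]);
            c[8 * e + 2 * q + 1] = _mm256_unpackhi_epi32(b[8 * e + q], b[8 * e + 4 + q]);
        }
    /* Stage 4 (64-bit): low lane of d[k] = byte k of all 16 registers (output row k),
     * high lane of d[k] = byte k+16 of all 16 registers (output row k+16). */
    for (int s = 0; s < 8; s++) {
        d[2 * s]     = _mm256_unpacklo_epi64(c[s], c[8 + s]);
        d[2 * s + 1] = _mm256_unpackhi_epi64(c[s], c[8 + s]);
    }
    /* Lane fix-up: join output rows 2j,2j+1 (low lanes) and 2j+16,2j+17 (high lanes). */
    for (int j = 0; j < 8; j++) {
        __m256i lo = _mm256_permute2x128_si256(d[2 * j], d[2 * j + 1], 0x20);
        __m256i hi = _mm256_permute2x128_si256(d[2 * j], d[2 * j + 1], 0x31);
        _mm256_storeu_si256((__m256i *)(out + 32 * j), lo);
        _mm256_storeu_si256((__m256i *)(out + 256 + 32 * j), hi);
    }
}
```

Call it with the `ix` array and a 512-byte destination. If the rest of the program is built without `-mavx2`, guard the call at run time, for example with `__builtin_cpu_supports("avx2")` on GCC/Clang. The final permutes exist only to get full 32-byte stores. Storing the two 128-bit halves of each `d[k]` separately would produce the same bytes. x86-64 has only 16 ymm registers, so the compiler may spill some intermediates. Measure against the `_mm256_extract_epi8` version before relying on this.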