Simpler way to efficiently copy strided pixel data using RISC-V vector assembly (V)?
Problem statement

    void copy(char *dst, const char *src, const ptrdiff_t stride, const int w, int h)
    {
        do {
            memcpy(dst, src, w);
            dst += stride;
            src += stride;
        } while (--h);
    }

I’m struggling to find a way to efficiently implement this using RISC-V scalable vectors in a way that does not make unnecessary assumptions […]
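For context, here is a rough portable-C model of the strip-mined loop shape that a vector version of this copy would typically take. It is only a sketch under stated assumptions, not actual RVV code: `MODEL_VLMAX` and `model_vsetvl` are hypothetical stand-ins for the hardware's VLMAX and for what `vsetvli` returns, and the inner `memcpy` stands in for a unit-stride vector load/store pair. Like the scalar reference, it assumes `h >= 1`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for VLMAX in bytes (e.g. e8 elements at some LMUL).
 * Real hardware decides this; the loop below never depends on its value. */
enum { MODEL_VLMAX = 16 };

/* Models the value vsetvli hands back for a requested element count.
 * Assumption: the common min(avl, VLMAX) behaviour; the V spec permits
 * other choices in some cases, which this loop would also tolerate. */
static size_t model_vsetvl(size_t avl)
{
    return avl < MODEL_VLMAX ? avl : MODEL_VLMAX;
}

/* Same contract as the scalar reference: copies h rows of w bytes,
 * advancing both pointers by stride per row. Requires h >= 1. */
void copy_stripmined(char *dst, const char *src, ptrdiff_t stride, int w, int h)
{
    do {
        const char *s = src;
        char *d = dst;
        size_t n = (size_t)w;

        /* Strip-mine one row: take as many bytes as the "hardware" grants. */
        while (n > 0) {
            size_t vl = model_vsetvl(n); /* would be: vsetvli */
            memcpy(d, s, vl);            /* would be: vector load + store */
            s += vl;
            d += vl;
            n -= vl;
        }

        dst += stride;
        src += stride;
    } while (--h);
}
```

The point of the model is that nothing outside `model_vsetvl` knows the vector length, so the row width `w` does not have to be a multiple of it and the tail of each row needs no separate scalar loop.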