Looking for an efficient SIMD solution

I am trying to optimize a loop with SIMD, and I need a suggestion for one loop in particular.