Existing CUDA GEMM implementation

Is there an existing CUDA GEMM implementation that meets my needs: