CUDA – access violation on every API function
I have a C# app that runs CUDA code. I wanted to use cudaMalloc, but I get an access violation on every CUDA API function (cudaSetDevice, cudaMemGetInfo, cudaMalloc, …). I am fairly sure the program ran once this morning (at that point it only called cudaMemGetInfo and returned).
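Not the asker's code, but a minimal native sketch that exercises the same calls with every return code checked. It assumes the C# app reaches the CUDA runtime through a native DLL, so running this standalone can help separate a broken driver/runtime installation from a problem in the interop layer:

```cpp
// Standalone native test: same runtime calls, with explicit error checks.
// If this also fails, the problem is the CUDA driver/runtime installation
// rather than the C# interop layer (an assumption about how the app is built).
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            std::printf("%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                        \
        }                                                                    \
    } while (0)

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    void* d_buf = nullptr;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));
    std::printf("free: %zu, total: %zu\n", freeBytes, totalBytes);
    CHECK(cudaMalloc(&d_buf, 1 << 20));   // 1 MiB test allocation
    CHECK(cudaFree(d_buf));
    std::puts("all runtime calls succeeded");
    return 0;
}
```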
In CUDA, is it faster to generate matrices on the CPU or through a kernel on the GPU?
I’m a beginner with CUDA and C++, and I’m trying to multiply two 1000 × 1000 matrices that both contain randomly generated values from 0 to 100. I’m using Visual Studio for this. I’m also trying to compare how fast the entire program executes (calculation time plus the other operations, etc.) on the GPU versus the CPU. I’ve already written the CPU multiplication in C++. Sorry if my code looks clumsy; I’m still learning.
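Not the asker's code, but a short sketch of the GPU-side option the title asks about: generating the matrix values directly in device memory with cuRAND plus a small scaling kernel, so no host generation or host-to-device copy is needed. The seed, block size, and the mapping to (0, 100] are placeholder choices:

```cpp
// Fill a 1000 x 1000 matrix with random values entirely on the GPU:
// cuRAND writes uniform (0, 1] values straight into device memory,
// then a small kernel scales them into (0, 100].
#include <cuda_runtime.h>
#include <curand.h>

__global__ void scaleKernel(float* a, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;                 // map (0, 1] to (0, 100]
}

int main() {
    const int n = 1000 * 1000;
    float* d_a = nullptr;
    cudaMalloc(&d_a, n * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniform(gen, d_a, n);   // device-side generation, no host copy

    scaleKernel<<<(n + 255) / 256, 256>>>(d_a, n, 100.0f);
    cudaDeviceSynchronize();

    curandDestroyGenerator(gen);
    cudaFree(d_a);
    return 0;
}
```

(Build with nvcc and link against cuRAND, e.g. `-lcurand`.) To compare the two approaches fairly, the whole sequence can be timed end to end in both versions, for example with cudaEvent records around it.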
How to copy a 3D array to the device in CUDA when the third dimension is not constant
I am implementing a parallel two-dimensional SPH method, and I have a problem with efficiently copying a three-dimensional array to the device.
I partially solved the problem by copying the entire array to the device, but only after converting the three-dimensional array into a one-dimensional one and making the third dimension constant; otherwise I could not figure out how to do it, since the number of particles in a grid cell can differ. As a result, a lot of memory is now allocated on the device that is never used.
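A minimal sketch (my own example, not the asker's SPH data structures) of the usual workaround for a ragged third dimension: store all per-cell entries in one flat array plus a per-cell offset array built with a prefix sum (CSR-style), so only the memory actually used is allocated and copied to the device:

```cpp
// Replace the padded 3D array with a flat array plus per-cell offsets.
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

int main() {
    // Host: one vector of particle indices per grid cell (ragged "3rd dimension").
    std::vector<std::vector<int>> cells = { {0, 3, 7}, {}, {1, 2}, {4, 5, 6, 8} };
    int numCells = static_cast<int>(cells.size());

    // Build offsets with an exclusive prefix sum over the per-cell counts.
    std::vector<int> offsets(numCells + 1, 0);
    for (int c = 0; c < numCells; ++c)
        offsets[c + 1] = offsets[c] + static_cast<int>(cells[c].size());

    // Flatten the ragged data into one contiguous array.
    std::vector<int> flat(offsets[numCells]);
    for (int c = 0; c < numCells; ++c)
        std::copy(cells[c].begin(), cells[c].end(), flat.begin() + offsets[c]);

    // Two device copies replace the padded 3D array.
    int *d_offsets = nullptr, *d_flat = nullptr;
    cudaMalloc(&d_offsets, offsets.size() * sizeof(int));
    cudaMalloc(&d_flat, flat.size() * sizeof(int));
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_flat, flat.data(), flat.size() * sizeof(int), cudaMemcpyHostToDevice);

    // In a kernel, cell c's particles are flat[offsets[c]] .. flat[offsets[c+1] - 1].
    cudaFree(d_offsets);
    cudaFree(d_flat);
    return 0;
}
```

This trades the padding for one extra indirection per cell: a kernel looks up a cell's start and end in the offset array instead of assuming a fixed maximum per cell.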
cuBLAS root kernel is called twice for a reduction operation
I am making a single call to the cublasSasum function from cuBLAS, but profiling with nsys shows that the kernel it launches (asum_kernel) is invoked twice. I am computing the sum of 4096^2 elements in total.
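For reference, a minimal sketch (an assumed reproduction, not the asker's code) of a single cublasSasum call on 4096^2 floats, i.e. the setup being profiled under nsys:

```cpp
// One cublasSasum call on 4096^2 floats; profile the binary with nsys to
// count the kernel launches it produces. The buffer contents don't matter
// for the launch count, so it is simply zero-filled here.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4096 * 4096;
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    float result = 0.0f;
    cublasSasum(handle, n, d_x, 1, &result);  // a single library call
    cudaDeviceSynchronize();
    std::printf("sum = %f\n", result);

    cublasDestroy(handle);
    cudaFree(d_x);
    return 0;
}
```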