CUDA – access violation on every API function
I have a C# app that runs CUDA code. I wanted to use cudaMalloc, but I get an access violation on every CUDA API function (cudaSetDevice, cudaMemGetInfo, cudaMalloc, …). I am fairly sure the program ran once this morning (at that point it only called cudaMemGetInfo and returned).
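Not the asker's code, but a minimal native sketch that exercises the same calls with every return code checked. It assumes the C# app reaches the CUDA runtime through a native DLL, so running this standalone can help separate a broken driver/runtime installation from a problem in the interop layer:

```cpp
// Standalone native test: same runtime calls, with explicit error checks.
// If this also fails, the problem is the CUDA driver/runtime installation
// rather than the C# interop layer (an assumption about how the app is built).
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            std::printf("%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                        \
        }                                                                    \
    } while (0)

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    void* d_buf = nullptr;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));
    std::printf("free: %zu, total: %zu\n", freeBytes, totalBytes);
    CHECK(cudaMalloc(&d_buf, 1 << 20));   // 1 MiB test allocation
    CHECK(cudaFree(d_buf));
    std::puts("all runtime calls succeeded");
    return 0;
}
```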
In CUDA, is it faster to generate matrices on the CPU or through a kernel on the GPU?
I’m a beginner with CUDA and C++, and I’m trying to multiply two 1000 × 1000 matrices that both contain randomly generated values from 0 to 100. I’m using Visual Studio for this. I’m also trying to compare how fast the entire program executes (calculation time plus the other operations, etc.) on the GPU versus the CPU. I’ve already written the CPU multiplication in C++. Sorry if my code looks clumsy; I’m still learning.
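Not the asker's code, but a short sketch of the GPU-side option the title asks about: generating the matrix values directly in device memory with cuRAND plus a small scaling kernel, so no host generation or host-to-device copy is needed. The seed, block size, and the mapping to (0, 100] are placeholder choices:

```cpp
// Fill a 1000 x 1000 matrix with random values entirely on the GPU:
// cuRAND writes uniform (0, 1] values straight into device memory,
// then a small kernel scales them into (0, 100].
#include <cuda_runtime.h>
#include <curand.h>

__global__ void scaleKernel(float* a, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;                 // map (0, 1] to (0, 100]
}

int main() {
    const int n = 1000 * 1000;
    float* d_a = nullptr;
    cudaMalloc(&d_a, n * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniform(gen, d_a, n);   // device-side generation, no host copy

    scaleKernel<<<(n + 255) / 256, 256>>>(d_a, n, 100.0f);
    cudaDeviceSynchronize();

    curandDestroyGenerator(gen);
    cudaFree(d_a);
    return 0;
}
```

(Build with nvcc and link against cuRAND, e.g. `-lcurand`.) To compare the two approaches fairly, the whole sequence can be timed end to end in both versions, for example with cudaEvent records around it.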
How to copy a 3D array to the device in CUDA when the third dimension is not constant
I am implementing a parallel two-dimensional SPH method, and I have a problem with efficiently copying a three-dimensional array to the device.
I partially solved the problem by copying the entire array to the device, but only after converting the three-dimensional array into a one-dimensional one and making the third dimension constant; otherwise I could not figure out how to do it, since the number of particles in a grid cell can differ. As a result, a lot of memory is now allocated on the device that is never used.
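A minimal sketch (my own example, not the asker's SPH data structures) of the usual workaround for a ragged third dimension: store all per-cell entries in one flat array plus a per-cell offset array built with a prefix sum (CSR-style), so only the memory actually used is allocated and copied to the device:

```cpp
// Replace the padded 3D array with a flat array plus per-cell offsets.
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

int main() {
    // Host: one vector of particle indices per grid cell (ragged "3rd dimension").
    std::vector<std::vector<int>> cells = { {0, 3, 7}, {}, {1, 2}, {4, 5, 6, 8} };
    int numCells = static_cast<int>(cells.size());

    // Build offsets with an exclusive prefix sum over the per-cell counts.
    std::vector<int> offsets(numCells + 1, 0);
    for (int c = 0; c < numCells; ++c)
        offsets[c + 1] = offsets[c] + static_cast<int>(cells[c].size());

    // Flatten the ragged data into one contiguous array.
    std::vector<int> flat(offsets[numCells]);
    for (int c = 0; c < numCells; ++c)
        std::copy(cells[c].begin(), cells[c].end(), flat.begin() + offsets[c]);

    // Two device copies replace the padded 3D array.
    int *d_offsets = nullptr, *d_flat = nullptr;
    cudaMalloc(&d_offsets, offsets.size() * sizeof(int));
    cudaMalloc(&d_flat, flat.size() * sizeof(int));
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_flat, flat.data(), flat.size() * sizeof(int), cudaMemcpyHostToDevice);

    // In a kernel, cell c's particles are flat[offsets[c]] .. flat[offsets[c+1] - 1].
    cudaFree(d_offsets);
    cudaFree(d_flat);
    return 0;
}
```

This trades the padding for one extra indirection per cell: a kernel looks up a cell's start and end in the offset array instead of assuming a fixed maximum per cell.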
cuBLAS root kernel is called twice for a reduction operation
I am making a single call to the cublasSasum function from cuBLAS, but profiling with nsys shows that the kernel it launches (asum_kernel) is invoked twice. I am computing the sum of 4096^2 elements in total.
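For reference, a minimal sketch (an assumed reproduction, not the asker's code) of a single cublasSasum call on 4096^2 floats, i.e. the setup being profiled under nsys:

```cpp
// One cublasSasum call on 4096^2 floats; profile the binary with nsys to
// count the kernel launches it produces. The buffer contents don't matter
// for the launch count, so it is simply zero-filled here.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4096 * 4096;
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    float result = 0.0f;
    cublasSasum(handle, n, d_x, 1, &result);  // a single library call
    cudaDeviceSynchronize();
    std::printf("sum = %f\n", result);

    cublasDestroy(handle);
    cudaFree(d_x);
    return 0;
}
```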