.cuh and .h file difference CUDA
When I declare an extern __device__ variable in a .cuh file, which is defined in a .cu file, and then try to use it in a different .cu file, I get a multiple definition error, but when I change the .cuh file to a .h file, the error disappears. Is there a difference between .cuh and .h files? This post https://forums.developer.nvidia.com/t/whats-the-difference-between-cuh-and-h/266214 suggests there is no difference between .cuh and .h files, so why is this happening?
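The layout I have is roughly the following (a simplified sketch; g_value and the file names are placeholders, not my real code):

```cuda
// globals.cuh (or globals.h) -- header with the declaration only
#pragma once
extern __device__ int g_value;

// defs.cu -- the single definition
#include "globals.cuh"
__device__ int g_value;

// use.cu -- another translation unit that uses the symbol
// (the project is built with relocatable device code, e.g. nvcc -rdc=true)
#include "globals.cuh"
__global__ void readIt(int* out) { *out = g_value; }
```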
Referencing a pitched pointer in a device function CUDA
I have created a 3D matrix with cudaMalloc3D using a cudaPitchedPtr, and I would like to reference the created matrix from a device function as well. Does copying the pitched pointer into a __device__ cudaPitchedPtr and then referencing it work? For example –
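Roughly what I have in mind is the pattern below (a simplified sketch with placeholder names and sizes, not my actual code):

```cuda
#include <cuda_runtime.h>

// Device-side copy of the pitched pointer returned by cudaMalloc3D.
__device__ cudaPitchedPtr devPitched;

// Index element (x, y, z) through the pitched pointer from device code.
__device__ float readElem(int x, int y, int z) {
    char*  slice = (char*)devPitched.ptr + (size_t)z * devPitched.pitch * devPitched.ysize;
    float* row   = (float*)(slice + (size_t)y * devPitched.pitch);
    return row[x];
}

__global__ void kernel(float* out) { out[0] = readElem(1, 2, 3); }

int main() {
    const size_t W = 64, H = 32, D = 16;                           // placeholder dimensions
    cudaPitchedPtr p;
    cudaExtent extent = make_cudaExtent(W * sizeof(float), H, D);
    cudaMalloc3D(&p, extent);

    // Copy the host-side cudaPitchedPtr into the __device__ symbol.
    cudaMemcpyToSymbol(devPitched, &p, sizeof(cudaPitchedPtr));

    float* d_out;
    cudaMalloc(&d_out, sizeof(float));
    kernel<<<1, 1>>>(d_out);
    cudaDeviceSynchronize();
    return 0;
}
```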
Copying a 1D array to a 3D pitched array CUDA
I need to copy a 1D array into a 3D pitched array. Each thread in the kernel copies one row into the 3D array. Is there any way to do it using cudaMemcpy or cudaMemcpy3D?
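If cudaMemcpy3D can do it, I imagine the call would look roughly like this (a sketch; sizes and names are placeholders):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t W = 64, H = 32, D = 16;          // placeholder dimensions (elements)
    float* h_src = new float[W * H * D];          // the existing contiguous 1D array

    // Pitched 3D destination.
    cudaPitchedPtr d_dst;
    cudaExtent extent = make_cudaExtent(W * sizeof(float), H, D);
    cudaMalloc3D(&d_dst, extent);

    // Describe the tightly packed source and let cudaMemcpy3D handle the
    // per-row padding of the pitched destination.
    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr(h_src, W * sizeof(float), W, H);
    p.dstPtr = d_dst;
    p.extent = extent;
    p.kind   = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaFree(d_dst.ptr);
    delete[] h_src;
    return 0;
}
```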
Allocating memory dynamically for each thread CUDA
I need to allocate an array for each thread, but the length of the array is known only at runtime. Once the array length is calculated, it is a constant value. cudaMalloc does not seem to work inside the kernel. Is there any way I can do it? Something like this –
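Sketched out, what I am after is roughly this; the version below uses in-kernel malloc as a stand-in (len and the launch configuration are placeholders):

```cuda
#include <cuda_runtime.h>

// Each thread allocates its own array of `len` elements from the device heap.
// The heap size may need to be raised beforehand with
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...).
__global__ void work(int len) {
    float* buf = (float*)malloc(len * sizeof(float));   // per-thread array
    if (buf == nullptr) return;                         // device heap exhausted
    for (int i = 0; i < len; ++i)
        buf[i] = threadIdx.x + i;
    // ... use buf ...
    free(buf);
}

int main() {
    int len = 128;                                       // placeholder: computed at runtime
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 << 20);
    work<<<4, 256>>>(len);
    cudaDeviceSynchronize();
    return 0;
}
```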
How do you allocate a structure that contains a double pointer using cudaMalloc?
I have tried everything to allocate a struct containing a double pointer in device memory using cudaMalloc. I know this question has been asked multiple times, yet it is always about a structure with only a single pointer. I cannot flatten the data because the training.data[] index is linked to many other functions of the program. If I can't index the data using training.data[1][n], training.data[2][n], etc., then I can't make use of the data. I've tried this but keep getting a memory access violation error:
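Boiled down, the pattern I am attempting looks like this (a simplified sketch with placeholder names and sizes, not my exact code):

```cuda
#include <cuda_runtime.h>

struct TrainingSet {
    float** data;    // accessed as data[row][col]
    int     rows;
};

int main() {
    const int numRows = 4, rowLen = 256;                 // placeholder sizes

    // 1. Allocate each row on the device, keeping the device pointers on the host.
    float** h_rows = new float*[numRows];
    for (int r = 0; r < numRows; ++r)
        cudaMalloc(&h_rows[r], rowLen * sizeof(float));

    // 2. Allocate the device-side array of row pointers and copy the pointers into it.
    float** d_rows;
    cudaMalloc(&d_rows, numRows * sizeof(float*));
    cudaMemcpy(d_rows, h_rows, numRows * sizeof(float*), cudaMemcpyHostToDevice);

    // 3. Fill the struct on the host (with the device pointer array) and copy it over.
    TrainingSet h_training{d_rows, numRows};
    TrainingSet* d_training;
    cudaMalloc(&d_training, sizeof(TrainingSet));
    cudaMemcpy(d_training, &h_training, sizeof(TrainingSet), cudaMemcpyHostToDevice);

    // ... kernels should now be able to index d_training->data[r][c] ...
    delete[] h_rows;
    return 0;
}
```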
Why does the same kernel sometimes execute 10x slower?
Here’s the code:
How to get a CUDA event time on the CPU timeline?
Here’s the pseudo code:
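As an illustrative stand-in for the pseudo code (placeholder names throughout): an event is recorded on a stream, and the host clock is sampled once the event has completed.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

__global__ void kernel() {}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaEvent_t done;
    cudaEventCreate(&done);

    auto launchCpu = std::chrono::steady_clock::now();    // CPU timestamp at launch
    kernel<<<1, 1, 0, stream>>>();
    cudaEventRecord(done, stream);

    cudaEventSynchronize(done);                           // wait until the event completes
    auto doneCpu = std::chrono::steady_clock::now();      // CPU timestamp after completion

    double ms = std::chrono::duration<double, std::milli>(doneCpu - launchCpu).count();
    std::printf("event observed ~%.3f ms after launch on the CPU clock\n", ms);

    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    return 0;
}
```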
How to do cudaMemcpy with priority
Objective: I have two groups of data that need to be copied to the GPU. The first group is large and has a lower priority, while the second one is smaller and has a higher priority, such as metadata for a new job. The cudaMemcpy of the low-priority group is issued first. I want to ensure […]
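A sketch of the setup I am describing, assuming two streams created with cudaStreamCreateWithPriority (buffer sizes and names are placeholders):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bigSize = 256 << 20, smallSize = 64 << 10;   // placeholder sizes
    char *h_big, *h_small, *d_big, *d_small;
    cudaMallocHost(&h_big, bigSize);        // pinned host memory so the copies are async
    cudaMallocHost(&h_small, smallSize);
    cudaMalloc(&d_big, bigSize);
    cudaMalloc(&d_small, smallSize);

    // CUDA stream priorities: a numerically lower value means higher priority.
    int leastPrio, greatestPrio;
    cudaDeviceGetStreamPriorityRange(&leastPrio, &greatestPrio);

    cudaStream_t lowPrioStream, highPrioStream;
    cudaStreamCreateWithPriority(&lowPrioStream, cudaStreamNonBlocking, leastPrio);
    cudaStreamCreateWithPriority(&highPrioStream, cudaStreamNonBlocking, greatestPrio);

    // The large, low-priority copy is enqueued first...
    cudaMemcpyAsync(d_big, h_big, bigSize, cudaMemcpyHostToDevice, lowPrioStream);
    // ...followed by the small, high-priority copy (e.g. job metadata).
    cudaMemcpyAsync(d_small, h_small, smallSize, cudaMemcpyHostToDevice, highPrioStream);

    cudaDeviceSynchronize();
    return 0;
}
```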