Relative Content

Tag Archive for c++cuda

Why is the object I just constructed in CUDA not matching what it looks like just before it returns from the constructor?

I am writing CUDA code that runs entirely on the device. I have a class to emulate strings, since the string class cannot be used on a GPU. My class holds a wchar_t* string in data_ and the length of the string in size_. At one point I call the constructor to create a new instance, and inside the constructor all goes well. Just before returning I can see it is still fine. But as soon as I return, the object contains garbage, even for size_, and the memory location seems to have moved slightly (which could explain the garbage). The problem with garbage in size_ goes away if I don't do the cudaMalloc in the constructor (i.e. if I only set size_ to len, it comes back fine from the constructor). GPU disassembly is foreign to me, so it's hard to tell what goes wrong on the return to the caller.

How to properly free a CUDA context?

I am implementing OptiX denoising inside my C++ path tracer, so I need to create a CUDA context before calling OptiX kernels. That context should be created every time I spawn a rendering thread, since each thread has its own CUDA context.

identifier “atomicAdd” in cuda

I was running the k-means algorithm using CUDA and encountered a problem at this line: if (idx < numPoints) { atomicAdd(&counts[points[idx].cluster], 1);
code:

Perform quick flip operations on matrices using CUDA

I want to perform a fast flip operation on a 3D matrix in CUDA C++, similar to Matlab's flip, but I have hit a speed bottleneck and need to ask for help. The following uses a 2×2×2 matrix A to demonstrate the flip function as an example (A = reshape(1:8,2,2,2)):
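To make the operation in this excerpt concrete, here is a minimal CPU reference sketch in plain C++. It is not the poster's CUDA code. It assumes column-major (Matlab-style) storage and a 0-based dimension argument, and the names linearIndex, flip3d and makeExample are hypothetical, introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major (Matlab-style) linear index into a 3D array with extents n.
inline std::size_t linearIndex(const std::array<std::size_t, 3>& n,
                               std::size_t i, std::size_t j, std::size_t k) {
    return i + n[0] * (j + n[1] * k);
}

// Returns a copy of `a` reversed along dimension `dim` (0-based: 0, 1 or 2).
// Hypothetical helper: a CPU reference, not a tuned GPU kernel.
std::vector<int> flip3d(const std::vector<int>& a,
                        const std::array<std::size_t, 3>& n,
                        std::size_t dim) {
    assert(dim < 3);
    assert(a.size() == n[0] * n[1] * n[2]);
    std::vector<int> out(a.size());
    for (std::size_t k = 0; k < n[2]; ++k) {
        for (std::size_t j = 0; j < n[1]; ++j) {
            for (std::size_t i = 0; i < n[0]; ++i) {
                // Mirror only the coordinate along the flipped dimension.
                std::array<std::size_t, 3> src = {i, j, k};
                src[dim] = n[dim] - 1 - src[dim];
                out[linearIndex(n, i, j, k)] =
                    a[linearIndex(n, src[0], src[1], src[2])];
            }
        }
    }
    return out;
}

// A = reshape(1:8,2,2,2), stored in column-major order.
std::vector<int> makeExample() { return {1, 2, 3, 4, 5, 6, 7, 8}; }
```

Each output element depends on exactly one input element, so the same index mapping could in principle be evaluated by one GPU thread per element. Whether that resolves the bottleneck described above depends on memory access patterns that the excerpt does not show.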