Performance issue when launching multiple kernels in a row in Numba CUDA

I’m running a simulation that applies seven kernels in succession to some data. All the data is transferred to the device beforehand, so no host-device transfers take place while the kernels are being called. I’m doing some performance tests on the simulation. A simulation step involves calling all kernels in a row: