Tag Archive for cuda, synchronization, cpu, nccl

How to collect CPU time when code is running on CPU

I am investigating how to build a tracer for NCCL and have run into a problem synchronizing time between the CPU and the GPU. The existing approach I found is to use cudaHostAlloc() to allocate host memory that holds the CPU timestamp, so that both the CPU and the GPU can access it: the CPU writes to it and the GPU reads from it, so the GPU knows the CPU timestamp. A dedicated CPU thread loops, continually updating the timestamp. On the GPU side, the kernel reads the value and records the corresponding GPU timestamp using clock64(). Each pair of readings relates the two clocks, which is how the CPU and GPU timelines can be synchronized.
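To make the scheme concrete, here is a minimal sketch of how I understand it, not NCCL code. The names `Sample`, `sample_clock`, `now_ns` and `collect_samples` are made up for illustration. It assumes a system with unified virtual addressing so the mapped allocation is visible to the device, and that an aligned 64-bit store from the CPU is seen as a whole by the GPU. Note that clock64() is a per-multiprocessor cycle counter, so the sketch samples from a single thread and the GPU values are in cycles, not nanoseconds.

```cuda
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical record: the CPU timestamp the GPU observed, paired with clock64().
struct Sample {
    long long cpu_ns;      // value read from the shared host timestamp
    long long gpu_cycles;  // cycle counter of the SM running the sampling thread
};

// Return 1 from the enclosing function if a CUDA call fails.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t e_ = (call);                                              \
        if (e_ != cudaSuccess) {                                              \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_));       \
            return 1;                                                         \
        }                                                                     \
    } while (0)

// A single GPU thread reads the CPU timestamp and its own clock n times.
__global__ void sample_clock(const volatile long long* cpu_ts, Sample* out, int n) {
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    for (int i = 0; i < n; ++i) {
        long long t_cpu = *cpu_ts;    // volatile: re-read mapped host memory each time
        long long t_gpu = clock64();
        out[i].cpu_ns = t_cpu;
        out[i].gpu_cycles = t_gpu;
    }
}

// CPU clock used by the updater thread (steady_clock is an assumption).
static long long now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Collects n (cpu_ns, gpu_cycles) pairs into host_out; returns 0 on success.
int collect_samples(Sample* host_out, int n) {
    long long* h_ts = nullptr;  // host view of the shared timestamp
    long long* d_ts = nullptr;  // device view of the same memory
    Sample* d_out = nullptr;
    CUDA_CHECK(cudaHostAlloc((void**)&h_ts, sizeof(long long), cudaHostAllocMapped));
    CUDA_CHECK(cudaHostGetDevicePointer((void**)&d_ts, h_ts, 0));
    CUDA_CHECK(cudaMalloc((void**)&d_out, n * sizeof(Sample)));

    volatile long long* ts = h_ts;
    *ts = now_ns();  // make sure the GPU never sees an uninitialized value
    std::atomic<bool> stop{false};
    std::thread updater([&] {  // the CPU thread that keeps the timestamp fresh
        while (!stop.load(std::memory_order_relaxed)) *ts = now_ns();
    });

    sample_clock<<<1, 1>>>(d_ts, d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    stop.store(true);  // always stop and join the thread before returning
    updater.join();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    CUDA_CHECK(cudaMemcpy(host_out, d_out, n * sizeof(Sample), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFreeHost(h_ts));
    return 0;
}

int main() {
    const int n = 8;
    Sample samples[n];
    if (collect_samples(samples, n) != 0) return 1;
    for (int i = 0; i < n; ++i)
        printf("cpu_ns=%lld gpu_cycles=%lld\n", samples[i].cpu_ns, samples[i].gpu_cycles);
    return 0;
}
```

Each printed pair is one calibration point between the two clocks. The CPU value the GPU reads is only as fresh as the updater thread's last store plus the latency of the read over the bus, so the pairs carry some unknown skew.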