Memory transfers not always overlapping with kernel calls
In the following sample code (compiled with --default-stream per-thread), I don’t understand why memory transfers are only sometimes executed concurrently with kernel calls: