How is a neural network mapped to a GPU?
I want to understand how, when a GPU executes a neural network, the operations are mapped to the GPU’s hardware resources. I am familiar with GPU architecture (especially NVIDIA's) and I know in general terms how a GPU executes an NN, but I do not know how to get at the detailed, fine-grained scheduling of operations onto the hardware resources or how the cores execute them. Is there a tool, or a set of tools, for that?