Cost of replacing virtual functions with variants on GPU
Virtual functions are one way to implement dynamic polymorphism. As far as I know, most C++ compilers implement them with a vtable: each object carries a pointer to its class's vtable, and looking up the correct function definition costs an extra vtable read (one extra memory access). I suspect this is inefficient on a GPU (CUDA). Since I don't know how nvcc handles virtual functions, I assume it does the same as on the CPU, and that the vtable, together with the function code it points to, resides in global memory. The extra memory access would then be slow on the GPU, since it has little chance of being coalesced and it hits slow global memory.