ONNX runs significantly slower with CUDAExecutionProvider compared to CPUExecutionProvider
I'm running an ONNX model exported from TorchDynamo and noticed that it runs significantly slower with CUDAExecutionProvider than with CPUExecutionProvider: the CPU run is about 3x as fast.
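When comparing providers, one-time costs such as lazy initialization and memory allocation on the first calls can distort a single timing. Below is a minimal timing sketch, assuming a Python setup, that discards a few warm-up calls and reports the median of the remaining runs. The `benchmark` helper and its `warmup`/`iters` parameters are hypothetical illustrations, not part of ONNX Runtime.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 5, iters: int = 20) -> float:
    """Hypothetical helper: median wall-clock seconds of fn() after warm-up calls."""
    # Warm-up calls are executed but not timed, so one-time setup cost is excluded.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(timings)
```

As an illustration, the same helper could wrap `lambda: sess.run(None, feeds)` for an `onnxruntime.InferenceSession` created once with `providers=["CUDAExecutionProvider"]` and once with `providers=["CPUExecutionProvider"]`, where the model path and the `feeds` input dict are placeholders for your own model and data. Calling `sess.get_providers()` on each session confirms which providers were actually enabled.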