Difference between model-analyzer inference speed and actual inference speed
I am trying to improve the inference speed of my encoder model using the NVIDIA Triton server, but I cannot reproduce the infer/sec reported by model-analyzer when testing manually. I am using the best configuration recommended by model-analyzer, yet I am seeing only about half the reported inference rate. Am I doing something wrong here?
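For context on how such a gap can arise: model-analyzer typically measures throughput at a specific client concurrency (multiple requests in flight), so a manual test issuing one request at a time will report a lower infer/sec even with the same server configuration. This is a minimal sketch of that relationship (Little's law); the latency and concurrency numbers are illustrative assumptions, not taken from the question.

```python
# Closed-loop client throughput: infer/sec ~= concurrency / avg latency.
# model-analyzer's reported numbers usually come from a swept concurrency
# level, so replaying the "best" config at concurrency 1 can easily show
# half (or less) of the reported rate.

def expected_throughput(concurrency: int, avg_latency_s: float) -> float:
    """Steady-state infer/sec for a client keeping `concurrency`
    requests in flight, each taking `avg_latency_s` seconds."""
    return concurrency / avg_latency_s

# Illustrative numbers only: assume a 10 ms average request latency.
single = expected_throughput(1, 0.010)  # one request at a time
double = expected_throughput(2, 0.010)  # two requests in flight

print(single, double)  # the second rate is twice the first
```

If your manual test sends requests sequentially, matching the concurrency (and batch size) that model-analyzer used, e.g. via perf_analyzer's `--concurrency-range`, should close most of the gap.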