PyTorch, Mistral-7B – How to load the model only once, so it does not have to be reloaded on each inference call?
I use the code below to run the Mistral-7B model.
However, most of the run time is spent loading the model itself (the "Loading checkpoint shards" step).