Tag Archive for huggingface-hub

Call Text-Generation-Inference (TGI) with a batch of prompts

In Text-Generation-Inference (TGI), I see that there is a --max-batch-total-tokens parameter, which suggests that TGI has some batch request capability. But when I look at the API guide, I cannot find anything related to that. For example, for /generate, the input format is:
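
Something like the following, as far as I can tell (a minimal sketch; the server address http://127.0.0.1:8080, the prompt, and the parameter values are placeholders I chose, not part of the question):

```python
# Minimal sketch of a single /generate call against a TGI server.
# The address 127.0.0.1:8080 is an assumption; adjust it to your deployment.
import requests

payload = {
    "inputs": "What is Deep Learning?",        # a single prompt string
    "parameters": {"max_new_tokens": 64},      # generation parameters
}

resp = requests.post("http://127.0.0.1:8080/generate", json=payload)
resp.raise_for_status()
print(resp.json()["generated_text"])
```

As far as I can see, inputs is a single string here rather than a list, which is why I am not sure how to submit a batch of prompts in one request.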