How to reduce latency in a RAG-based local chatbot for PDFs (Ollama, Llama 3, pgvector)?
I am trying to build a local chatbot for PDFs using RAG, Ollama, Llama 3, pgvector, and Streamlit. It works, but the time to generate the first token is around 262.5005 s or even more. I don't have a GPU; I am working on Windows 11 on a CPU-only machine with 16 GB of RAM. When I run the app and upload a PDF, it takes almost 7-8 minutes to respond to each query. Is there a way to preprocess the PDFs (1000 PDFs) beforehand and then insert them into the vector database? Any suggestions would be helpful.
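To make the preprocessing idea concrete, here is a rough sketch of an offline ingestion step that could run once over the whole folder, so the Streamlit app only has to query pgvector at chat time. Everything named here is an assumption for illustration: the `nomic-embed-text` model, the default local Ollama address, the use of `pypdf` for text extraction, the chunk sizes, and the helper names (`chunk_text`, `ingest`, and so on). The `store` callback stands in for whatever writes a row into the pgvector table.

```python
import hashlib
import json
import urllib.request
from pathlib import Path
from typing import Callable, Iterable

# Assumptions: default local Ollama address and an embedding model you have pulled.
OLLAMA_URL = "http://localhost:11434/api/embeddings"
EMBED_MODEL = "nomic-embed-text"


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        piece = text[start:start + size]
        if piece.strip():  # skip windows that are only whitespace
            chunks.append(piece)
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += size - overlap
    return chunks


def extract_pdf_text(path: Path) -> str:
    """Extract plain text from a PDF (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the rest works without it
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def embed_with_ollama(text: str) -> list[float]:
    """Request one embedding from a locally running Ollama server."""
    payload = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def to_vector_literal(embedding: Iterable[float]) -> str:
    """Format an embedding as the '[x,y,...]' text form that pgvector accepts."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def ingest(
    paths: Iterable[Path],
    seen: set[str],
    extract: Callable[[Path], str],
    embed: Callable[[str], list[float]],
    store: Callable[[str, int, str, str], None],
) -> int:
    """Chunk, embed, and store each document once; return the number of chunks stored."""
    stored = 0
    for path in paths:
        text = extract(path)
        doc_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if doc_hash in seen:
            continue  # identical content was already ingested, skip re-embedding
        for index, chunk in enumerate(chunk_text(text)):
            store(doc_hash, index, chunk, to_vector_literal(embed(chunk)))
            stored += 1
        seen.add(doc_hash)
    return stored
```

In a real run this might be called as `ingest(Path("pdfs").glob("*.pdf"), seen, extract_pdf_text, embed_with_ollama, store)`, where the hypothetical `store` executes an `INSERT` into a pgvector table and casts the literal with `::vector`. Note that this only moves the embedding cost out of the chat session; it does not change how fast Llama 3 generates tokens on a CPU.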