Loading an int8-quantized llama3 with llama.cpp
I’m trying to load an 8-bit quantized version of llama3 with llama.cpp on my local Linux laptop, but the process is getting killed because it exceeds available memory.
Is there any way around this?
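For reference, the kind of invocation involved might look like the sketch below. The model path and prompt are placeholders; the flags shown are standard llama.cpp options that affect memory use (a smaller context window shrinks the KV cache, and memory-mapping, which is on by default, lets the OS page weights in lazily rather than loading the whole file into RAM up front).

```shell
# Hypothetical invocation -- adjust the model path to wherever your
# Q8_0 GGUF file actually lives.
# -c limits the context window; KV-cache memory grows with this value.
# mmap is the default, so avoid --no-mmap, which forces the full
# weight file into resident memory at startup.
./llama-cli \
  -m ./models/llama3-8b.Q8_0.gguf \
  -c 2048 \
  -p "Hello"
```

If peak memory still exceeds what the laptop has, a lower-bit quantization (e.g. a Q4_K_M GGUF instead of Q8_0, roughly half the size for an 8B model) is the usual workaround, at some cost in output quality.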