LLaMA 3.1 Fine-tuning with QLoRA – CUDA Out of Memory Error
I’m trying to fine-tune the LLaMA 3.1 8B model with QLoRA, using 4-bit quantization via the bitsandbytes library, on a mental health conversations dataset from Hugging Face. However, when I run the training code, I hit a `torch.cuda.OutOfMemoryError`. I’ve tried multiple GPUs and GPUs with more memory, but the error persists.
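Since the failing code isn’t shown, here is a minimal sketch of the kind of QLoRA setup that usually fits an 8B model on a single 24 GB GPU, assuming `transformers`, `peft`, and `bitsandbytes` are installed. The model ID, LoRA hyperparameters, and training arguments are illustrative assumptions, not taken from the question; the memory-relevant pieces are NF4 double quantization, gradient checkpointing, a tiny micro-batch with gradient accumulation, and a paged 8-bit optimizer.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed model ID (the official repo is gated; requires Hugging Face access)
model_id = "meta-llama/Meta-Llama-3.1-8B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, the QLoRA quant type
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()     # trade compute for activation memory
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(                 # illustrative hyperparameters
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,        # keep the micro-batch tiny
    gradient_accumulation_steps=16,       # recover an effective batch of 16
    optim="paged_adamw_8bit",             # paged optimizer avoids OOM spikes
    bf16=True,
)
```

If OOM still occurs with a setup like this, the usual culprits are long sequence lengths (try truncating to 512–1024 tokens), a large eval batch size, or the model being loaded without the quantization config actually applied.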