Tag Archive for nlp, huggingface-transformers, large-language-model

Failed to import transformers.integrations.peft

RuntimeError: Failed to import transformers.models.bert.modeling_bert because of the following error (look up to see its traceback):
Failed to import transformers.integrations.peft because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
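The mangled symbol in the error is a libtorch C++ symbol, which usually means the installed flash-attn wheel was compiled against a different PyTorch build than the one currently in the environment, so its CUDA extension fails to load and takes the transformers import down with it. A minimal diagnostic sketch, assuming this version-mismatch cause:

import torch

print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)

try:
    import flash_attn  # loads flash_attn_2_cuda.so under the hood
    print("flash-attn", flash_attn.__version__, "loads cleanly")
except ImportError as exc:
    print("flash-attn CUDA extension is broken:", exc)
    # Typical fixes (run in a shell, not in Python):
    #   pip uninstall -y flash-attn
    #   pip install flash-attn --no-build-isolation  # rebuilds against the installed torch
    # Uninstalling flash-attn without reinstalling also unblocks the import,
    # since transformers falls back to its eager/SDPA attention paths.

If rebuilding is not an option, pinning torch to the version the flash-attn wheel was built for should resolve the same mismatch.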

Memory Error While Fine-tuning a 13B-Parameter Model on 8 H100 GPUs

I am currently trying to fine-tune an AYA model, which has 12.95B parameters, on 8 H100 GPUs, but I’m encountering an out-of-memory error. My system has 640 GB of total GPU memory (8 × 80 GB), which I assumed would be sufficient for this task. I’m not using PEFT or LoRA, and my batch size is set to 1.
I’m wondering if anyone has encountered a similar issue and could provide some guidance. How many GPUs are typically recommended for this task? Any help would be greatly appreciated.
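A rough accounting suggests why 640 GB in aggregate can still fail: full fine-tuning with Adam in mixed precision keeps roughly 18 bytes per parameter (2-byte bf16 weights, 4-byte fp32 master weights, 4-byte gradients, and two 4-byte Adam moments), so a 12.95B-parameter model needs about 233 GB of training state before activations. That exceeds a single 80 GB H100 many times over, so plain data parallelism, which replicates the full state on every GPU, will always OOM; the state has to be sharded across the 8 GPUs with DeepSpeed ZeRO-3 or FSDP. A minimal sketch of the ZeRO-3 route via the HF Trainer, assuming the checkpoint is CohereForAI/aya-101 (an assumption; substitute your own):

from transformers import AutoModelForSeq2SeqLM, TrainingArguments

# ZeRO stage 3 shards parameters, gradients, and optimizer states across GPUs,
# bringing the ~233 GB of training state down to roughly 29 GB per GPU.
ds_config = {
    "zero_optimization": {"stage": 3, "overlap_comm": True},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="aya-ft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # trades recompute for activation memory
    bf16=True,
    deepspeed=ds_config,
)

# Creating TrainingArguments before from_pretrained lets the Trainer's
# DeepSpeed integration shard the weights already at load time (zero.Init).
model = AutoModelForSeq2SeqLM.from_pretrained("CohereForAI/aya-101")

Launched across all 8 GPUs with the deepspeed or torchrun launcher, this keeps the per-GPU footprint within 80 GB; 8 H100s are generally enough for a 13B full fine-tune once the state is sharded.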
