NLP: LayoutXLM (HF model) inference - index out of bounds: 0 <= tmp30 < 1L

I am getting this error during inference, and after days of debugging I am desperate – hoping for any help! Thank you!

what():  index out of bounds: 0 <= tmp30 < 1L

At the top of the stack trace of the error:

0   c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 

I am training on:

Ubuntu 22.04
nvidia-smi:
NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3
env:
cuda 12.1
cudnn 9.1
datasets 2.15.0
transformers 4.36.2
torch 2.4

The task is document question answering on a custom dataset.

Model repo being trained: https://huggingface.co/Sharka/CIVQA_DVQA_LayoutXLM

Tokenizer: copied from microsoft/layoutxlm-base on the Hugging Face Hub.

I am getting this error during evaluation (prior to training), and as far as I can tell always with the same sample:

Code part:

...
outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                token_type_ids=token_type_ids, bbox=bbox, image=image, 
                start_positions=start_positions, end_positions=end_positions)
... 
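
Since the c10 assertion fires in a worker thread and aborts the whole process, nothing can be caught in Python. The way I try to recover the offending batch is roughly the sketch below (simplified; eval_dataloader, model and the file name are placeholders, not my actual code): each batch is saved right before the forward pass, so after the crash the last file on disk is the batch that triggered it.

import torch

# Rough sketch with placeholder names: dump each batch before the forward pass
# so the batch that triggers the abort survives the crash and can be inspected offline.
for step, batch in enumerate(eval_dataloader):
    torch.save({k: v.cpu() for k, v in batch.items()}, "last_batch.pt")

    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    token_type_ids=batch["token_type_ids"],
                    bbox=batch["bbox"],
                    image=batch["image"],
                    start_positions=batch["start_positions"],
                    end_positions=batch["end_positions"])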

Error message:

5%|▌         | 5901/109877 [56:31<16:12:41,  1.78it/s]terminate called after throwing an instance of 'c10::Error'
  what():  index out of bounds: 0 <= tmp30 < 1L
Exception raised from kernel at /tmp/torchinductor_aiteam/li/cliz2c63uoa3repoiaztoizrjecjxefsfbjltc6wzfp7p6brqesb.cpp:155 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f486a6d0f86 in /home/aiteam/miniconda3/envs/hf_layoutLM_test/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7f486a67fdd9 in /home/aiteam/miniconda3/envs/hf_layoutLM_test/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x432b (0x7f47aa16032b in /tmp/torchinductor_aiteam/li/cliz2c63uoa3repoiaztoizrjecjxefsfbjltc6wzfp7p6brqesb.so)
frame #3: <unknown function> + 0x16405 (0x7f48b92aa405 in /home/aiteam/miniconda3/envs/hf_layoutLM_test/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1)
frame #4: <unknown function> + 0x8609 (0x7f48ba6e4609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #5: clone + 0x43 (0x7f48ba4af353 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called recursively
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*)
----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1725357799 (unix time) try "date -d @1725357799" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x3ea0001d000) received by PID 118784 (TID 0x7f47ff7ab700) from PID 118784 ***]

I strongly suspect it is something in the data, in that particular sample.
However, I have been debugging for days and I cannot see a systematic difference (obviously I am just missing it) between that sample and any other.
My features (roughly the per-sample check I run is sketched after this list):
“input_ids” → in range [0, 250002], which is the tokenizer’s vocab
“attention_mask” → in {0, 1}
“start_positions” → 0 for this sample (the subfinder didn’t find the answer in the context)
“end_positions” → 0 for this sample
“bbox” → normalized, all in [0, 1000]
“image” → uint8, all in range [0, 255]
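
Roughly the per-sample check I run over the dataset is sketched below. The helper name check_sample, the vocab size and the extra box checks are my own assumptions, not anything from the model card; as far as I understand, LayoutLMv2-style models also embed the box width (x1 - x0) and height (y1 - y0), so a box with x1 < x0 or y1 < y0 could index an embedding out of bounds.

import torch

def check_sample(sample, vocab_size=250002):
    # Hypothetical helper: `sample` is a dict of tensors for one example.
    ids = sample["input_ids"]
    assert ids.min() >= 0 and ids.max() < vocab_size, "input_ids outside the vocab"
    assert set(sample["attention_mask"].unique().tolist()) <= {0, 1}, "attention_mask not binary"

    bbox = sample["bbox"]
    assert bbox.min() >= 0 and bbox.max() <= 1000, "bbox not normalized to [0, 1000]"
    # LayoutLMv2-style spatial embeddings use x1 - x0 and y1 - y0,
    # so every box should have non-negative width and height.
    assert (bbox[..., 2] >= bbox[..., 0]).all(), "box with x1 < x0"
    assert (bbox[..., 3] >= bbox[..., 1]).all(), "box with y1 < y0"

    assert sample["image"].dtype == torch.uint8, "image is not uint8"

    seq_len = ids.shape[-1]
    for key in ("start_positions", "end_positions"):
        pos = int(sample[key])
        assert 0 <= pos < seq_len, f"{key}={pos} outside the sequence (length {seq_len})"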

Some (naive) direct questions:

  1. Can the problem stem from start_positions and end_positions being 0?

  2. For some reason, the tokenizer uses token id==6 when tokenizing the context of that sample, which, for this tokenizer, corresponds to an empty string, i.e. '' (from tokenizer.json):

    "vocab": [..., ["▁", -3.9299705028533936], ...]

This is token id==6 (why is the empty string represented by this symbol?); how I inspect it is sketched after the questions.
However, there is no empty string in the original context to begin with.

  3. Can the problem stem from using the fast tokenizer, while the model config says “LayoutXLMTokenizer”? (A slow-vs-fast comparison is sketched below as well.)
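
Regarding question 2, the way I look at what id 6 actually maps to is just the standard tokenizer API (the hub id below is illustrative; I actually load my local copy):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/layoutxlm-base")

print(tok.convert_ids_to_tokens(6))   # the raw vocab piece behind id 6
print(repr(tok.decode([6])))          # what that piece decodes back to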
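
And regarding question 3, a small slow-vs-fast comparison on a toy example (the question, words and boxes below are made up, not from my dataset):

from transformers import LayoutXLMTokenizer, LayoutXLMTokenizerFast

slow = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
fast = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")

# Made-up document words with already-normalized boxes, QA-style call
# (question as `text`, document words as `text_pair`).
question = "What is the invoice number?"
words = ["Invoice", "number:", "12345"]
boxes = [[10, 10, 120, 40], [130, 10, 220, 40], [230, 10, 320, 40]]

enc_slow = slow(question, words, boxes=boxes)
enc_fast = fast(question, words, boxes=boxes)
print(enc_slow["input_ids"] == enc_fast["input_ids"])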

Any idea appreciated! Thank you in advance!
