Use added tokens in BertTokenizer with a BartForConditionalGeneration model
I have a BertTokenizer and I added some tokens to it.
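The usual pattern for this situation is to add the tokens and then resize the model's embedding matrix so the new ids have rows to look up. A minimal sketch follows; the checkpoint name and the added tokens are only illustrative (fnlp/bart-base-chinese happens to pair a BertTokenizer with a BART model).

from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

# Register the new tokens with the tokenizer (example tokens, not from the post).
num_added = tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])

# Without this, the added ids point past the end of the embedding matrix.
model.resize_token_embeddings(len(tokenizer))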
clean_up_tokenization_spaces issue with Flux running in ComfyUI
I’m getting the issue below with Flux in ComfyUI, and it points to this bug (https://discuss.huggingface.co/t/cleaup-tokenization-spaces-error/102749). How do I resolve it? Do I set the clean_up_tokenization_spaces parameter to False somewhere?
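If you construct the tokenizer yourself, the parameter can be passed at load time. A minimal sketch, assuming a CLIP tokenizer checkpoint; ComfyUI normally builds its tokenizers internally, so where to set this in practice depends on the node code.

from transformers import CLIPTokenizer

# Illustrative checkpoint only, not the one ComfyUI loads for Flux.
tokenizer = CLIPTokenizer.from_pretrained(
    "openai/clip-vit-large-patch14",
    clean_up_tokenization_spaces=False,
)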
ASR Model Tokenizer Won’t Load
I’m trying to load the ASR model 'facebook/wav2vec2-large-xlsr-53', so I made this simple script to test it:
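The script itself is cut off in the excerpt; a minimal reproduction would presumably look something like the sketch below. Note that facebook/wav2vec2-large-xlsr-53 is a pretraining-only checkpoint that ships without tokenizer/vocab files, which is a likely reason a tokenizer load against it fails.

from transformers import AutoTokenizer

# Sketch of a minimal reproduction; the original test script is not shown in the post.
tokenizer = AutoTokenizer.from_pretrained("facebook/wav2vec2-large-xlsr-53")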
BertTokenizer has no attribute named save_pretrained
AttributeError: 'BertTokenizer' object has no attribute 'save'
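For reference, tokenizers are persisted with save_pretrained rather than save; a minimal sketch, with a placeholder output directory:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# save_pretrained writes vocab.txt plus the tokenizer config files to the
# given directory, which from_pretrained can later reload.
tokenizer.save_pretrained("./my-bert-tokenizer")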
MBART-50 does not seem compatible with Pipeline
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "When you have a medical appointment, your health provider writes notes on the visit that are available to you"
article_fr = "Les infirmières praticiennes et infirmiers praticiens sont des membres du personnel infirmier autorisé qui possèdent une formation et une expérience plus poussées et qui peuvent poser un […]
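The excerpt above is truncated. For what it's worth, the translation pipeline can drive MBART-50 when the language codes are passed explicitly; a sketch, assuming the many-to-many checkpoint:

from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/mbart-large-50-many-to-many-mmt",
)

article_en = "When you have a medical appointment, your health provider writes notes on the visit that are available to you"

# src_lang / tgt_lang select the MBART-50 language codes at call time.
print(translator(article_en, src_lang="en_XX", tgt_lang="fr_XX"))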
Seq2SeqTrainer produces incorrect EvalPrediction after changing to another tokenizer
I’m using Seq2SeqTrainer to train my model with a custom tokenizer. The base model is BART Chinese (fnlp/bart-base-chinese). If the original tokenizer of BART Chinese is used, the output is normal. Yet when I swap the tokenizer with another tokenizer that I made, the output of compute_metrics, specifically the preds part of EvalPrediction, is incorrect (the decoded text becomes garbage).
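For context, a typical compute_metrics for Seq2SeqTrainer with predict_with_generate=True decodes the generated ids roughly as sketched below; if the swapped-in tokenizer assigns different ids than the ones the model's embeddings were trained on, this decode step is where the garbage text shows up. The tokenizer path and the returned metric are placeholders.

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/custom-tokenizer")  # placeholder path

def compute_metrics(eval_pred):
    preds, labels = eval_pred.predictions, eval_pred.label_ids
    # With predict_with_generate=True, preds are already generated token ids.
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # -100 marks ignored label positions and must be replaced before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Real metric computation (e.g. ROUGE/BLEU over the decoded text) would go here.
    return {"num_eval_examples": len(decoded_preds)}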
AutoTokenizer.from_pretrained took forever to load
I used the following code to load my custom-trained tokenizer:
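The code itself is not included in the excerpt; a call along these lines is presumably what is meant, with the local path standing in as a placeholder. One known source of long load times is AutoTokenizer converting a slow tokenizer to a fast one on first load when no tokenizer.json file is present.

from transformers import AutoTokenizer

# Placeholder path for the custom-trained tokenizer directory.
tokenizer = AutoTokenizer.from_pretrained("./my-custom-tokenizer", use_fast=True)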