Related Content

Tag Archive for python, huggingface-transformers, huggingface-tokenizers

clean_up_tokenization_spaces issue in Flux running in ComfyUI

I’m getting the issue below with Flux in ComfyUI, and it points to this bug (https://discuss.huggingface.co/t/cleaup-tokenization-spaces-error/102749). How do I resolve it? Do I need to set the clean_up_tokenization_spaces parameter to False somewhere?
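For reference, this is how the flag is set when you control the tokenizer yourself with transformers; in ComfyUI the tokenizer is created inside the Flux text-encoder node, so the call site may differ. The checkpoint name below is only a placeholder, a minimal sketch rather than the fix ComfyUI ships:

```python
from transformers import AutoTokenizer

# Sketch: silence the clean_up_tokenization_spaces warning by setting the flag
# explicitly when the tokenizer is created. "openai/clip-vit-large-patch14" is a
# stand-in for whichever text-encoder tokenizer your Flux workflow actually loads.
tokenizer = AutoTokenizer.from_pretrained(
    "openai/clip-vit-large-patch14",
    clean_up_tokenization_spaces=False,  # choose True/False explicitly instead of relying on the default
)

# The flag can also be passed per call when decoding.
ids = tokenizer("a photo of a cat")["input_ids"]
print(tokenizer.decode(ids, clean_up_tokenization_spaces=False))
```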

MBART-50 does not seem to be compatible with Pipeline

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "When you have a medical appointment, your health provider writes notes on the visit that are available to you"
article_fr = "Les infirmières praticiennes et infirmiers praticiens sont des membres du personnel infirmier autorisé qui possèdent une formation et une expérience plus poussées et qui peuvent poser un […]
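As a point of comparison, MBART-50 generally does work with the translation pipeline, provided the source and target language codes are passed so the tokenizer sets the right language tokens. This is a hedged sketch using the stock many-to-many checkpoint and a short French sentence of my own, not the asker's exact setup:

```python
from transformers import pipeline

# Sketch: MBART-50 via the translation pipeline. src_lang/tgt_lang must be the
# MBART-50 language codes (en_XX, fr_XX, ...); swap in whichever MBART-50
# checkpoint you are actually using.
translator = pipeline(
    "translation",
    model="facebook/mbart-large-50-many-to-many-mmt",
    src_lang="fr_XX",
    tgt_lang="en_XX",
)

article_fr = "Les infirmières praticiennes et infirmiers praticiens sont des membres du personnel infirmier autorisé."
print(translator(article_fr)[0]["translation_text"])
```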

Seq2SeqTrainer produces incorrect EvalPrediction after switching to another tokenizer

I’m using Seq2SeqTrainer to train my model with a custom tokenizer. The base model is BART Chinese (fnlp/bart-base-chinese). If the original BART Chinese tokenizer is used, the output is normal. Yet when I swap in another tokenizer that I made, the output of compute_metrics, specifically the preds part of EvalPrediction, is incorrect (the decoded text becomes garbage).
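For context, the usual pattern looks roughly like the sketch below: resize the model's embeddings after swapping tokenizers, replace the -100 padding in the labels before decoding, and decode with the same tokenizer that was passed to Seq2SeqTrainer. The tokenizer path, model, and metric are placeholders, not the asker's code:

```python
import numpy as np
from transformers import AutoTokenizer, EvalPrediction

# Placeholder: load whatever custom tokenizer was swapped in.
custom_tokenizer = AutoTokenizer.from_pretrained("path/to/your/custom_tokenizer")

# Without aligning the model's vocab to the new tokenizer, decoded preds are
# typically garbage. `model` is assumed to be the fnlp/bart-base-chinese model:
# model.resize_token_embeddings(len(custom_tokenizer))

def compute_metrics(eval_pred: EvalPrediction):
    preds, labels = eval_pred.predictions, eval_pred.label_ids
    if isinstance(preds, tuple):
        preds = preds[0]  # some models return (generated_ids, ...) tuples

    # -100 marks ignored label positions; replace it with the pad token before decoding.
    labels = np.where(labels != -100, labels, custom_tokenizer.pad_token_id)

    decoded_preds = custom_tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = custom_tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Plug in the real metric (BLEU, ROUGE, ...) here; a placeholder statistic is returned instead.
    return {"mean_pred_len": float(np.mean([len(p) for p in decoded_preds]))}
```

Note that preds are token ids only when the trainer runs with predict_with_generate=True; otherwise they are logits and cannot be decoded directly.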