Use added tokens in BertTokenizer with a BartForConditionalGeneration model
I have a BertTokenizer and I added some tokens to it.
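The usual pattern for this situation is to add the tokens and then resize the model's embedding matrix so the new ids have rows to look up. A minimal sketch follows; the checkpoint name and the added tokens are only illustrative (fnlp/bart-base-chinese happens to pair a BertTokenizer with a BART model).

from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

# Register the new tokens with the tokenizer (example tokens, not from the post).
num_added = tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])

# Without this, the added ids point past the end of the embedding matrix.
model.resize_token_embeddings(len(tokenizer))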
clean_up_tokenization_spaces issue with Flux running in ComfyUI
I’m getting the issue below with Flux in ComfyUI, and it points to this bug (https://discuss.huggingface.co/t/cleaup-tokenization-spaces-error/102749). How do I resolve it? Do I set the clean_up_tokenization_spaces parameter to False somewhere?
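If you construct the tokenizer yourself, the parameter can be passed at load time. A minimal sketch, assuming a CLIP tokenizer checkpoint; ComfyUI normally builds its tokenizers internally, so where to set this in practice depends on the node code.

from transformers import CLIPTokenizer

# Illustrative checkpoint only, not the one ComfyUI loads for Flux.
tokenizer = CLIPTokenizer.from_pretrained(
    "openai/clip-vit-large-patch14",
    clean_up_tokenization_spaces=False,
)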
ASR Model Tokenizer Won’t Load
I’m trying to load the ASR model 'facebook/wav2vec2-large-xlsr-53', so I made this simple script to test it:
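The script itself is cut off in the excerpt; a minimal reproduction would presumably look something like the sketch below. Note that facebook/wav2vec2-large-xlsr-53 is a pretraining-only checkpoint that ships without tokenizer/vocab files, which is a likely reason a tokenizer load against it fails.

from transformers import AutoTokenizer

# Sketch of a minimal reproduction; the original test script is not shown in the post.
tokenizer = AutoTokenizer.from_pretrained("facebook/wav2vec2-large-xlsr-53")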
BertTokenizer has no attribute named save_pretrained
AttributeError: 'BertTokenizer' object has no attribute 'save'
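For reference, tokenizers are persisted with save_pretrained rather than save; a minimal sketch, with a placeholder output directory:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# save_pretrained writes vocab.txt plus the tokenizer config files to the
# given directory, which from_pretrained can later reload.
tokenizer.save_pretrained("./my-bert-tokenizer")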
MBART-50 does not seem compatible with Pipeline
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "When you have a medical appointment, your health provider writes notes on the visit that are available to you"
article_fr = "Les infirmières praticiennes et infirmiers praticiens sont des membres du personnel infirmier autorisé qui possèdent une formation et une expérience plus poussées et qui peuvent poser un […]
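The excerpt above is truncated. For what it's worth, the translation pipeline can drive MBART-50 when the language codes are passed explicitly; a sketch, assuming the many-to-many checkpoint:

from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/mbart-large-50-many-to-many-mmt",
)

article_en = "When you have a medical appointment, your health provider writes notes on the visit that are available to you"

# src_lang / tgt_lang select the MBART-50 language codes at call time.
print(translator(article_en, src_lang="en_XX", tgt_lang="fr_XX"))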
Seq2SeqTrainer produces incorrect EvalPrediction after changing to another tokenizer
I’m using Seq2SeqTrainer to train my model with a custom tokenizer. The base model is BART Chinese (fnlp/bart-base-chinese). If the original tokenizer of BART Chinese is used, the output is normal. Yet when I swap the tokenizer with another tokenizer that I made, the output of compute_metrics, specifically the preds part of EvalPrediction, is incorrect (the decoded text becomes garbage).
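For context, a typical compute_metrics for Seq2SeqTrainer with predict_with_generate=True decodes the generated ids roughly as sketched below; if the swapped-in tokenizer assigns different ids than the ones the model's embeddings were trained on, this decode step is where the garbage text shows up. The tokenizer path and the returned metric are placeholders.

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/custom-tokenizer")  # placeholder path

def compute_metrics(eval_pred):
    preds, labels = eval_pred.predictions, eval_pred.label_ids
    # With predict_with_generate=True, preds are already generated token ids.
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # -100 marks ignored label positions and must be replaced before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Real metric computation (e.g. ROUGE/BLEU over the decoded text) would go here.
    return {"num_eval_examples": len(decoded_preds)}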
AutoTokenizer.from_pretrained took forever to load
I used the following code to load my custom-trained tokenizer:
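The code itself is not included in the excerpt; a call along these lines is presumably what is meant, with the local path standing in as a placeholder. One known source of long load times is AutoTokenizer converting a slow tokenizer to a fast one on first load when no tokenizer.json file is present.

from transformers import AutoTokenizer

# Placeholder path for the custom-trained tokenizer directory.
tokenizer = AutoTokenizer.from_pretrained("./my-custom-tokenizer", use_fast=True)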