Tag Archive for huggingface-transformers, tokenize, huggingface-tokenizers, gpt-2

Reordering GPT2Tokenizer tokens by frequency leads to unrecognized tokens

I am trying to create a new tokenizer by reordering the token IDs of my existing GPT2Tokenizer by frequency. In theory, the order of token IDs should have no effect on performance or usability, but after reordering, the new tokenizer fails to recognize a few tokens. I have several reasons for doing this, and there is no other way to accomplish what I need except by reordering. Why is this happening?
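
To make the setup concrete, here is a minimal sketch of the kind of reordering described above. It is not the original code: the toy vocabulary, the frequency counts, and the helper name `reorder_vocab_by_frequency` are all hypothetical, and it works on a plain token-to-ID dictionary rather than on a real GPT2Tokenizer. It only illustrates reassigning IDs while leaving the token strings untouched; it does not reproduce or diagnose the unrecognized-token problem.

```python
from collections import Counter


def reorder_vocab_by_frequency(vocab, counts):
    """Return (new_vocab, old_to_new): most frequent tokens get the lowest IDs.

    vocab  : dict mapping token string -> old ID (hypothetical toy data here)
    counts : dict mapping token string -> observed frequency
    """
    # Sort by descending frequency; ties are broken by the old ID so the
    # result is deterministic. Tokens with no count are treated as 0.
    ordered = sorted(vocab, key=lambda tok: (-counts.get(tok, 0), vocab[tok]))
    new_vocab = {tok: new_id for new_id, tok in enumerate(ordered)}
    # Mapping from old ID to new ID, useful for remapping existing data.
    old_to_new = {vocab[tok]: new_vocab[tok] for tok in vocab}
    return new_vocab, old_to_new


# Hypothetical toy vocabulary and counts, for illustration only.
vocab = {"<|endoftext|>": 0, "the": 1, "cat": 2, "s": 3, "cats": 4}
counts = Counter({"the": 50, "s": 20, "cat": 7, "cats": 3})

new_vocab, old_to_new = reorder_vocab_by_frequency(vocab, counts)

# Sanity checks: same set of token strings, and the new IDs form a
# contiguous permutation of the old ones (no token lost or duplicated).
assert set(new_vocab) == set(vocab)
assert sorted(new_vocab.values()) == list(range(len(vocab)))
```

In this sketch the set of token strings is identical before and after, so any token that stops being recognized would have to be lost somewhere else, for example in how the reordered vocabulary is written back out and reloaded together with the merges. Note also that a token with no recorded count, such as the special token here, moves to the end of the ID range.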