
Tag Archive for tokenize, huggingface-tokenizers

Handling multi-word terms during tokenization

I have a text that contains different forms of the same word. I want to map all singular occurrences to one ID and all plural occurrences to another ID. For example, ‘seaplane’ and ‘sea plane’ should map to a single ID (say 10), while ‘seaplanes’ and ‘sea planes’ should map to 11.
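One possible approach, sketched here under assumptions, is to resolve the variants in a preprocessing step before the text reaches the tokenizer. This is plain Python using the standard `re` module, not a huggingface-tokenizers API. The variant table, the choice of the first spelling in each group as the "canonical" form, and the names `VARIANT_GROUPS`, `build_matcher`, `find_ids` and `canonicalize` are all hypothetical. Only the IDs 10 and 11 come from the question.

```python
import re

# Hypothetical variant table: each ID lists the surface forms that should share it.
# IDs 10 and 11 follow the question's example; the first entry in each list is
# treated as the canonical spelling (an assumption made for this sketch).
VARIANT_GROUPS = {
    10: ["seaplane", "sea plane"],    # singular forms
    11: ["seaplanes", "sea planes"],  # plural forms
}


def build_matcher(groups):
    """Build one case-insensitive regex over all variants plus lookup tables."""
    variant_to_id = {}
    id_to_canonical = {}
    for token_id, variants in groups.items():
        id_to_canonical[token_id] = variants[0]
        for variant in variants:
            variant_to_id[variant.lower()] = token_id
    # Longest variants first, so "sea planes" wins over "sea plane" at the same position.
    ordered = sorted(variant_to_id, key=len, reverse=True)
    # Let any run of whitespace separate the words of a multi-word variant.
    alternatives = [r"\s+".join(map(re.escape, v.split())) for v in ordered]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern, variant_to_id, id_to_canonical


PATTERN, VARIANT_TO_ID, ID_TO_CANONICAL = build_matcher(VARIANT_GROUPS)


def variant_id(matched_text):
    """Map a matched span (any case, any spacing) to its shared ID."""
    key = " ".join(matched_text.lower().split())
    return VARIANT_TO_ID[key]


def find_ids(text):
    """Return (matched_text, id) for every variant occurrence in text."""
    return [(m.group(0), variant_id(m.group(0))) for m in PATTERN.finditer(text)]


def canonicalize(text):
    """Rewrite every variant to its canonical spelling before tokenization."""
    return PATTERN.sub(lambda m: ID_TO_CANONICAL[variant_id(m.group(0))], text)


if __name__ == "__main__":
    sample = "A seaplane landed. Two sea planes and three Seaplanes left; one sea  plane stayed."
    print(find_ids(sample))
    print(canonicalize(sample))
```

On the sample sentence, `find_ids` reports 10 for "seaplane" and "sea  plane" and 11 for "sea planes" and "Seaplanes". `canonicalize` rewrites the sentence so that each group appears under a single spelling. If the canonical spellings are then registered as whole tokens in the tokenizer's vocabulary, the IDs the tokenizer assigns to them would play the role of 10 and 11. Whether that suits a given pretrained model is a separate question.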