How to know which words are encoded with unknown tokens in HuggingFace BertTokenizer?
I use the following code to count what percentage of words are encoded as unknown tokens.
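As a minimal sketch of one way to both count and list the affected words (not necessarily the approach in the question): tokenize each word on its own and check whether the unknown token appears in the result. The names `find_unknown_words` and `toy_wordpiece` and the tiny vocabulary are hypothetical and exist only for illustration; `toy_wordpiece` is a simplified stand-in so the example runs without downloading a model.

```python
from typing import Callable, List, Set, Tuple

UNK = "[UNK]"  # BERT's default unknown token string


def toy_wordpiece(word: str, vocab: Set[str], unk: str = UNK) -> List[str]:
    """Greedy longest-match-first WordPiece over one word (toy stand-in)."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def find_unknown_words(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    unk: str = UNK,
) -> Tuple[List[str], float]:
    """Return the words whose tokenization contains `unk`, and their percentage."""
    unknown = [w for w in words if unk in tokenize(w)]
    pct = 100.0 * len(unknown) / len(words) if words else 0.0
    return unknown, pct


# Hypothetical four-entry vocabulary, for demonstration only.
vocab = {"the", "cat", "play", "##ing"}
words = "the cat playing qux".split()

unknown, pct = find_unknown_words(words, lambda w: toy_wordpiece(w, vocab))
print(unknown, pct)
```

With the real tokenizer, the same helper should work by passing `tokenizer.tokenize` and `tokenizer.unk_token`, where `tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")` (the checkpoint name is an assumption). Note that `BertTokenizer` also splits punctuation off before applying WordPiece, so a whitespace-separated "word" such as `hello,` may yield several tokens, and it is flagged here if any of them is unknown.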