How to handle string length changes due to Unicode normalization (NFKC in my use case)?
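Unicode normalization does not always map characters one-to-one. A ligature like ‘ﬁ’ (U+FB01) expands into the two characters ‘fi’, and some Japanese/Chinese character sequences combine into a single character (for example, half-width ‘ｶﾞ’ becomes ‘ガ’ under NFKC). As a result, the normalized string can be longer or shorter than the original, and character offsets no longer line up. I need a way to map offsets between the normalized and original strings. Is there a library or method that does this accurately?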
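To make the requirement concrete, here is a minimal sketch of the kind of mapping I mean, using only Python's standard `unicodedata` module. The names `normalize_with_offsets` and `_is_boundary` are hypothetical, and the chunking rule is a heuristic I am assuming, not a proven algorithm: it splits the original string into small chunks that appear to normalize independently, normalizes each chunk, and records the original span for every output character. It checks the result against normalizing the whole string and raises if the two disagree.

```python
import unicodedata


def _is_boundary(chunk, nxt, form):
    """Heuristic: can `nxt` start a new chunk after `chunk`? (assumed rule)"""
    nxt_norm = unicodedata.normalize(form, nxt)
    # If the normalized form starts with a combining mark, it attaches
    # to the preceding text, so it cannot start a chunk.
    if nxt_norm and unicodedata.combining(nxt_norm[0]) != 0:
        return False
    # Otherwise require that normalizing the pieces separately gives the
    # same result as normalizing them together (catches Hangul jamo etc.).
    together = unicodedata.normalize(form, chunk + nxt)
    return together == unicodedata.normalize(form, chunk) + nxt_norm


def normalize_with_offsets(text, form="NFKC"):
    """Return (normalized, spans).

    spans[k] is the (start, end) slice of `text` that produced
    normalized[k]; several output characters may share one span.
    """
    parts = []
    spans = []
    i, n = 0, len(text)
    while i < n:
        j = i + 1
        # Grow the chunk until the next character is a safe boundary.
        while j < n and not _is_boundary(text[i:j], text[j], form):
            j += 1
        chunk_norm = unicodedata.normalize(form, text[i:j])
        parts.append(chunk_norm)
        spans.extend([(i, j)] * len(chunk_norm))
        i = j
    normalized = "".join(parts)
    # Self-check: the heuristic must reproduce whole-string normalization.
    if normalized != unicodedata.normalize(form, text):
        raise ValueError("chunked normalization differs from whole-string result")
    return normalized, spans


# Ligature U+FB01 followed by "ne": one original character yields two.
norm, spans = normalize_with_offsets("\ufb01ne")
print(norm, spans)

# Half-width KA (U+FF76) + voiced sound mark (U+FF9E): two yield one.
norm, spans = normalize_with_offsets("\uff76\uff9e")
print(norm, spans)
```

The mapping here is at chunk granularity: when several original characters merge or reorder, every output character in that chunk points back to the whole chunk, because a finer mapping is not well defined in those cases. Offsets are Python code point indices, so a tool that counts UTF-16 units or bytes would need a further conversion.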