Tag Archive for pandas, data-cleaning, string-matching

Dynamic approach to automatically identify and standardize similar names in pandas (data cleaning)

I have a DataFrame with a column of publisher names in which the same publisher appears under several slightly different spellings. For example, “Harlequin Romance”, “Harlequin Blaze”, “Harlequin Superromance”, and “Harlequin” all refer to the same publisher, “Harlequin”. Similarly, “Hackett Publishing Company Inc.”, “Hackett Publ. Co Inc”, and “Hackett Publishing Company Inc. (USA)” should be standardized to a single name such as “Hackett Publishing Company Inc.”.
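To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It uses only pandas and the standard-library `difflib`. It assumes the column is called `publisher` and holds strings. The helper names (`normalize`, `same_publisher`, `standardize`) and the 0.75 similarity threshold are illustrative choices of mine, not an established recipe. Two names are grouped when the shorter one is a whole-word prefix of the longer one, or when their normalized forms are similar enough. Each group is then relabeled with its most frequent original spelling.

```python
import re
from difflib import SequenceMatcher

import pandas as pd


def normalize(name: str) -> str:
    """Lowercase, drop parenthesised parts and punctuation, squeeze spaces."""
    name = re.sub(r"\(.*?\)", " ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def same_publisher(rep: str, cand: str, threshold: float) -> bool:
    """Compare two normalized names; rep is never longer than cand."""
    rep_tokens, cand_tokens = rep.split(), cand.split()
    if not rep_tokens:
        return rep == cand
    # Rule 1: "harlequin" is a whole-word prefix of "harlequin blaze".
    if cand_tokens[: len(rep_tokens)] == rep_tokens:
        return True
    # Rule 2: overall character similarity (threshold is an assumption).
    return SequenceMatcher(None, rep, cand).ratio() >= threshold


def standardize(names: pd.Series, threshold: float = 0.75) -> pd.Series:
    counts = names.value_counts()
    # Visit the shortest normalized names first so they become group representatives.
    ordered = sorted(counts.index, key=lambda n: (len(normalize(n)), n))
    clusters = {}  # normalized representative -> original spellings in the group
    for original in ordered:
        key = normalize(original)
        for rep in clusters:
            if same_publisher(rep, key, threshold):
                clusters[rep].append(original)
                break
        else:
            clusters[key] = [original]
    mapping = {}
    for members in clusters.values():
        # Most frequent spelling wins; ties go to the shortest normalized name.
        canonical = max(members, key=lambda n: counts[n])
        for member in members:
            mapping[member] = canonical
    return names.map(mapping)


# Sample rows built from the names in the question.
df = pd.DataFrame({"publisher": [
    "Harlequin Romance", "Harlequin Blaze", "Harlequin Superromance", "Harlequin",
    "Hackett Publishing Company Inc.", "Hackett Publishing Company Inc.",
    "Hackett Publ. Co Inc", "Hackett Publishing Company Inc. (USA)",
]})
df["publisher_clean"] = standardize(df["publisher"])
print(df)
```

With these sample rows, every Harlequin variant maps to “Harlequin” and every Hackett variant maps to “Hackett Publishing Company Inc.”. The prefix rule will over-merge names that share a generic first word (for example, names starting with “University”), and the greedy pass compares each name against every existing group, so on real data the threshold and rules would need tuning and spot-checking.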