Search a dataframe of paragraphs against a dataframe of substrings and return the substrings and the paragraph they matched against


I am searching the paragraphs of a web page for names from a list held in a data frame. If a name appears in a paragraph, I want to know which paragraph it is so that I can parse certain parts of that paragraph and associate them with the name.

I have two data frames:

dfrule, which is made up of ‘Paragraphs’ and ‘Ids’, and eldf, which is made up of ‘Names’ and ‘Ids’.

So far I have:

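# For each name, check whether it occurs as a substring of any paragraph
substring_matches = eldf['name'].apply(lambda s1: dfrule['Paragraphs'].apply(lambda s2: s1 in s2).any())
# Keep only the names that matched at least one paragraph
matchdf = eldf[substring_matches]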

This gives me every name on the list that matched at least one paragraph, but not the Id of the paragraph it matched. How can I associate each matched name with the Id of the paragraph it appears in?
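
One possible way to keep the paragraph Id is to pair every name with every paragraph and then filter the pairs, instead of collapsing the result with .any(). The following is only a minimal sketch: the sample rows are made up for illustration, the name column is assumed to be called 'name' as in the snippet above, and how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfrule = pd.DataFrame({
    "Ids": [1, 2, 3],
    "Paragraphs": [
        "Alice Smith chaired the meeting.",
        "No names appear here.",
        "Bob Jones and Alice Smith signed the rule.",
    ],
})
eldf = pd.DataFrame({
    "Ids": [10, 20, 30],
    "name": ["Alice Smith", "Bob Jones", "Carol White"],
})

# Pair every name with every paragraph. Both frames have an 'Ids' column,
# so the suffixes keep the two apart in the result.
pairs = eldf.merge(dfrule, how="cross", suffixes=("_name", "_paragraph"))

# Keep only the pairs where the name occurs as a plain substring of the paragraph.
mask = [name in para for name, para in zip(pairs["name"], pairs["Paragraphs"])]
matchdf = pairs[mask].reset_index(drop=True)

print(matchdf[["name", "Ids_name", "Ids_paragraph"]])
```

Each row of matchdf is one (name, paragraph) match, so a name that appears in several paragraphs gets several rows. The cross join builds len(eldf) * len(dfrule) rows before filtering, which may be too much memory for large inputs; in that case looping over the names and filtering dfrule for each one would be a reasonable alternative.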
