I’m using the syntax below to replace only specific values in the ‘content_text’ column based on the macro_id column in my spreadsheet but the results are very inconsistent. Essentially the macros are numbers that represent text and they are all uniform as far as character count and delimiters, however, and I cannot figure out why some macro ids never get replaced. There are instances where there are multiple macro_ids in the context field but sometimes they aren’t replaced when there is only one. There are only about 4200 rows and I’m really hoping my laptop isn’t just tapping out.
import pandas as pd
import re
# Read the large spreadsheet with 11 thousand rows
df = pd.read_excel('testmacros.xlsx')
# Handle NaN values in 'content_text' column
df['content_text'] = df['content_text'].fillna('')
# Update values in the 'content_text' column using the map function
s = df.astype({'macro_id': str}).set_index('macro_id')['content_text']
pattern = r'b(%s)b' % '|'.join(map(re.escape, s.index))
df['content_text'] = (df['content_text']
.str.replace(pattern,
lambda m: s.get(m.group(0)),
regex=True)
)
# Save the updated large spreadsheet
df.to_excel('updatedmacros.xlsx', index=False)
Example:
What I have
macro_id | content_text |
---|---|
5678111123 | The Department of Health |
5678114567 | All personnel reporting late should call the 5678114568 |
5678114568 | Front desk phone line at 1-555-555-5555 |
5678112222 | Patients with a fever 300 should report to 5678114568 and contact 5678111123 |
What should return
macro_id | content_text |
---|---|
5678111123 | The Department of Health |
5678114567 | All personnel reporting late should call the Front desk phone line at 1-555-555-5555 |
5678114568 | Front desk phone line at 1-555-555-5555 |
5678112222 | Patients with a fever 300 should report to Front desk phone line at 1-555-555-5555 and contact The Department of Health |
I’m using Visual Studio 2019 and sometimes when I run the code I get at least 8 rows that didn’t convert and it’s gotten up to 186.
I’ve spot checked multiple macro_ids and confirmed they are valid and tie to text. The values on the original sheet were loading directly from a Power Query and I’ve played with changing the field type from number to text which is where I see the fluctuations in rows that didn’t convert. I’ve pasted the values as text into separate sheets for xlsx and csv formatting but I never get 100% replacement. What could I be doing wrong?