Pandas and replacing Excel values incosistently

  Kiến thức lập trình

I’m using the syntax below to replace only specific values in the ‘content_text’ column based on the macro_id column in my spreadsheet but the results are very inconsistent. Essentially the macros are numbers that represent text and they are all uniform as far as character count and delimiters, however, and I cannot figure out why some macro ids never get replaced. There are instances where there are multiple macro_ids in the context field but sometimes they aren’t replaced when there is only one. There are only about 4200 rows and I’m really hoping my laptop isn’t just tapping out.

import pandas as pd
import re

# Read the large spreadsheet with 11 thousand rows
df = pd.read_excel('testmacros.xlsx')


# Handle NaN values in 'content_text' column
df['content_text'] = df['content_text'].fillna('')


# Update values in the 'content_text' column using the map function
s = df.astype({'macro_id': str}).set_index('macro_id')['content_text']

pattern = r'b(%s)b' % '|'.join(map(re.escape, s.index))

df['content_text'] = (df['content_text']
                            .str.replace(pattern,
                                         lambda m: s.get(m.group(0)),
                                         regex=True)
                           )

# Save the updated large spreadsheet
df.to_excel('updatedmacros.xlsx', index=False)

Example:
What I have

macro_id content_text
5678111123 The Department of Health
5678114567 All personnel reporting late should call the 5678114568
5678114568 Front desk phone line at 1-555-555-5555
5678112222 Patients with a fever 300 should report to 5678114568 and contact 5678111123

What should return

macro_id content_text
5678111123 The Department of Health
5678114567 All personnel reporting late should call the Front desk phone line at 1-555-555-5555
5678114568 Front desk phone line at 1-555-555-5555
5678112222 Patients with a fever 300 should report to Front desk phone line at 1-555-555-5555 and contact The Department of Health

I’m using Visual Studio 2019 and sometimes when I run the code I get at least 8 rows that didn’t convert and it’s gotten up to 186.

I’ve spot checked multiple macro_ids and confirmed they are valid and tie to text. The values on the original sheet were loading directly from a Power Query and I’ve played with changing the field type from number to text which is where I see the fluctuations in rows that didn’t convert. I’ve pasted the values as text into separate sheets for xlsx and csv formatting but I never get 100% replacement. What could I be doing wrong?

Theme wordpress giá rẻ Theme wordpress giá rẻ Thiết kế website

LEAVE A COMMENT