I am trying to format the data in a column. It is monetary, but I need to match it against another dataset, so I need to format it before it goes into Power BI.
I am trying to take the absolute value of each number and then convert it to a string with 2 decimal places.
Any idea where I am going wrong with this?
def extract_data(filename, sheet_name):
    # DataFrame containing data extracted from Excel sheet
    df = pd.read_excel(filename, sheet_name=sheet_name, skiprows=[0])
    # Drop rows and columns with all NaN values
    df.dropna(how='all', axis=0, inplace=True)
    df.dropna(how='all', axis=1, inplace=True)
    # Format columns based on their data types
    for col in df.columns:
        if df[col].dtype == 'object':  # Handle text or categorical data
            df[col] = df[col].astype(str).str.strip()  # Remove leading/trailing whitespace
        elif df[col].dtype == 'float64':  # Format numeric data
            # Format specific columns based on required use case
            if col == "Total Cost":  # Selecting column based on header value
                df[col] = df[col].apply(lambda x: '{:.2f}'.format(abs(x)) if pd.notnull(x) else "")
The last section, “# Format specific columns based on required use case”, is the part that currently returns 0.5 as 0.5 (rather than “0.50” as desired).
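As a sanity check rather than a diagnosis: the format expression on its own does turn 0.5 into "0.50", so it may be worth ruling out two things. One is that the column is not float64, in which case that branch is never reached. The other is that the text is converted back into a number at a later step. The second effect can be reproduced with plain Python. The variable names below are made up for illustration.

```python
# The same expression used in the lambda, applied to a single value.
formatted = "{:.2f}".format(abs(0.5))
print(repr(formatted))   # '0.50'

# If anything downstream parses the text as a number again,
# the trailing zero disappears because floats carry no display format.
reparsed = float(formatted)
print(str(reparsed))     # 0.5
```

If the value has to stay as "0.50", it needs to remain text all the way to the point where the two datasets are matched.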
Thanks for the responses. It looks like there was a formatting issue with pandas processing numbers that had too many decimal places (or something similar). I managed to fix this by applying a 'round' function before any of the other column formatting. However, this caused rounding issues because Python's default is to round halves to even, so I have since changed it to round halves away from zero (ROUND_HALF_UP).
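To illustrate the difference between the two rounding modes, here is a small sketch. The helper names half_up and half_even are made up for this example, and 0.125 is used because it is exactly representable as a float, so it is a genuine tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def half_up(value: str) -> Decimal:
    """Round to 2 places, sending ties away from zero."""
    return Decimal(value).quantize(CENT, rounding=ROUND_HALF_UP)

def half_even(value: str) -> Decimal:
    """Round to 2 places, sending ties to the even neighbour."""
    return Decimal(value).quantize(CENT, rounding=ROUND_HALF_EVEN)

print(round(0.125, 2))      # 0.12  (built-in round: ties go to even)
print(half_even("0.125"))   # 0.12
print(half_up("0.125"))     # 0.13
print(half_up("-0.125"))    # -0.13 (away from zero for negatives too)
```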
You have to import:
from decimal import Decimal, ROUND_HALF_UP
Then the decimal fix looks like this:
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # str() passes the digits as displayed, so a value such as 2.675 is treated as a true half
        df[col] = df[col].apply(lambda x: Decimal(str(abs(x))).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))
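A detail worth checking with this approach: a Decimal built directly from a float carries the float's exact binary value, which can sit slightly below the number that is displayed. In that case ROUND_HALF_UP never sees a tie. Going through str first is one way around it. This is a minimal sketch using 2.675 as the sample value.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
value = 2.675  # stored in binary as slightly less than 2.675

# Built straight from the float: the hidden digits pull the result down.
from_float = Decimal(value).quantize(CENT, rounding=ROUND_HALF_UP)

# Built from the float's text form: the value is exactly 2.675, a true tie.
from_text = Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)

print(from_float)  # 2.67
print(from_text)   # 2.68
```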
The whole code for anybody who needs it in the future:
import os
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

# DEF: Extract Data
def extract_data(filename, sheet_name):
    df = pd.read_excel(filename, sheet_name=sheet_name, skiprows=[0])
    # Drop rows and columns with all NaN values
    df.dropna(how='all', axis=0, inplace=True)
    df.dropna(how='all', axis=1, inplace=True)
    # Set all numeric columns to 2 decimals (amend 0.01 to change decimal places)
    for col in df.columns:
        # abs() is applied first; I am concerned this may round negative values incorrectly
        if pd.api.types.is_numeric_dtype(df[col]):
            # str() passes the digits as displayed, so a value such as 2.675 is treated as a true half
            df[col] = df[col].apply(lambda x: Decimal(str(abs(x))).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))
    # To apply it to a specific column instead, use:
    # df['Column1'] = df['Column1'].apply(lambda x: Decimal(str(abs(x))).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))
    return df

# DEF: File location
username = os.path.basename(os.path.expanduser("~"))
filename = os.path.join(r'C:\Users', username, 'Documents', 'Book1.xlsx')
sheet_name = "Sheet1"

# PROCESS: Extract and Print/Save File
data = extract_data(filename, sheet_name)
print(data)

# TESTING: Export to desktop - uncomment the below and comment 'print' statement above
# output_file_path = os.path.join(r'C:\Users', username, 'Desktop', 'test1.csv')
# data.to_csv(output_file_path, index=False)
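Since the original goal was a string with two decimal places, here is a minimal sketch of a helper that combines the absolute value, the half-up rounding and the text conversion in one step. The name to_money_str and its places parameter are made up for this example. It treats None and float NaN as missing, and other missing-value markers would need their own handling.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def to_money_str(value, places="0.01"):
    """Return abs(value) rounded half-up as text, or "" for missing values."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    rounded = Decimal(str(abs(value))).quantize(Decimal(places), rounding=ROUND_HALF_UP)
    # The "f" format keeps fixed-point notation and the trailing zeros.
    return format(rounded, "f")

print(to_money_str(0.5))      # 0.50
print(to_money_str(-2.675))   # 2.68
print(to_money_str(3))        # 3.00
```

It could be applied to a single column with something like df["Total Cost"].apply(to_money_str), using the column name from the question.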