I have the following small dataframe, which I am passing through a function that decodes the byte values in one of its columns:
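import pandas as pd
import numpy as np

# One column of raw byte values; two of them are non-printable bytes.
data = {'trade_qualifier': [b'\x86', b' ', b' ', b'\x02']}
df = pd.DataFrame(data)

def decode_bytes(df):
    col = 'trade_qualifier'
    # Convert the bytes to NumPy strings, then back to Python objects.
    df[col] = df[col].values.astype(np.str_).astype('O')
    return df

decode_bytes(df)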
The above breaks with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x86 in position 0: ordinal not in range(128)
I can simply remove the backslashes in the bytes, so that b'\x86' becomes b'x86', for example.
But I am not sure if this is the correct thing to do?
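To make the doubt concrete, here is a minimal sketch in plain Python (no pandas needed) comparing the two literals. The variable names are only illustrative. It suggests that removing the backslash does not fix the decoding so much as replace the data with something else: one byte with value 0x86 turns into the three ASCII characters "x", "8" and "6".

```python
# Illustrative names only; these are not part of the original code.
with_escape = b'\x86'    # a single byte with value 0x86 (134)
without_escape = b'x86'  # three bytes: the ASCII characters 'x', '8', '6'

print(len(with_escape), list(with_escape))        # 1 [134]
print(len(without_escape), list(without_escape))  # 3 [120, 56, 54]

# The second literal decodes as ASCII, but only because it is now
# different data, not because the original byte was handled.
print(without_escape.decode('ascii'))             # x86
```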
I suppose a better question may be: what does this line do?
df[col] = df[col].values.astype(np.str_).astype('O')
My understanding is that this line changes the type of each value in the col column to np.str_. Is that correct?
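Judging from the traceback, the conversion appears to decode each bytes value as ASCII, which only covers byte values below 128. The sketch below mimics that per value in plain Python rather than through NumPy, so it is an approximation of the behaviour and not the library's actual code path. The helper `try_ascii` is hypothetical, and the use of Latin-1 as an alternative codec is purely an assumption for illustration; the right codec depends on where the data comes from.

```python
values = [b'\x86', b' ', b' ', b'\x02']

def try_ascii(raw):
    """Hypothetical helper: decode as ASCII, reporting failures instead of raising."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError as exc:
        return f'failed: {exc.reason}'

# 0x86 is outside the ASCII range, so only that value fails;
# 0x02 is a control character but still valid ASCII.
ascii_results = [try_ascii(v) for v in values]
print(ascii_results)

# Assumption: if the bytes were Latin-1, every byte value maps to a character.
latin1_results = [v.decode('latin-1') for v in values]
print(latin1_results)
```

If an explicit codec turns out to be appropriate, pandas offers `Series.str.decode(encoding)`, so something like `df[col].str.decode('latin-1')` might replace the `astype` chain, again assuming Latin-1 is the correct encoding for this data.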
I apologise if this is a very basic question; I am still new to working with bytes.