Question

I have a DataFrame and want to create a new column, df['ict_state'], holding the state of each institution. That information is embedded in another column, df['ict_name'], which is a string that also carries other details. Normally the state comes right after the first '-' in the string, so some processing is needed to extract it.

Example: if the cell contains the string 'ict1 - rio - laboratorio nuclear', the new column should get 'rio'. If the cell contains 'ict2 - bsb', the new column should get 'bsb'. If the cell contains 'ict3', the new column should get 'nan'.

I tried looping over the column, splitting each cell's string into a list and taking the second element of that list.

Example:

import pandas as pd

matrix_entrada = [(1, 'ictX'), (2, 'ict - rio - estudos nucleares'),
                  (3, 'ict - ba - laboratorio IA'), (4, 'ict132 - sp')]
df = pd.DataFrame(matrix_entrada, columns=['id', 'ict_name'])

for ict in df['ict_name']:
    if len(ict.split('-')) > 1:
        df['ict_state'] = ict.split('-')[1]
    else:
        df['ict_state'] = 'nan'

df

REAL OUTPUT:
    id  ict_name                       ict_state
0   1   ictX                           sp
1   2   ict - rio - estudos nucleares  sp
2   3   ict - ba - laboratorio IA      sp
3   4   ict132 - sp                    sp

EXPECTED OUTPUT:

    id  ict_name                       ict_state
0   1   ictX                           nan
1   2   ict - rio - estudos nucleares  rio
2   3   ict - ba - laboratorio IA      ba
3   4   ict132 - sp                    sp

What am I doing wrong?
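
A likely explanation, shown here as a minimal sketch rather than a definitive diagnosis: assigning a single string to df['ict_state'] sets that value on every row, so each loop iteration overwrites the whole column and only the last row's result ('sp') survives. The small DataFrame below is a shortened copy of the one above, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ict_name': ['ictX',
                                'ict - rio - estudos nucleares',
                                'ict132 - sp']})

# Assigning a scalar to a column broadcasts it to every row.
df['ict_state'] = 'rio'

# A second scalar assignment replaces the whole column again,
# which is what each pass of the loop does.
df['ict_state'] = 'sp'

states = df['ict_state'].tolist()
print(states)
```

Note also that splitting 'ict132 - sp' on '-' yields ' sp' with a leading space, so the extracted piece probably needs stripping.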

How do I extract part of a string from a specific column in pandas?
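
One possible approach, sketched under the assumption that the state is always the second piece when the string is split on '-': use the vectorized .str accessor instead of a Python loop. Rows without a second piece come back as missing values; the fillna('nan') call is only there to match the expected output above and can be dropped to keep true missing values.

```python
import pandas as pd

matrix_entrada = [(1, 'ictX'), (2, 'ict - rio - estudos nucleares'),
                  (3, 'ict - ba - laboratorio IA'), (4, 'ict132 - sp')]
df = pd.DataFrame(matrix_entrada, columns=['id', 'ict_name'])

# Split every string on '-' at once; each cell becomes a list of pieces.
parts = df['ict_name'].str.split('-')

# Take the second piece (missing where the list has only one element),
# strip the surrounding spaces, then replace missing values with 'nan'.
df['ict_state'] = parts.str[1].str.strip().fillna('nan')

print(df)
```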
