I have a dataframe where I want to create a new column, df['ict_state'], holding the state of a given institution. This information is in another column, df['ict_name'], a string column that also contains other details. Normally, the state appears right after the first '-' in the string, so some processing is needed to extract exactly that part.
For example: if the cell contains 'ict1 - rio - laboratorio nuclear', the new column should get 'rio'. If the cell contains 'ict2 - bsb', the new column should get 'bsb'. If the cell contains 'ict3', the new column should get the string 'nan'.
I tried a for loop over that column that splits each cell's string into a list and takes the second element of that list.
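The per-string rule described above (split on '-', take the second piece, fall back to 'nan') can be sketched as a plain function. This is a minimal sketch assuming the separator is always a plain hyphen '-' (not an en-dash) and that the string 'nan' rather than a real missing value is wanted; the name extract_state is hypothetical.

```python
def extract_state(ict_name: str) -> str:
    """Hypothetical helper: return the text between the first and second '-'.

    Returns the string 'nan' when the input contains no '-' at all,
    matching the behaviour described in the question.
    """
    parts = ict_name.split('-')
    if len(parts) > 1:
        # parts[1] is the segment right after the first '-'; strip the
        # spaces left over from ' - ' separators.
        return parts[1].strip()
    return 'nan'


for s in ['ict1 - rio - laboratorio nuclear', 'ict2 - bsb', 'ict3']:
    print(repr(s), '->', repr(extract_state(s)))
```

Applied to a single string this logic gives the expected values; the difficulty is only in how the results are written back into the dataframe.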
Example:
import pandas as pd
matrix_entrada = [(1, 'ictX'), (2, 'ict - rio - estudos nucleares'),
                  (3, 'ict - ba - laboratorio IA'), (4, 'ict132 - sp')]
df = pd.DataFrame(matrix_entrada, columns=['id', 'ict_name'])
for ict in df['ict_name']:
    if len(ict.split('-')) > 1:
        df['ict_state'] = ict.split('-')[1]
    else:
        df['ict_state'] = 'nan'
df
REAL OUTPUT:
   id                       ict_name  ict_state
0   1                           ictX         sp
1   2  ict - rio - estudos nucleares         sp
2   3      ict - ba - laboratorio IA         sp
3   4                    ict132 - sp         sp
EXPECTED OUTPUT:
   id                       ict_name  ict_state
0   1                           ictX        nan
1   2  ict - rio - estudos nucleares        rio
2   3      ict - ba - laboratorio IA         ba
3   4                    ict132 - sp         sp
What am I doing wrong?
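A likely explanation that matches the output above: inside the loop, df['ict_state'] = value assigns one scalar to the whole column, and pandas broadcasts it to every row. Each iteration therefore overwrites all rows, and only the value from the last row (' sp') survives. A minimal sketch of one alternative, assuming a plain hyphen separator and that the string 'nan' is wanted for rows without one, computes a value per row with the vectorized .str accessor instead:

```python
import pandas as pd

matrix_entrada = [(1, 'ictX'), (2, 'ict - rio - estudos nucleares'),
                  (3, 'ict - ba - laboratorio IA'), (4, 'ict132 - sp')]
df = pd.DataFrame(matrix_entrada, columns=['id', 'ict_name'])

# Build one value per row instead of assigning a scalar to the whole column.
df['ict_state'] = (
    df['ict_name']
    .str.split('-')   # list of pieces for each row
    .str[1]           # second piece; NaN when the row has no '-'
    .str.strip()      # remove the spaces around the separator
    .fillna('nan')    # the question asks for the string 'nan'
)
print(df)
```

The same fix also works with the original loop logic, as long as the per-row results are collected into a list and assigned to the column once, after the loop.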