Scenario: I have a dataframe in which I want to sum rows within one column, based on the values of other columns.
DF Example:
+----------+----------------------------+---------------------+---------+
| SCENARIO | PRODUCT | FLOW | 2030 |
+----------+----------------------------+---------------------+---------+
| Stated | Natural gas: unabated | Total energy supply | 147.541 |
| Stated | Natural gas: with CCUS | Power sector inputs | 0.018 |
| Stated | Natural gas: with CCUS | Total energy supply | 1.147 |
| Stated | Nuclear | Power sector inputs | 36.773 |
| Stated | Nuclear | Total energy supply | 36.773 |
| Stated | Oil | Power sector inputs | 5.162 |
| Stated | Oil | Total energy supply | 194.938 |
| Stated | Renewables | Power sector inputs | 76.612 |
| Stated | Renewables | Total energy supply | 119.979 |
| Stated | Solar | Total energy supply | 22.66 |
| Stated | Solar PV | Power sector inputs | 19.457 |
| Stated | Total | Power sector inputs | 263.2 |
| Stated | Total | Total energy supply | 667.905 |
| Stated | Traditional use of biomass | Total energy supply | 18.975 |
| Stated | Wind | Power sector inputs | 18.823 |
| Stated | Wind | Total energy supply | 18.823 |
+----------+----------------------------+---------------------+---------+
Wanted Result: I want to sum the values in a given column (in this case 2030) for the rows that meet conditions on other columns. For example: when SCENARIO = Stated and FLOW = Total energy supply, I want to sum the 2030 values where PRODUCT is Wind or Solar. In this case, the result would be a new row whose 2030 value is 22.66 (Solar) + 18.823 (Wind) = 41.483, which would look like:
+----------+----------------------------+---------------------+---------+
| SCENARIO | PRODUCT | FLOW | 2030 |
+----------+----------------------------+---------------------+---------+
| Stated | Natural gas: unabated | Total energy supply | 147.541 |
| Stated | Natural gas: with CCUS | Power sector inputs | 0.018 |
| Stated | Natural gas: with CCUS | Total energy supply | 1.147 |
| Stated | Nuclear | Power sector inputs | 36.773 |
| Stated | Nuclear | Total energy supply | 36.773 |
| Stated | Oil | Power sector inputs | 5.162 |
| Stated | Oil | Total energy supply | 194.938 |
| Stated | Renewables | Power sector inputs | 76.612 |
| Stated | Renewables | Total energy supply | 119.979 |
| Stated | Solar | Total energy supply | 22.66 |
| Stated | Solar PV | Power sector inputs | 19.457 |
| Stated | Total | Power sector inputs | 263.2 |
| Stated | Total | Total energy supply | 667.905 |
| Stated | Traditional use of biomass | Total energy supply | 18.975 |
| Stated | Wind | Power sector inputs | 18.823 |
| Stated | Wind | Total energy supply | 18.823 |
| Stated | result_sum | | 41.483 |
+----------+----------------------------+---------------------+---------+
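The wanted result above can be sketched on a small subset of the table. This is only an illustration under assumptions: the frame name df, the integer column label 2030, and the empty string used for FLOW in the new row are not taken from the real data.

```python
import pandas as pd

# Small subset of the example table; the integer column label 2030 is an assumption.
df = pd.DataFrame({
    "SCENARIO": ["Stated"] * 4,
    "PRODUCT": ["Solar", "Solar PV", "Wind", "Wind"],
    "FLOW": ["Total energy supply", "Power sector inputs",
             "Power sector inputs", "Total energy supply"],
    2030: [22.66, 19.457, 18.823, 18.823],
})

# Rows that satisfy all three conditions.
mask = (
    (df["SCENARIO"] == "Stated")
    & (df["FLOW"] == "Total energy supply")
    & df["PRODUCT"].isin(["Solar", "Wind"])
)

# Sum the 2030 values of the selected rows only.
total = df.loc[mask, 2030].sum()

# Build the extra row and append it to the original frame.
result_row = pd.DataFrame({
    "SCENARIO": ["Stated"],
    "PRODUCT": ["result_sum"],
    "FLOW": [""],
    2030: [total],
})
out = pd.concat([df, result_row], ignore_index=True)
print(out)
```

The original rows are left untouched; only one row labelled result_sum is added at the end.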
Issue: While trying different combinations of df.sum, I could not get the expected result. If I try to combine the two product names with an operator (e.g. ^ or |) like:
test3 = test3.loc[(test3['SCENARIO'] == 'Stated') & (test3['FLOW'] == 'Total energy supply') & (test3['PRODUCT'] == 'Solar' ^ 'Wind')]
I get an unsupported operand type error (shown here for the | variant; with ^ the message names ^ instead):
TypeError: unsupported operand type(s) for |: 'str' and 'str'
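A likely cause of this error is operator precedence: in Python, | and ^ bind more tightly than ==, so the two strings are combined before any comparison with the column happens. The sketch below reproduces this on a hypothetical three-value Series named products, which stands in for test3['PRODUCT'].

```python
import pandas as pd

# Hypothetical stand-in for test3['PRODUCT'].
products = pd.Series(["Solar", "Wind", "Oil"], name="PRODUCT")

# Bitwise operators bind tighter than ==, so Python evaluates "Solar" | "Wind" first.
try:
    products == "Solar" | "Wind"
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Parenthesising each comparison makes | combine two boolean Series instead.
either = (products == "Solar") | (products == "Wind")
print(either.tolist())
```

Each comparison needs its own parentheses so that | operates on boolean Series rather than on the raw strings.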
If I try to use a series of conditions, I am unsure how to select multiple PRODUCT values:
test3['sigcontr_steps'] = test3.loc[(test3['SCENARIO'] == 'Stated') & (test3['FLOW'] == 'Total energy supply') &
(test3['PRODUCT'] == 'Solar') & (test3['PRODUCT'] == 'Wind'),[2022]].sum(axis=0)
This does not work because PRODUCT cannot equal both 'Solar' and 'Wind' in the same row, so the combined condition is never true and no rows are selected.
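To make the difference concrete, here is a minimal sketch comparing the AND-combined condition with a membership test. The three-row frame and the integer column label 2030 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical three-row frame; the integer column label 2030 is an assumption.
df = pd.DataFrame({
    "PRODUCT": ["Solar", "Wind", "Oil"],
    "FLOW": ["Total energy supply"] * 3,
    2030: [22.66, 18.823, 194.938],
})

# AND: a row would need PRODUCT to be both values at once, so nothing matches.
both = (df["PRODUCT"] == "Solar") & (df["PRODUCT"] == "Wind")
sum_both = df.loc[both, 2030].sum()
print(sum_both)

# Membership test: a row matches if PRODUCT is any of the listed values.
any_of = df["PRODUCT"].isin(["Solar", "Wind"])
sum_any = df.loc[any_of, 2030].sum()
print(sum_any)
```

With isin, adding a third product later only means extending the list rather than chaining another condition.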
Question: What is the correct way to perform this operation?