Sum Pandas Dataframe rows based on multiple conditions

Scenario: I have a dataframe in which I want to sum rows within one column, based on the values of other columns.

DF Example:

+----------+----------------------------+---------------------+---------+
| SCENARIO |          PRODUCT           |        FLOW         |  2030   |
+----------+----------------------------+---------------------+---------+
| Stated   | Natural gas: unabated      | Total energy supply | 147.541 |
| Stated   | Natural gas: with CCUS     | Power sector inputs |   0.018 |
| Stated   | Natural gas: with CCUS     | Total energy supply |   1.147 |
| Stated   | Nuclear                    | Power sector inputs |  36.773 |
| Stated   | Nuclear                    | Total energy supply |  36.773 |
| Stated   | Oil                        | Power sector inputs |   5.162 |
| Stated   | Oil                        | Total energy supply | 194.938 |
| Stated   | Renewables                 | Power sector inputs |  76.612 |
| Stated   | Renewables                 | Total energy supply | 119.979 |
| Stated   | Solar                      | Total energy supply |   22.66 |
| Stated   | Solar PV                   | Power sector inputs |  19.457 |
| Stated   | Total                      | Power sector inputs |   263.2 |
| Stated   | Total                      | Total energy supply | 667.905 |
| Stated   | Traditional use of biomass | Total energy supply |  18.975 |
| Stated   | Wind                       | Power sector inputs |  18.823 |
| Stated   | Wind                       | Total energy supply |  18.823 |
+----------+----------------------------+---------------------+---------+

Wanted result: I want to sum the values of a given column (here, 2030) over the rows that match conditions on the other columns. For example, for SCENARIO = Stated and FLOW = Total energy supply, I want to sum the 2030 values of the rows where PRODUCT is Solar or Wind. The result should be a new row whose 2030 value is 22.66 (Solar) + 18.823 (Wind) = 41.483, which would look like:

+----------+----------------------------+---------------------+---------+
| SCENARIO |          PRODUCT           |        FLOW         |  2030   |
+----------+----------------------------+---------------------+---------+
| Stated   | Natural gas: unabated      | Total energy supply | 147.541 |
| Stated   | Natural gas: with CCUS     | Power sector inputs |   0.018 |
| Stated   | Natural gas: with CCUS     | Total energy supply |   1.147 |
| Stated   | Nuclear                    | Power sector inputs |  36.773 |
| Stated   | Nuclear                    | Total energy supply |  36.773 |
| Stated   | Oil                        | Power sector inputs |   5.162 |
| Stated   | Oil                        | Total energy supply | 194.938 |
| Stated   | Renewables                 | Power sector inputs |  76.612 |
| Stated   | Renewables                 | Total energy supply | 119.979 |
| Stated   | Solar                      | Total energy supply |   22.66 |
| Stated   | Solar PV                   | Power sector inputs |  19.457 |
| Stated   | Total                      | Power sector inputs |   263.2 |
| Stated   | Total                      | Total energy supply | 667.905 |
| Stated   | Traditional use of biomass | Total energy supply |  18.975 |
| Stated   | Wind                       | Power sector inputs |  18.823 |
| Stated   | Wind                       | Total energy supply |  18.823 |
| Stated   | result_sum                 |                     |  41.483 |
+----------+----------------------------+---------------------+---------+
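A minimal sketch of one way to produce that result. It assumes the year column is labelled with the integer 2030 (the [2022] selector in the attempts below suggests integer year labels; use the string '2030' if yours are strings) and uses only a subset of the example rows. Series.isin covers the "Solar or Wind" part, and pd.concat appends the total as a new row; the names df, mask, total and result_row are illustrative.

```python
import pandas as pd

# Subset of the example frame, keeping some rows that must NOT be summed
# (Solar PV, Wind under "Power sector inputs", Nuclear, Oil).
# Assumption: the year column label is the integer 2030.
df = pd.DataFrame({
    "SCENARIO": ["Stated"] * 6,
    "PRODUCT": ["Solar", "Solar PV", "Wind", "Wind", "Nuclear", "Oil"],
    "FLOW": ["Total energy supply", "Power sector inputs",
             "Power sector inputs", "Total energy supply",
             "Total energy supply", "Total energy supply"],
    2030: [22.66, 19.457, 18.823, 18.823, 36.773, 194.938],
})

# One boolean Series per condition, combined with &.
# isin() is True where PRODUCT equals any of the listed values.
mask = (
    (df["SCENARIO"] == "Stated")
    & (df["FLOW"] == "Total energy supply")
    & (df["PRODUCT"].isin(["Solar", "Wind"]))
)

# Sum only the selected rows of the 2030 column.
total = df.loc[mask, 2030].sum()

# Append the total as a new row; FLOW is left empty as in the wanted output.
result_row = pd.DataFrame(
    [{"SCENARIO": "Stated", "PRODUCT": "result_sum", "FLOW": "", 2030: total}]
)
df = pd.concat([df, result_row], ignore_index=True)
print(df.tail(1))
```

Because total is a float sum, it may print as 41.483000000000004 rather than exactly 41.483; round it if the display matters.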

Issue: I tried different combinations of filtering and df.sum, but could not get the expected result. If I combine the PRODUCT values with an operator (e.g. ^ or |) like:

test3 = test3.loc[(test3['SCENARIO'] == 'Stated') & (test3['FLOW'] == 'Total energy supply') & (test3['PRODUCT'] == 'Solar' ^ 'Wind')]

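I get an unsupported operand type error (the message below is from the | variant; the ^ variant fails the same way, reporting ^ instead):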

TypeError: unsupported operand type(s) for |: 'str' and 'str'
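The error comes from operator precedence rather than from pandas: ^ and | bind more tightly than ==, so Python evaluates 'Solar' ^ 'Wind' on two plain strings before any comparison happens. A small sketch (the three-row frame is illustrative) showing the failure and the usual fix of comparing first and then combining the boolean Series:

```python
import pandas as pd

df = pd.DataFrame({"PRODUCT": ["Solar", "Wind", "Oil"]})

# Python parses this as  df["PRODUCT"] == ("Solar" ^ "Wind"),
# and str supports neither ^ nor |, so a TypeError is raised.
err = None
try:
    df["PRODUCT"] == "Solar" ^ "Wind"
except TypeError as exc:
    err = str(exc)
print("TypeError:", err)

# Fix: do each comparison in its own parentheses, then OR the results.
either = (df["PRODUCT"] == "Solar") | (df["PRODUCT"] == "Wind")
print(either.tolist())  # [True, True, False]
```

On boolean Series, ^ means exclusive or; for two mutually exclusive equality tests it happens to give the same mask, but | states the intended "Solar or Wind" directly.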

If I instead chain separate conditions, I am unsure how to select several PRODUCT values at once:

test3['sigcontr_steps'] = test3.loc[(test3['SCENARIO'] == 'Stated') & (test3['FLOW'] == 'Total energy supply') & 
                                    (test3['PRODUCT'] == 'Solar') & (test3['PRODUCT'] == 'Wind'),[2022]].sum(axis=0)

This does not work: no single row can have PRODUCT equal to both Solar and Wind, so the combined mask selects no rows.
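A short sketch of why the chained version comes back empty and what changes when the two PRODUCT tests are ORed instead. It again assumes an integer 2030 column label and a cut-down frame. Note also that assigning the sum to test3['sigcontr_steps'] would add a column rather than the new row the wanted output shows; appending a row needs something like pd.concat.

```python
import pandas as pd

df = pd.DataFrame({
    "PRODUCT": ["Solar", "Wind", "Wind", "Oil"],
    "FLOW": ["Total energy supply", "Total energy supply",
             "Power sector inputs", "Total energy supply"],
    2030: [22.66, 18.823, 18.823, 194.938],
})
flow_ok = df["FLOW"] == "Total energy supply"

# AND of two equality tests on the same column: no row can satisfy both.
both = flow_ok & (df["PRODUCT"] == "Solar") & (df["PRODUCT"] == "Wind")
print(both.sum())                 # number of selected rows: 0
print(df.loc[both, 2030].sum())   # summing an empty selection gives 0.0

# OR the PRODUCT tests instead (equivalent to isin(["Solar", "Wind"])).
either = flow_ok & ((df["PRODUCT"] == "Solar") | (df["PRODUCT"] == "Wind"))
print(df.loc[either, 2030].sum())
```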

Question: What is the correct way to perform this operation?
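If the same Solar + Wind total is needed for every SCENARIO/FLOW combination rather than just one, a groupby avoids writing a mask per combination. This is a sketch under assumptions not stated in the question: that one result_sum row per (SCENARIO, FLOW) group is wanted, that the year column label is the integer 2030, and that the products list (a hypothetical name) holds the values to add.

```python
import pandas as pd

df = pd.DataFrame({
    "SCENARIO": ["Stated"] * 5,
    "PRODUCT": ["Solar", "Solar PV", "Wind", "Wind", "Oil"],
    "FLOW": ["Total energy supply", "Power sector inputs",
             "Power sector inputs", "Total energy supply",
             "Total energy supply"],
    2030: [22.66, 19.457, 18.823, 18.823, 194.938],
})

# Hypothetical: the PRODUCT values to add together within each group.
products = ["Solar", "Wind"]

# Keep only those products, then sum the 2030 column per SCENARIO/FLOW pair.
sums = (
    df[df["PRODUCT"].isin(products)]
    .groupby(["SCENARIO", "FLOW"], as_index=False)[2030]
    .sum()
)
sums["PRODUCT"] = "result_sum"

# Append one result_sum row per group, reordering columns to match df.
df = pd.concat([df, sums[df.columns]], ignore_index=True)
print(df.tail(len(sums)))
```

With this data, the Total energy supply group gets 22.66 + 18.823, while the Power sector inputs group only contains Wind, so its result_sum row is just 18.823.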
