Apply Labels Based On Thresholds Defined In One Dataframe To Another Dataframe

  Kiến thức lập trình

I have two pandas dataframes, and I’m looking to find a faster way to apply the labels defined in the first dataframe to the rows in the second:

  • df1: The first dataframe has thresholds for a number of categories along with associated labels
id Category A Category B Upper Bound Lower Bound Label
1 Red One 1 3 X
2 Yellow Two 3 5 Y
3 Blue Three 5 7 Z
  • df2: The second has the raw data values that need to be labeled according to the thresholds in the first table. If Value in df2 is between Upper Bound and Lower Bound in df1, then apply the appropriate label to that row (assuming match on Cat A and B).
id Category A Category B Value
1 Red One 2
2 Yellow Two 4
3 Blue Three 6

After the labels are applied from the first to the second, the resulting dataframe df2 should look like:

id Category A Category B Value Label
1 Red One 2 X
2 Yellow Two 4 Y
3 Blue Three 6 Z

I’m currently able to do this using df.apply and iterrows like follows:

def applyLabels(row, dfLabel,_):
    for index, rowLabel in dfLabel.iterrows():
        if row[Category A] == rowLabel[Category A] & row[Value] between Upper and Lower Bound...
            return dfLabel.at[index, 'Label']

df2['Label'] = df2.apply(applyLabels, args=(df2,_), axis=1)

But this is taking forever (I’m working w/ ~10 million rows) – is there a faster way to achieve this result without exporting this whole thing to SQL?

Theme wordpress giá rẻ Theme wordpress giá rẻ Thiết kế website

LEAVE A COMMENT