I’m trying to reverse engineer what the following functions are doing in a codebase I am working with at work:
def _helper(df):
    return (df.groupby(['a', 'b', 'c'])
              .size()
              .reset_index()
              .rename(columns={0: 'count'}))

def nunique_counts(df, col):
    df = _helper(df)
    data = df[['a', col]].groupby('a').nunique()[col]
    return data
I had always thought that nunique_counts returns the number of unique col values for each unique value of column a, but I’m not sure that this is actually the case, because otherwise we could simply do this, I think:
def nunique_counts(df, col):
    return df.groupby('a')[col].nunique()
What is the original function attempting to do? Also, the count column that was created in _helper doesn’t seem to be used.