Finding nunique of a column grouped by another column

I’m trying to reverse engineer what the following functions do in a codebase I work with:

def _helper(df):
  return (df.groupby(['a', 'b', 'c'])
          .size()
          .reset_index()
          .rename(columns={0: 'count'}))

def nunique_counts(df, col):
  df   = _helper(df)
  data = df[['a', col]].groupby('a').nunique()[col]
  return data
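Here is a minimal sketch of what `_helper` actually passes to `nunique_counts`, run on a small made-up frame. It assumes pandas is installed, and the data and the choice of `'b'` as `col` are hypothetical. `_helper` collapses the rows to one per distinct `(a, b, c)` combination and records how many raw rows had it. `nunique_counts` then counts distinct `col` values per `a` among those combinations.

```python
import pandas as pd

def _helper(df):
    # One row per distinct (a, b, c) combination, plus how many raw rows had it.
    # .size() returns an unnamed Series, so reset_index() names that column 0.
    return (df.groupby(['a', 'b', 'c'])
              .size()
              .reset_index()
              .rename(columns={0: 'count'}))

def nunique_counts(df, col):
    # Collapse to distinct (a, b, c) combos first, then count distinct `col` per `a`
    combos = _helper(df)
    return combos[['a', col]].groupby('a').nunique()[col]

# Hypothetical toy data: the two identical ('x', 1, 'p') rows collapse into one combo
df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'y', 'y'],
    'b': [1, 1, 2, 3, 3],
    'c': ['p', 'p', 'q', 'r', 's'],
})

combos = _helper(df)
print(combos)
print(nunique_counts(df, 'b'))
```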

I had always assumed that nunique_counts returns the number of unique col values for each unique value of column a. I’m not sure that’s actually the case, though, because otherwise we could simply do this, I think:

def nunique_counts(df, col):
  return df.groupby('a')[col].nunique()
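As a rough check, assume there are no missing values in `a`, `b` or `c`. Under that assumption, collapsing to distinct `(a, b, c)` combinations first should not change how many distinct `b` (or `c`) values each `a` has, so the two versions should agree. The function names `nunique_counts_original` and `nunique_counts_direct` are hypothetical, chosen so that both versions can coexist, and the data is made up.

```python
import pandas as pd

def _helper(df):
    return (df.groupby(['a', 'b', 'c'])
              .size()
              .reset_index()
              .rename(columns={0: 'count'}))

def nunique_counts_original(df, col):
    # Codebase version: dedupe (a, b, c) first, then count distinct `col` per `a`
    return _helper(df)[['a', col]].groupby('a').nunique()[col]

def nunique_counts_direct(df, col):
    # Proposed one-liner: count distinct `col` per `a` on the raw rows
    return df.groupby('a')[col].nunique()

# Hypothetical data with no missing values
df = pd.DataFrame({
    'a': ['x', 'x', 'x', 'y', 'y', 'y'],
    'b': [1, 1, 2, 3, 3, 4],
    'c': ['p', 'q', 'q', 'r', 'r', 'r'],
})

for col in ['b', 'c']:
    print(col,
          nunique_counts_original(df, col).to_dict(),
          nunique_counts_direct(df, col).to_dict())
```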

What is the original function attempting to do? Also, the count column created in _helper doesn’t seem to be used.
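One place where the two versions can diverge is missing data. By default, `groupby` drops rows whose key is NaN (`dropna=True`), so `_helper` silently discards any row that has a missing `a`, `b` or `c`. The sketch below uses hypothetical data in which the value `b == 2` only appears in a row whose `c` is missing.

```python
import numpy as np
import pandas as pd

def _helper(df):
    return (df.groupby(['a', 'b', 'c'])
              .size()
              .reset_index()
              .rename(columns={0: 'count'}))

# Hypothetical data: b == 2 only appears in a row whose c is NaN
df = pd.DataFrame({
    'a': ['x', 'x', 'x'],
    'b': [1, 1, 2],
    'c': ['p', 'q', np.nan],
})

# The (x, 2, NaN) row is dropped inside _helper, so b == 2 is never counted
via_helper = _helper(df)[['a', 'b']].groupby('a').nunique()['b']
# The one-liner still sees the raw row, so b == 2 is counted
direct = df.groupby('a')['b'].nunique()

print(via_helper.to_dict(), direct.to_dict())
```

In this sketch the `count` column plays no part. It would only be read if `col='count'` were passed, and the result would then be the number of distinct group sizes per `a`, which is probably not what is intended.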
