Here’s a toy example to illustrate an idea:
```python
import polars as pl

series = pl.Series("x", [0, -1, 1, -1])


def apply_abs_max(series: pl.Series, default: int) -> int:
    result = series.abs().max()
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_abs_max(series, -1)
# 1
```
Suppose I want to generalize `apply_abs_max` to `apply_expr`:
```python
import polars as pl

series = pl.Series("x", [0, -1, 1, -1])


def apply_expr(series: pl.Series, default: int, expr: pl.Expr) -> int:
    raise NotImplementedError()
    result = ...  # what do I do here to apply `expr` to `series`?
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_expr(series, -1, pl.Expr().abs().max())
```
`apply_expr` is not actually implemented, as you can see above, because I do not know how to apply the input `expr` onto `series`. How can I go about doing that?
You can always use an anonymous function (a `lambda`) as a wrapper:
```python
from collections.abc import Callable


def apply_expr(series: pl.Series, default: int, expr_func: Callable[[pl.Series], object]) -> int:
    # expr_func receives the Series itself, so any chain of Series methods works
    result = expr_func(series)
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_expr(series, -1, lambda x: x.abs().max())
# 1
```
You can’t chain the exprs like that. You can refer to `pl.Expr.max`, but when you put the parentheses there, Python calls it, and you get an error about missing arguments.
Setting that aside for a moment, your function would look like this:
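```python
from collections.abc import Callable


def apply_expr(series: pl.Series, default: int, expr: Callable[[pl.Expr], pl.Expr]) -> int:
    # `expr` is an uncalled method such as pl.Expr.max; calling it on pl.first()
    # builds an expression on the only column of the one-column frame
    result = series.to_frame().select(expr(pl.first())).item()  # type: ignore
    if result is None:
        return default
    elif isinstance(result, int):
        return result
    raise ValueError(f"result type is {type(result)}, expected int")
```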
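and, with it, you can pass an uncalled method that reduces the column to a single value (`.item()` needs a 1x1 result, so `pl.Expr.abs` on its own would fail here):
```python
apply_expr(series, -1, pl.Expr.max)
# 1
```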
If you want to chain them, you have to put them in a list, so your function becomes:
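```python
from collections.abc import Callable

def apply_exprs(series: pl.Series, default: int, expr: Callable | list[Callable]) -> int:
    if isinstance(expr, list):
        *intermediate_exprs, last_expr = expr
        for e in intermediate_exprs:
            series = series.to_frame().select(e(pl.first())).to_series()  # type: ignore
    else:
        last_expr = expr
    # the last step must reduce the column to a single value for `.item()`
    result = series.to_frame().select(last_expr(pl.first())).item()  # type: ignore
    if result is None:
        return default
    elif isinstance(result, int):
        return result
    raise ValueError(f"result type is {type(result)}, expected int")
```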
Before seeing how to use that function, let’s change the series so the output is more obviously what we expect:
```python
series = pl.Series("x", [0, -1, 1, -3])
apply_exprs(series, -1, [pl.Expr.abs, pl.Expr.max])
# 3
```
But why?

When you call an Expr’s method (e.g. `pl.col('a').abs()`), it returns another Expr. Because the result is a new Expr rather than the method itself, it has all the methods that any Expr has, which is what lets you chain calls. When you want to refer to a method so that you can pass it to a function, you can’t call it, and a method you haven’t called doesn’t return an Expr, so there is nothing to chain the next method onto.
You could augment `apply_exprs` so that you pass in something like `pl.first().abs().max()` instead of `pl.Expr.abs`, which would be better because you could do away with the `for` loop.