Applying a Polars expression to a Polars Series

Here’s a toy example to illustrate an idea:

import polars as pl

series = pl.Series("x", [0, -1, 1, -1])


def apply_abs_max(series: pl.Series, default: int) -> int:
    result = series.abs().max()
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_abs_max(series, -1)
# 1

Suppose I want to generalize apply_abs_max to apply_expr:

import polars as pl

series = pl.Series("x", [0, -1, 1, -1])


def apply_expr(series: pl.Series, default: int, expr: pl.Expr) -> int:
    raise NotImplementedError()
    result = ...  # what do I do here to apply `expr` to `series`?
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_expr(series, -1, pl.Expr().abs().max())

apply_expr is not actually implemented, as you can see above, because I do not know how to apply the input expr onto series. How can I go about doing that?

1

You can always wrap the expression in an anonymous function and pass that instead:

def apply_expr(series: pl.Series, default: int, expr_func) -> int:
    # raise NotImplementedError()
    result = expr_func(series)
    if result is None:
        return default
    else:
        if isinstance(result, int):
            return result
        else:
            raise ValueError(f"{result=}, {type(result)=}, expected `int`.")


apply_expr(series, -1, lambda x: x.abs().max())

3

You can’t chain the exprs like that. You can refer to a method as pl.Expr.max, but as soon as you add the parentheses you’re telling Python to call it, and you get an error about missing arguments.

Setting that aside for a moment, your function would look like this:

def apply_expr(series: pl.Series, default: int, expr) -> int:
    # `expr` is an unbound Expr method such as pl.Expr.max, not a pl.Expr instance
    result = series.to_frame().select(expr(pl.first())).item()
    if result is None:
        return default
    elif isinstance(result, int):
        return result
    raise ValueError(f"result type is {type(result)}, expected int")

and, with it, you can do:

# The method must reduce to a single value, otherwise .item() raises
apply_expr(series, -1, pl.Expr.max)
# 1

If you want to chain them, you’d have to put them in a list, so your function becomes:

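def apply_exprs(series: pl.Series, default: int, expr) -> int:
    # `expr` is one unbound Expr method (e.g. pl.Expr.max) or a list of them
    if isinstance(expr, list):
        last_expr = expr[-1]
        intermediate_exprs = expr[:-1]
        for e in intermediate_exprs:
            series = series.to_frame().select(e(pl.first())).to_series()
    else:
        last_expr = expr
    # The rest mirrors apply_expr above: the last method must reduce to a single value
    result = series.to_frame().select(last_expr(pl.first())).item()
    if result is None:
        return default
    elif isinstance(result, int):
        return result
    raise ValueError(f"result type is {type(result)}, expected int")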

Before seeing how to use that function, let’s change the series so the output is more obviously what we expect:

series = pl.Series("x", [0, -1, 1, -3])
apply_exprs(series, -1, [pl.Expr.abs,
                        pl.Expr.max])
# 3

But why?

When you call an Expr’s method (e.g. pl.col('a').abs()), it returns another Expr, and that Expr has all the methods that any Expr has, which is why calls can be chained. pl.Expr.abs without parentheses is only a reference to the method: nothing has been called, so nothing has been returned, and there is no Expr to call the next method on. That is why you can pass pl.Expr.abs to a function, but you can’t chain method references the way you chain method calls.

You could change the function to accept a complete expression such as pl.first().abs().max() instead of pl.Expr.abs. That would be better, because the chaining happens inside the expression itself and you could do away with the for loop.
