
arr.eval / arr.map_batches / map a function over each series inside an Array #20697

Open
jwhitaker-gridcog opened this issue Jan 14, 2025 · 4 comments
Labels
enhancement New feature or an improvement of an existing feature


@jwhitaker-gridcog

jwhitaker-gridcog commented Jan 14, 2025

Description

Given an Array column of width N, how can I map a function over each of the N "series" across the array (all `a[0]` values, then all `a[1]` values, and so on)?

For example, given

```python
import polars as pl

df = pl.DataFrame(
    [pl.Series('a', [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]], dtype=pl.Array(pl.Int32, 3))]
)
```

how can I map `s + s.sum()` over each of the N=3 series `s` to get

```python
pl.DataFrame(
    [pl.Series('a', [[5, 10, 15], [5, 10, 15], [5, 10, 15], [5, 10, 15]], dtype=pl.Array(pl.Int32, 3))]
)
```
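To pin down the intended semantics, here is a plain-Python sketch (no Polars involved) that treats the fixed-width Array column as a list of rows, transposes it into the N positional "series", applies `s + s.sum()` to each one, and transposes back. The helper name `map_over_array_series` is hypothetical and only illustrates the desired behaviour.

```python
from typing import Callable, List

Row = List[int]


def map_over_array_series(
    rows: List[Row], fn: Callable[[List[int]], List[int]]
) -> List[Row]:
    """Apply fn to each positional series: all row[0], then all row[1], ..."""
    # Transpose rows -> one list per array position.
    series = [list(col) for col in zip(*rows)]
    # Apply the function to each positional series independently.
    mapped = [fn(s) for s in series]
    # Transpose back to one fixed-width array per row.
    return [list(row) for row in zip(*mapped)]


rows = [[1, 2, 3]] * 4
result = map_over_array_series(rows, lambda s: [x + sum(s) for x in s])
print(result)  # [[5, 10, 15], [5, 10, 15], [5, 10, 15], [5, 10, 15]]
```

Position 0 holds `[1, 1, 1, 1]`, whose sum is 4, so each element becomes 5; positions 1 and 2 give 10 and 15 the same way.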

The following attempts don't work:

```python
# maybe this?
df.select(pl.col('a').map_batches(lambda s: s + s.sum()))

# or this?
print(df.select(pl.col('a').list.eval(pl.element() + pl.element().sum())))
```

Assuming I'm not overlooking something obvious, it feels like an API along these lines is missing:

```python
# hypothetical API: s is an Expr representing, in turn, [1, 1, 1, 1], then [2, 2, 2, 2], then [3, 3, 3, 3]
df.select(pl.col('a').arr.map(lambda s: s + s.sum()))
```

or

```python
df.select(pl.col('a').arr.map(pl.element() + pl.element().sum()))
```
jwhitaker-gridcog added the enhancement label on Jan 14, 2025
@orlp
Collaborator

orlp commented Jan 14, 2025

I believe the correct query for this is simply `pl.col.a + pl.col.a.sum()`, but `sum` does not yet work for Array columns.

@jwhitaker-gridcog
Author

What about more general operations? Say `a` is an Array of structs: should `pl.col('a').struct.field('key')` work?

@orlp
Collaborator

orlp commented Jan 14, 2025

@jwhitaker-gridcog That is something else entirely. `sum` should work because `a + b` already works when `a` and `b` are Array columns.

We don't have `Expr.arr.eval` yet, but I believe we're open to adding it.

@jwhitaker-gridcog
Author

I think it would be a little different from `list.eval`: lists can all have different lengths, so what I mean here ("do something on all `arr[0]`, then on all `arr[1]`, etc.") is meaningless for them. It feels more like a map of some sort.
