exposing the fold expressions from Polars #911

Open
mhanberg opened this issue May 20, 2024 · 7 comments

Comments

@mhanberg
Contributor

Description

Is it possible to expose the folds API from Polars?

I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer).

I can try to think of a version of my problem that I can share publicly if needed.

Also, if this API is already exposed and I just missed it... please let me know 😅.

Thanks!

@billylanchantin
Contributor

It is not exposed (unless I missed it too!). I think it'd be a great addition. Though it looks like it'd be a good deal of work to add it so it might take a while.

> I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer). I can try to think of a version of my problem that I can share publicly if needed.

If you want to ask on elixirforum.com, feel free to @- me and I can try to answer. My handle is the same as on GitHub.

@josevalim
Member

Oh, I didn't know we had fold. It seems it works with expressions, which means we can use the structure in Explorer.Query to fold over anything and it will be performant. I don't think it would be that complicated then! My suggestion is to call it reduce_with, to mirror map_with and friends!

@billylanchantin
Contributor

So it seems there's fold_exprs and reduce_exprs. The difference seems to be reduction col-wise vs. row-wise. I think we'd want to include both?

They also have a few exprs pairs like sum and sum_horizontal. Maybe we want to call them reduce_with and reduce_with_horizontal? reduce and fold are basically synonyms to me.

Also looking over the docs, I think there's a lot of potential in exposing many of their exprs:

@josevalim
Member

Sorry, I got fold and reduce mixed up. If it is operating on the columns themselves, then we can probably add it to Explorer.Query directly. We already support column traversal via across/query.

I am more interested in the reduce version that works within a single column.

@billylanchantin
Contributor

billylanchantin commented May 20, 2024

> I am more interested in the reduce version that works within a single column.

Yeah agreed! It'd be super useful in summarise.

> We already support column traversal via across/query.

If I'm reading this correctly (I've not confirmed it yet), then reduce_with_horizontal would reduce across the columns:

df = DF.new(a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300])

+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|      a       |      b       |      c       |
|    <s64>     |    <s64>     |    <s64>     |
+==============+==============+==============+
| 1            | 10           | 100          |
+--------------+--------------+--------------+
| 2            | 20           | 200          |
+--------------+--------------+--------------+
| 3            | 30           | 300          |
+--------------+--------------+--------------+

mutate(df, sum: reduce_horizontal(cols(), 0, fn col, acc ->
  col + acc
end))

+-------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 4] |
+----------+----------+----------+----------+
|    a     |    b     |    c     |   sum    |
|  <s64>   |  <s64>   |  <s64>   |  <s64>   |
+==========+==========+==========+==========+
| 1        | 10       | 100      | 111      |
+----------+----------+----------+----------+
| 2        | 20       | 200      | 222      |
+----------+----------+----------+----------+
| 3        | 30       | 300      | 333      |
+----------+----------+----------+----------+

Our comprehensions only make the same call to mutate/filter/etc. with different columns more ergonomic. This would let you actually compute multi-column things.
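To make the intended semantics concrete, here is a minimal sketch in plain Elixir, using lists in place of Explorer columns. The module `HorizontalFoldSketch` and its `reduce_horizontal/3` function are hypothetical helpers written only for illustration. They are not part of Explorer, and the real version would operate on lazy expressions rather than lists.

```elixir
defmodule HorizontalFoldSketch do
  # Hypothetical helper, not an Explorer API.
  # `columns` is a list of equal-length lists, one list per column.
  # Each row is folded independently, starting from `init`.
  def reduce_horizontal(columns, init, fun) do
    Enum.zip_with(columns, fn row ->
      # `row` holds one value from each column, in column order.
      Enum.reduce(row, init, fun)
    end)
  end
end

columns = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]

sums =
  HorizontalFoldSketch.reduce_horizontal(columns, 0, fn col, acc ->
    col + acc
  end)

IO.inspect(sums)
# prints [111, 222, 333]
```

The result matches the `sum` column in the table above: one value per row, computed across columns `a`, `b`, and `c`.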

In fact, I wonder if we could make the :reduce option of for comprehensions syntactic sugar for this?... 🤔
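For reference, this is roughly how the existing `:reduce` option of `for` behaves on plain lists today. It is only a sketch of the shape such sugar might take: the columns are ordinary lists standing in for Explorer series, and the accumulator is itself a whole column that gets combined element-wise with each column in turn.

```elixir
columns = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]

# The accumulator starts as a column of zeros with the same number of rows.
init = List.duplicate(0, 3)

sum =
  for col <- columns, reduce: init do
    # Combine the running column with the next column, element by element.
    acc -> Enum.zip_with(acc, col, fn a, c -> a + c end)
  end

IO.inspect(sum)
# prints [111, 222, 333]
```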

@josevalim
Member

> In fact, I wonder if we could make the :reduce option of for comprehensions syntactic sugar for this?... 🤔

We certainly could but perhaps @cigrainger has ideas on the API for this. @cigrainger, can we "fold" across columns in dplyr?

@jsonbecker

The equivalent in dplyr would be accomplished with something like this:

df |>
  mutate(sum(c_across(starts_with("Bud"))))

It's kind of gross, but quite similar to mutate(df, sum: reduce_horizontal(...))

There used to be a rowwise() wrapper that also felt a bit off.


4 participants