Sorting an empty DataFrame results in a runtime Polars error #919

Open
eldano opened this issue Jun 7, 2024 · 3 comments

Comments

@eldano

eldano commented Jun 7, 2024

Attempting to sort a grouped DataFrame that has no rows results in a runtime error:

require Explorer.DataFrame, as: DataFrame

dataframe = DataFrame.new(a: ["a", "b", "c"])

dataframe
|> DataFrame.group_by("a")
|> DataFrame.filter(a == "d")
|> DataFrame.sort_by(a)

Output:

** (RuntimeError) Polars Error: cannot group_by + apply on empty 'DataFrame'
    (explorer 0.8.2) lib/explorer/polars_backend/shared.ex:79: Explorer.PolarsBackend.Shared.apply_dataframe/4
    #cell:2m6ajrb7ypgepmrw:3: (file)
@billylanchantin
Contributor

billylanchantin commented Jun 7, 2024

Thanks for the issue!

It appears this may have been an issue on the Polars side that has since been addressed.

However, that fix was released as part of Polars 0.35 (PR 12269).

We're already on a later version of Polars, so I'll have to do some more digging.

eldano changed the title from "Sorting and empty DataFrames results in a runtime Polars error" to "Sorting an empty DataFrame results in a runtime Polars error" on Jun 23, 2024
@ceyhunkerti
Contributor

It could be related to the order of the chained operations:

require Explorer.DataFrame, as: DF

df = DF.new(a: ["a", "b", "c"])

# ❗ Doesn't work, as reported in the issue.
df
|> DF.group_by("a")
|> DF.filter(a == "d")
|> DF.sort_by(a)

# ✔️ This one works
df |> DF.filter(a == "d") |> DF.sort_by(a) |> DF.group_by("a")

I usually cross-check against the Python API, and this doesn't work in its latest version either:

df.group_by("a").filter(pl.lit("a").eq("d")).sort("a")

So my conclusion is that the order of the operations matters.

@billylanchantin
Contributor

@ceyhunkerti I think this should still be permitted. For example:

import Explorer.DataFrame
require Explorer.DataFrame

df = new(a: ["a", "a", "b"])

# Broken
df |> group_by("a") |> filter(a == "d") |> sort_by(a)

# Works
df |> lazy |> group_by("a") |> filter(a == "d") |> sort_by(a) |> compute
# #Explorer.DataFrame<
#   Polars[0 x 1]
#   Groups: ["a"]
#   a string []
# >

AFAICT, Polars' group_by works a little differently: I believe it requires an aggregation before any further work in most cases:

df.group_by("a").filter(False)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: 'GroupBy' object has no attribute 'filter'
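A possible interim workaround for the eager case, offered as a sketch rather than a confirmed fix: skip the grouped sort_by when the filtered frame is empty. It assumes, as the traceback through Shared.apply_dataframe/4 suggests, that only the grouped sort step fails on empty input while the grouped filter itself succeeds; the variable names are illustrative, and the Mix.install line assumes a standalone script rather than a Livebook.

```elixir
# Assumption: running as a standalone script; in Livebook, install :explorer in the setup cell instead.
Mix.install([{:explorer, "~> 0.8.2"}])

require Explorer.DataFrame, as: DF

df = DF.new(a: ["a", "b", "c"])

# The reported pipeline without the final sort: a grouped filter that matches no rows.
filtered =
  df
  |> DF.group_by("a")
  |> DF.filter(a == "d")

# Only run the grouped sort when there are rows to sort; an empty frame is
# returned unchanged and still carries its groups.
result =
  if DF.n_rows(filtered) == 0 do
    filtered
  else
    DF.sort_by(filtered, a)
  end
```

The lazy pipeline shown earlier in the thread avoids the error without a guard; this check is only relevant when staying on an eager DataFrame.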
