Frame-level all_horizontal / any_horizontal #20718

Open
hericks opened this issue Jan 14, 2025 · 2 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@hericks

hericks commented Jan 14, 2025

Description

I was recently looking at a question on Stack Overflow about checking whether all values of a DataFrame are True. While answering, I noticed that there are no frame-level methods to compute bitwise AND / OR horizontally across columns (pl.DataFrame.all_horizontal / pl.DataFrame.any_horizontal). As an unfortunate side effect, the most concise answer to that question relies on pl.DataFrame.min_horizontal, exploiting the fact that False sorts before True (a minimum of False means at least one value is False), as follows.

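```python
import polars as pl

df = pl.DataFrame({
    "a": [True, True, None],
    "b": [True, True, True],
})

# Treat nulls as False; the row-wise minimum of booleans is False if any value in the row is False.
df.fill_null(False).min_horizontal().min()
```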

In contrast, a more explicit answer using pl.all_horizontal is a bit clumsy, as it has to go through DataFrame.select in the absence of a frame-level method.

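```python
# Aggregate each column with all(), then combine the per-column results horizontally.
df.fill_null(False).select(pl.all_horizontal(pl.all().all())).item()
```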

Apart from increased API consistency, adding pl.DataFrame.all_horizontal would enable the user to write the following code.

```python
df.fill_null(False).all_horizontal().all()
```

Hence, I'd propose adding pl.DataFrame.all_horizontal and pl.DataFrame.any_horizontal to polars' public API.

Please let me know what you think. Once signed off, I'd be happy to take a shot at the implementation and open a PR. Thanks!

hericks added the enhancement (New feature or an improvement of an existing feature) label on Jan 14, 2025
@orlp
Collaborator

orlp commented Jan 15, 2025

Just FYI, the optimal solution shouldn't use fill_null at all; instead, pass ignore_nulls=False to all().

@hericks
Author

hericks commented Jan 15, 2025

@orlp Thanks! In that case, the aggregated result for this example becomes None. Would you propose the following?

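```python
# With Kleene logic the result is True, False, or None (null).
# Only an exact True means every value is True; both False and None mean "not all True".
df.select(pl.all_horizontal(pl.all().all(ignore_nulls=False))).item() is True
```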
