
set operations fail with filter #436

Open
DSLituiev opened this issue Aug 12, 2022 · 4 comments

Comments

@DSLituiev

I am trying to use siuba for filtering on a set, and it seems to fail badly:

mtcars >> filter(_.cyl in {4, 5}) returns nothing, while mtcars >> filter(_.cyl == 4) works
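A likely explanation, sketched in plain Python under stated assumptions: `x in container` always calls the *container's* `__contains__` (here the built-in `set`), and Python converts whatever comes back into a plain `bool`. A lazy expression such as `_.cyl` therefore never gets the chance to build a vectorized "is in" expression, and `filter` presumably receives a single `False`. The `Expr` class below is hypothetical and is not siuba's actual implementation.

```python
class Expr:
    """Hypothetical lazy column expression, standing in for something like `_.cyl`."""

    def __eq__(self, other):
        # Return a lazy (non-bool) result, the way a symbolic expression might
        return ("eq", self, other)

    # Defining __eq__ removes the default hash; restore it so that an Expr
    # can be looked up in a set at all
    __hash__ = object.__hash__


expr = Expr()

# set lookup goes by hash: no hash matches 4 or 5, so the answer is False
in_set = expr in {4, 5}
# list lookup uses ==; the truthy tuple from __eq__ is collapsed to True
in_list = expr in [4, 5]

print(in_set, type(in_set))    # False <class 'bool'>
print(in_list, type(in_list))  # True <class 'bool'>
```

Either way the result is a single Python `bool`, not a per-row mask, which is why a method call such as `.isin()` is needed instead of the `in` operator.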

@machow
Owner

machow commented Aug 12, 2022

Hey, thanks for raising this.
Are you trying to do the equivalent of R's %in%? You can do this using the pandas .isin() method.

from siuba.data import mtcars

mtcars.cyl.isin([4, 5])

So for siuba verbs it would be this:

from siuba import _, filter
from siuba.data import mtcars

mtcars >> filter(_.cyl.isin([4, 5]))

Sorry for the awkward R -> Python translation. I'm actively working on new siuba docs that walk through numeric Python quirks like these:

  • & instead of and
  • | instead of or
  • always wrapping each comparison in parentheses, e.g. (_.cyl == 4) | (_.cyl == 6)
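A minimal sketch of why these rules exist, in plain Python with no siuba or pandas needed. `|` binds more tightly than `==`, so an unparenthesized condition silently turns into a chained comparison; and `and`/`or` call `bool()` on their operands, which an elementwise type cannot answer. The `Mask` class is hypothetical, a stand-in for an elementwise boolean column such as a pandas Series.

```python
class Mask:
    """Hypothetical elementwise boolean column (stand-in for a pandas Series)."""

    def __init__(self, values):
        self.values = list(values)

    def __or__(self, other):
        # `|` can be overloaded, so it can stay elementwise
        return Mask(a or b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # `and` / `or` call bool() on the operand; an elementwise mask has no
        # single truth value, so refuse (pandas raises ValueError here too)
        raise ValueError("truth value of a Mask is ambiguous")


cyl = [4, 6, 8]

# Without parentheses, `x == 4 | x == 6` parses as the chained comparison
# `x == (4 | x) == 6`, which is wrong for x = 4
unparenthesized = [x == 4 | x == 6 for x in cyl]
parenthesized = [(x == 4) | (x == 6) for x in cyl]

is4 = Mask(x == 4 for x in cyl)
is6 = Mask(x == 6 for x in cyl)
either = is4 | is6  # elementwise, via Mask.__or__

try:
    is4 or is6  # calls bool(is4), which raises
    used_or = "ok"
except ValueError:
    used_or = "ValueError"

print(unparenthesized)  # [False, True, False]
print(parenthesized)    # [True, True, False]
print(either.values)    # [True, True, False]
print(used_or)          # ValueError
```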

@DSLituiev
Author

Hi Michael,
I'm looking for the equivalent of Python's x in set(...), e.g.:

mtcars.cyl.map(lambda x: x in [4, 5])

This works inside lambda functions, so I assumed your package would vectorize it.

@machow
Owner

machow commented Aug 15, 2022

You could use .map with siuba, but AFAICT in pandas that will just be a slower version of .isin():

mtcars >> filter(_.cyl.map(lambda x: x in [4, 5]))

Does that do what you're looking for? If there's a case where .isin() won't solve your problem, that might help me get a feel for the issue.
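For comparison, a small sketch (assuming only pandas is installed; the `cyl` values are made up rather than taken from `siuba.data.mtcars`) showing that the `.map(lambda ...)` version and `.isin()` build the same boolean mask. `.isin()` does the membership test in one vectorized call, while `.map` calls the Python lambda once per row, which is presumably where the speed difference comes from.

```python
import pandas as pd

# Made-up cylinder counts standing in for mtcars.cyl
cyl = pd.Series([6, 4, 8, 4, 5, 6], name="cyl")
allowed = {4, 5}

# Row by row: the lambda runs once per element and returns a Python bool
mask_map = cyl.map(lambda x: x in allowed)

# Vectorized membership test; isin accepts a set or any list-like
mask_isin = cyl.isin(allowed)

print(mask_isin.tolist())                       # [False, True, False, True, True, False]
print(mask_map.tolist() == mask_isin.tolist())  # True
print(cyl[mask_isin].tolist())                  # [4, 4, 5]
```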

@DSLituiev
Author

DSLituiev commented Aug 16, 2022

isin definitely solves it. I just naïvely assumed that mtcars >> filter(_.cyl in {4, 5}) would do the job. Not that I'm unhappy with isin's results or performance; it's just less intuitive.
