Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(polars): add Intersection and Difference ops #10623

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

IndexSeek
Copy link
Member

Description of changes

Adds support for Intersection and Difference operations for the Polars backend. The approach uses pl.SQLContext to register the dataframes and query using INTERSECT and EXCEPT, respectively.

I saw that a few other functions used a ctx argument which might be using pl.SQLContext in a similar manner, I wasn't sure if that argument could be used here to avoid some of the boilerplate. Here are some of those functions I am referring to:

@translate.register(ops.InMemoryTable)
def in_memory_table(op, *, ctx, **_):
sql = sg.select(STAR).from_(sg.to_identifier(op.name, quoted=True)).sql(Polars)
return ctx.execute(sql, eager=False)

@translate.register(ops.SQLStringView)
def execute_sql_string_view(op, *, ctx: pl.SQLContext, **kw):
translate(op.child, ctx=ctx, **kw)
return ctx.execute(op.query)
@translate.register(ops.View)
def execute_view(op, *, ctx: pl.SQLContext, **kw):
child = translate(op.child, ctx=ctx, **kw)
ctx.register(op.name, child)
return child

@github-actions github-actions bot added tests Issues or PRs related to tests polars The polars backend labels Dec 27, 2024
@IndexSeek IndexSeek changed the title feat(polars): add Intersection and Difference ops feat(polars): add Intersection and Difference ops Dec 27, 2024
@cpcloud
Copy link
Member

cpcloud commented Dec 28, 2024

@IndexSeek Thanks for the PR as always 😄

I think it's reasonable to reuse the ctx argument in your implementations here rather than constructing a new SQLContext.

@cpcloud cpcloud added this to the 10.0 milestone Dec 28, 2024
@cpcloud cpcloud added the feature Features or general enhancements label Dec 28, 2024
@IndexSeek
Copy link
Member Author

@IndexSeek Thanks for the PR as always 😄

I think it's reasonable to reuse the ctx argument in your implementations here rather than constructing a new SQLContext.

You're welcome! That seems like a good plan. I gave this a try using the ctx from kwargs. I was experimenting with a way to reuse the existing tables as part of the context, but for some reason when I was doing op.left.name it would work standalone but when running tests I would get things like:

AttributeError: 'Sort' object has no attribute 'name'

This was working for me when using a memtable in an IPython shell, but not on the tests.

Do you know if there is a more efficient way than needing to manually register the tables as "left" and "right"?

Here's what I was trying where it seems to work using op.left.name, etc.:

@translate.register(ops.Difference)
def execute_difference(op, **kw):
    ctx = kw.get("ctx")
    sql = (
        sg.select(STAR)
        .from_(sg.to_identifier(op.left.name, quoted=True))
        .except_(sg.select(STAR).from_(sg.to_identifier(op.right.name, quoted=True)))
    )
    result = ctx.execute(sql.sql(Polars), eager=False)
    if op.distinct is True:
        return result.unique()
    return result
In [1]: from ibis.interactive import *
   ...: 
   ...: ibis.set_backend("polars")
   ...: 
   ...: t1 = ibis.memtable({"number_col": [1, 2, 3]})
   ...: t2 = ibis.memtable({"number_col": [2, 3, 4]})
   ...: t1.difference(t2)
Out[1]: 
┏━━━━━━━━━━━━┓
┃ number_col ┃
┡━━━━━━━━━━━━┩
│ int64      │
├────────────┤
│          1 │
└────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements polars The polars backend tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants