Expose the Polars SQL API? #818
Comments
My first reaction is -1, unless we have a very strong use case for this. Explorer is meant to be backend-agnostic, and we work hard to keep it that way, but the SQL semantics would necessarily be tied to Polars here.
It definitely is a big win for OLAP use cases. I'm not sure I understand why we'd be tied to Polars SQL semantics here -- for an |
In explorer_sql, SQL is how you load data into memory. Polars SQL is a language for manipulating an existing in-memory dataframe. So they sit at different levels. And if I use Polars SQL to manipulate a Polars DataFrame, I won't be able to use it to manipulate an Explorer SQL dataframe.
Ah but that's not how I'm imagining I don't see a problem with the SQL syntax being backend dependent -- since it's passing through it would be handled by the backend anyway. This is also how Re: compute and collect from
Interesting. That would indeed close the gap, so I am OK with going down this path, although I wonder if we should tackle Explorer SQL first to have a better idea of how it will all connect. :)
I'm cool with that! Though I'm pretty adamant that in |
I'm super +1 on SQL interop, and if either exposing the Polars SQL API or working on Explorer SQL is the path forward, then I'm behind it. I admit I'm still unclear on:
Wrapping my head around those things is my personal barrier to giving a +1 to either specific approach. I'm not against either! I just don't think I have all the facts yet. In particular, I think outlining a few specific use cases with pseudo-code would be instructive. I can't tell, for instance, how lazy frames play with this feature.
I have just realized that we don't use the exact same names as Polars. So if we expose polars_sql, it would start to feel very janky. For example, nil vs null, and probably several others. So ultimately, I believe I am a -1 on this approach. If we want to go down this road, it may make more sense to introduce our own SQL that compiles down to dataframe operations.
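To make the "our own SQL that compiles down to dataframe operations" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny grammar (SELECT columns FROM name, with an optional WHERE col IS [NOT] NULL), the op names such as `filter_nil`, and the list-of-dicts stand-in for a dataframe. It is not Explorer's or Polars' API; it only shows how SQL's `NULL` spelling could be translated into the library's own `nil` naming at compile time.

```python
import re

# Hypothetical, deliberately tiny grammar:
#   SELECT <cols> FROM <name> [WHERE <col> IS [NOT] NULL]
_SELECT = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<col>\w+)\s+is\s+(?P<neg>not\s+)?null)?\s*;?\s*$",
    re.IGNORECASE,
)


def compile_select(statement):
    """Compile a statement into (table_name, list of backend-neutral ops)."""
    m = _SELECT.match(statement)
    if m is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    ops = []
    if m.group("col"):
        # SQL says NULL; the op uses the library's own `nil` vocabulary.
        ops.append(("filter_nil", m.group("col"), bool(m.group("neg"))))
    cols = [c.strip() for c in m.group("cols").split(",")]
    if cols != ["*"]:
        ops.append(("select", cols))
    return m.group("table"), ops


def run(ops, rows):
    """Apply compiled ops to a list of dicts (a stand-in for a dataframe)."""
    for op in ops:
        if op[0] == "filter_nil":
            _, col, negate = op
            # negate=True means IS NOT NULL: keep rows whose value is present.
            rows = [r for r in rows if (r[col] is None) != negate]
        elif op[0] == "select":
            rows = [{c: r[c] for c in op[1]} for r in rows]
    return rows


rows = [{"a": 1, "b": None}, {"a": 2, "b": 5}]
table, ops = compile_select("SELECT a FROM df WHERE b IS NOT NULL")
print(table, ops, run(ops, rows))
```

Because the ops are plain data, each backend could interpret them with its own primitives, which is what would keep the SQL surface independent of any one engine's dialect.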
https://crates.io/crates/polars-sql
https://docs.rs/polars-sql/0.36.2/polars_sql/fn.sql_expr.html
https://docs.pola.rs/py-polars/html/reference/sql
I'm not a huge fan of the Python Polars approach here, with having to register things explicitly. It feels very... Spark-ish to me.
I like the way dplyr allows you to just write sql(...). https://dbplyr.tidyverse.org/articles/sql.html
We could have Explorer.DataFrame.sql(df|lf, "statement").
I'm very open to differing opinions on this!
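As a rough illustration of the one-call shape proposed above (pass the frame and a statement, with no separate registration step), here is a sketch in Python. It is only an analogy: the `sql` helper and its `name` parameter are hypothetical, the "dataframe" is a list of dicts, and the standard library's SQLite is used as a stand-in engine, so the SQL dialect here is SQLite's rather than Polars'.

```python
import sqlite3


def sql(rows, statement, name="df"):
    """Run `statement` against `rows`, exposed to SQL as a table called `name`.

    Assumes `rows` is a non-empty list of dicts that share the same keys.
    """
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        # The frame is registered implicitly, under a conventional name.
        conn.execute(f'CREATE TABLE "{name}" ({col_list})')
        conn.executemany(
            f'INSERT INTO "{name}" VALUES ({placeholders})',
            [tuple(r[c] for c in columns) for r in rows],
        )
        return [dict(r) for r in conn.execute(statement)]
    finally:
        conn.close()


rows = [{"a": 1, "b": 1}, {"a": 2, "b": 3}]
print(sql(rows, "SELECT a FROM df WHERE b > 1"))
```

The caller never touches a context object; the trade-off is that the table name visible inside the statement has to be fixed by convention or passed as an argument.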