-
Hi, I found Ibis recently and am very impressed with the concept. I am not a data engineer or data scientist but I must occasionally do some time-series analysis for my job. For small projects I can work with Pandas data frames but to scale up I must access a corporate "data system". This "data system" has ranged from a RDBMS (of which we use at least 2), to a data warehouse, to a cloud data lake, to a cloud data-as-a-service all within the past few years. The idea that I could develop an analysis and then only must change the connection string has tremendous potential. I have been testing with a PostgreSQL database and am unable to get Ibis expressions to match the query performance of an SQL query using subqueries. My data set consists of system metrics. The metrics are indexed and partitioned on time and indexed on host. I would like to determine the most recent uptime for hosts that have been on within the past day. Example data:
Due to my partitions and index, I write subqueries to first filter to the last day, then rank by host to determine the most recent measurement, and then finally drop rows that are NULL. The Ibis expression I have been using:
The output:
Generating an SQL statement with I have also tried: My
How can I get Ibis to NOT include the ‘AT TIME ZONE’ or is there another method to improve performance and get the query to hit my index instead of doing a full table scan? Thanks! |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 2 replies
-
@WR-Long Thanks for opening a discussion! I fixed an issue related to Happy to help poke around query performance a bit. It would be very helpful if you could provide the query you're comparing against as well as some DDL that sets up the indexes you're referring to. |
Beta Was this translation helpful? Give feedback.
-
@cpcloud Thanks for the quick response! Query I was comparing to:
DDL: (I think... I am loading metrics with Telegraf and I believe it created the tables initially. The partitioning is handled by TimescaleDB. So I just nabbed the CREATE Script from pgAdmin.)
I originally thought it was the query layout. Just using |
Beta Was this translation helpful? Give feedback.
-
No real performance difference remains! (Just saw I had one sort different.) Removing AT TIME ZONE is all that is needed for testing the raw SQL for my current use case. I think the Postgres Planner/Optimizer handles the rest but I am admittedly in way over my head already at this point.
William Long
|
Beta Was this translation helpful? Give feedback.
Given that we're no longer generating
AT TIME ZONE
anymore (it'll be in the next release 8.0.0), I'm going to mark this discussion as answered!Thanks again for bringing it up!