Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(data-exploration): data exploration backend API with types #13933

Merged
merged 40 commits into from
Jan 27, 2023

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Jan 26, 2023

Problem

People are already using our API to get data exploration event responses, even though that backend part is undocumented and not finished.

Changes

  • Disconnects the "events v2" endpoint from the old /api/project/:id/events endpoint. The old endpoint works like it did before. Eventually we can deprecate it.
  • Creates a new /api/project/:id/query endpoint, which accepts any node that matches QuerySchema (so any query).
  • Using pydantic, generate Python classes from the JSON schema (which is generated from TypeScript definitions)
  • Use those new classes in the query endpoint, and in the v2 events query.
  • Replace the type HogQLExpression with just string, as it was causing trouble in the backend.
  • Fix a bug where if you removed the column that was used for sorting, the query would fail if there were aggregations, as it would keep the old sort column.
  • Move some test_event.py tests into test_query.py without changing the content
  • I kept the old and new events query with the same old query_event_list.py file. I'd like to still refactor it substantially, but the point now was to do the smallest change on the query itself.

How did you test this code?

Tested in the interface locally, and moved some tests around to the new format.

@mariusandra mariusandra changed the title feat(data-exploration): backend json schema validation feat(data-exploration): data exploration backend API with types Jan 26, 2023
@mariusandra mariusandra marked this pull request as ready for review January 26, 2023 18:06
@mariusandra
Copy link
Collaborator Author

Ready for a better look.

Things I already know to do... likely in a followup:

  • Fix the kludge with returning json
  • CI script that breaks the build if running schema:build results in changed files

@mariusandra mariusandra requested a review from macobo January 27, 2023 09:29
* This file acts as the source of truth for:
*
* - frontend/src/queries/schema.json
* - generated from typescript via "pnpm run generate:schema:json"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If schema.json and schema.py are only generated from this file, do we need to commit them? I can see this being a bit easier, but I'm a bit worried about them getting of out of sync as it's possible to commit changes to the wrong file. Plus they're really big. Could be compiled in package.json's prepare

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Says the guy who just committed gigabytes of PNGs to the repo :D.

Just like snapshots, I'm afraid if we don't commit them to git, we won't see any unexpected changes in the schema. Since this is a very important schema that'll be used to generate public facing docs, tooling for users, etc, I'd prefer to not take any chances.

It also makes development easier, if the files are already there. Unrelated, I've also been playing around with a Kea typegen inline types mode that would also commit the types into the repo... for clean, fast and reproducible builds.

Also, I still want to add a github action that throws a wrench when the schema is out of sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the kea types ship with git would help a lot for reviews etc. where switching branches takes a bit of time at the moment, so +1 for a solution there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't schema.ts the snapshot in this case? The derived .json and .py versions ones are just distractions in a code review.
But fair enough, should be okay IF we have a wrench-throwing action.

posthog/api/event.py Show resolved Hide resolved
posthog/api/query.py Show resolved Hide resolved
posthog/api/query.py Outdated Show resolved Hide resolved
offset: int = 0,
) -> Union[List, EventsQueryResponse]:
) -> List:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A List of what?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... this is how it was 😅

posthog/models/event/query_event_list.py Outdated Show resolved Hide resolved
posthog/models/event/query_event_list.py Outdated Show resolved Hide resolved
posthog/models/event/query_event_list.py Outdated Show resolved Hide resolved
@@ -168,7 +216,10 @@ def query_events_list(
fragment = fragment[1:]
order_by_list.append(translate_hogql(fragment, hogql_context) + " " + order_direction)
else:
order_by_list.append(select_columns[0] + " ASC")
if "count(*)" in select_columns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is count() a valid option too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not right now, as this is the generated output SQL where count() hogql -> count(*) clickhouse sql... but since count() is also valid clickhouse SQL, and we might accidentally change this down the line, I added both formats here just to future proof.

)


def query_events_list_v2(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like run_data_table_query? This is more than an "events list v2"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to run_events_query, as a Team and an EventsQuery are its only inputs. I'd move this to a new file as well if I'd make all the changes I can foresee...

Copy link
Contributor

@thmsobrmlr thmsobrmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! Am I right that we'd write types for the backend response in schema.ts as well and thus have full type checking for the frontend <-> backend interaction? (will ask this at standup :))

Leaving the backend review to more knowledgeable people :)

@mariusandra mariusandra requested a review from Twixes January 27, 2023 12:03
Copy link
Member

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ but let's make sure the schemas don't diverge with some PR check.

@mariusandra
Copy link
Collaborator Author

Thanks! Going to merge this in to reduce the number of spinning plates... and add one more with that CI check.

@mariusandra mariusandra merged commit 67453e5 into master Jan 27, 2023
@mariusandra mariusandra deleted the backend-queries branch January 27, 2023 14:30
tomasfarias pushed a commit that referenced this pull request Jan 31, 2023
* feat(data-exploration): backend json schema validation

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* refactor

* switch to pydantic

* split into v2 events list query

* add backend query endpoint

* Update snapshots

* make events query work

* bring back HogQLExpression, inline all __root__ only classes in python, remove timestamp in old events query

* Update snapshots

* Update snapshots

* schema and mypy

* Update snapshots

* Update snapshots

* restore /fake/ orderBy (only "timestamp" and "-timestamp" are supported)

* Update snapshots

* Update snapshots

* Update snapshots

* raise if invalid json

* Update snapshots

* remove __root__ hack

* improve comments, in and out of quotes

* shrug

* future proofing

* comment

* rename to "run_events_query"

* simplify

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants