New dataset and metadata integration #292

Merged: 48 commits into main from feat/align_data_struct on Dec 19, 2024

Conversation

@daphne-cornelisse (Contributor) commented Nov 8, 2024

Description

This PR integrates a new version of the dataset with GPUDrive.

To-dos

  • @aaravpandya Verify with Madrona viewer
  • @aaravpandya Review PR
  • @daphnecor Test run on 10 scenarios + videos to verify correctness
  • @nadarenator Fix metadata design
  • @daphnecor Update tutorials

@nadarenator (Collaborator) left a comment

Looks good to me

@aaravpandya (Collaborator)

This PR introduces breaking changes to the dataset format. Please update the dataset before merging this PR.

Also, can you update the file in tests/pytest_data to the new format? Tests won't work with the old file format.

@daphne-cornelisse (Contributor, Author)

> This PR introduces breaking changes to the dataset format. Please update the dataset before merging this PR.
>
> Also, can you update the file in tests/pytest_data to the new format? Tests won't work with the old file format.

Yes, we will do that this week. Since this PR cannot be merged until we fix the data, I have taken your code and applied it to our integrations/vbd branch temporarily.

@daphne-cornelisse (Contributor, Author)

@nadarenator Let's see if we can update the dataset links in the readme today so we can merge this PR.

@nadarenator (Collaborator)

I think we should hold off on regenerating the dataset until we finish vbd alignment, unless you're confident that the current processing script extracts all required info.

@eugenevinitsky (Contributor)

Is regenerating that expensive? We can just regenerate a small number of files for testing.

@nadarenator (Collaborator) commented Nov 13, 2024

Yes, that's what I meant: using a small set for our purposes. But before merging this PR we'd need updated links, which would involve regenerating the whole dataset. And if we find out during vbd integration that we need yet another piece of info from the raw tfrecords, we'd have to regenerate the full dataset again.

@nadarenator force-pushed the feat/align_data_struct branch from 79f8128 to 46f9df4 on December 4, 2024, 22:55
@nadarenator (Collaborator)

Only thing remaining: regenerate the dataset, upload it to HF, and add the link to the readme.

@daphne-cornelisse (Contributor, Author) commented Dec 6, 2024

> Only thing remaining: regenerate the dataset, upload it to HF, and add the link to the readme.

Can you send me a link to the new data once it's up? I'd like to do a test PPO run.

Resolved review threads on examples/tutorials/01_scenario_loading.ipynb, pygpudrive/datatypes/metadata.py, and pygpudrive/env/config.py
@nadarenator (Collaborator)

> Only thing remaining: regenerate the dataset, upload it to HF, and add the link to the readme.
>
> Can you send me a link to the new data once it's up? I'd like to do a test PPO run.

It should be available now in my public directory on greene (the same one I shared before).

@daphne-cornelisse changed the title from "Add agent ids and remove degrees conversion" to "New dataset and metadata integration" on Dec 11, 2024
@Emerge-Lab deleted a comment from aaravpandya on Dec 12, 2024
@nadarenator (Collaborator)

It would be great if we could redo some of the tutorial notebooks: most of the indexing is outdated, and there are also datatypes now that handle the indexing internally.

Resolved review thread on src/headless.cpp
@aaravpandya (Collaborator) left a comment

Okay, so three issues to fix:

  1. Revert the headless paths.
  2. Remove metadata.id unless it is absolutely required.
  3. Use a hashmap in the JSON serialization to make the code cleaner (a sketch follows this list).
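
For item 3, a minimal sketch of what a hashmap-based lookup could look like; the enum, map keys, and function name here are illustrative assumptions, not the actual types in src/json_serialization.hpp:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical object-type enum; GPUDrive's real enum may differ.
enum class MapObjectType { RoadEdge, RoadLine, Lane, StopSign, Crosswalk, Invalid };

// One static lookup table replaces a chain of string comparisons
// when mapping JSON type strings to enum values during deserialization.
static const std::unordered_map<std::string, MapObjectType> kObjectTypeLookup = {
    {"road_edge", MapObjectType::RoadEdge},
    {"road_line", MapObjectType::RoadLine},
    {"lane",      MapObjectType::Lane},
    {"stop_sign", MapObjectType::StopSign},
    {"crosswalk", MapObjectType::Crosswalk},
};

inline MapObjectType parseMapObjectType(const std::string &key) {
    auto it = kObjectTypeLookup.find(key);
    return it != kObjectTypeLookup.end() ? it->second : MapObjectType::Invalid;
}
```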

Resolved review threads on src/headless.cpp, src/types.hpp, and src/json_serialization.hpp (five threads)
@aaravpandya (Collaborator)

I am approving this because I am unable to fix the tests. However, I have done a smell test, visualized the scenarios, and re-verified the code. I am confident that everything should be working fine. But considering that this feature is blocking work on vbd integration, I think it's fine to merge it in.

I will be working on getting the tests to work in a separate PR.

Also, @daphnecor, since many people are using GPUDrive, I would recommend bumping the version number of our current release and putting a "warning/info" card in the readme to notify people that this new version breaks compatibility with the previous dataset and that they should download a new version.

@daphne-cornelisse (Contributor, Author)

> I am approving this because I am unable to fix the tests. However, I have done a smell test, visualized the scenarios, and re-verified the code. I am confident that everything should be working fine. But considering that this feature is blocking work on vbd integration, I think it's fine to merge it in.
>
> I will be working on getting the tests to work in a separate PR.
>
> Also, @daphnecor, since many people are using GPUDrive, I would recommend bumping the version number of our current release and putting a "warning/info" card in the readme to notify people that this new version breaks compatibility with the previous dataset and that they should download a new version.

Great suggestion. I will add the release and description.

@daphne-cornelisse merged commit ecb236c into main on Dec 19, 2024 (1 check failed)