New dataset and metadata integration #292
Conversation
Looks good to me
This PR introduces breaking changes to the datasets. Please update the dataset before merging this PR. Also, can you change the file in
Yes, we will do that this week. Since this PR cannot be merged until we fix the data, I have temporarily taken your code and applied it to our integrations/vbd branch.
@nadarenator Let's see if we can update the dataset links in the README today so we can merge this PR.
I think we should hold off on regenerating the dataset until we finish vbd alignment, unless you're confident that the current processing script extracts all the required info.
Is regenerating that expensive? We can just regenerate a small number of files for testing.
Yes, that's what I meant: using a small set for our purposes. But before merging this PR we'd need updated links, which would involve regenerating the whole dataset. And if we find out during vbd integration that we need yet another piece of info from the raw tfrecords, we'd have to regenerate the full dataset again.
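Regenerating only a small sample for testing, as suggested above, could be as simple as picking a reproducible subset of the raw TFRecord shards before running the processing script. The sketch below is hypothetical: the directory layout, glob pattern, and function name are assumptions for illustration, not the repo's actual API.

```python
import random
from pathlib import Path

def sample_tfrecords(raw_dir: str, k: int = 10, seed: int = 0) -> list[Path]:
    """Return k randomly chosen raw shard paths so only a small slice
    of the dataset needs to be regenerated for testing.

    A fixed seed keeps the test subset reproducible across runs.
    """
    files = sorted(Path(raw_dir).glob("*.tfrecord*"))  # hypothetical naming
    rng = random.Random(seed)
    return rng.sample(files, min(k, len(files)))
```

Only the selected shards would then be fed to the processing script, leaving the full regeneration for after the vbd alignment settles.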
Force-pushed from 79f8128 to 46f9df4.
The only thing remaining: regenerate the dataset, upload it to HF, and add the link to the README.
Can you send me a link to the new data once it's up? I'd like to do a test PPO run.
It should be available now in my public directory on greene (the same one I shared before).
It would be great if we could redo some of the tutorial notebooks; most of the indexing is outdated, and there are also datatypes now that handle the indexing internally.
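The kind of datatype referred to above, one that hides raw index arithmetic behind named accessors, might look like the following. This is a hedged sketch only: the class name, field layout, and slice positions are invented for illustration and do not reflect GPUDrive's actual observation schema.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class AgentObservation:
    """Wraps a flat per-agent feature vector so notebooks never
    hard-code magic indices that break when the layout changes."""

    data: np.ndarray  # flat observation vector for one agent

    # Hypothetical slice layout -- illustration only
    _POS = slice(0, 2)
    _VEL = slice(2, 4)
    _HEADING = 4

    @property
    def position(self) -> np.ndarray:
        return self.data[self._POS]

    @property
    def velocity(self) -> np.ndarray:
        return self.data[self._VEL]

    @property
    def heading(self) -> float:
        return float(self.data[self._HEADING])
```

If the tutorial notebooks index through accessors like these instead of raw offsets, a future change to the observation layout only touches the datatype, not every notebook.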
Okay, so 3 issues to fix:
- Revert the headless paths
- Remove metadata.id unless it is absolutely required
- Use a hashmap in JSON serialization to make the code cleaner
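The hashmap-based serialization suggested in the last point might look like the sketch below: a single mapping from output keys to extractor functions replaces a chain of ad-hoc branches. The field names and scenario structure here are placeholders, not the actual metadata schema.

```python
import json

# Map each output JSON key to a function that extracts it from a scenario.
# Adding a new metadata field is one line in this dict, instead of another
# if/else branch inside the serializer.
FIELD_EXTRACTORS = {
    "scenario_id": lambda s: s["id"],
    "num_agents": lambda s: len(s["agents"]),
    "duration": lambda s: s["end_ts"] - s["start_ts"],
}

def serialize_metadata(scenario: dict) -> str:
    """Serialize a scenario dict to JSON by walking the extractor map."""
    return json.dumps({key: fn(scenario) for key, fn in FIELD_EXTRACTORS.items()})
```

The same map can also drive validation or documentation of the metadata fields, since the set of serialized keys now lives in one place.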
I am approving this because I am unable to fix the tests. However, I have done a smell test, visualized the scenarios, and re-verified the code; I am confident that everything should be working fine. And since this feature is blocking work on vbd integration, I think it's fine to merge it in. I will work on getting the tests to pass in a separate PR. Also, @daphnecor, since many people are using GPUDrive, I would recommend bumping the version number of our current release and putting a warning/info card in the README to notify people that this new version has breaking changes relative to the previous dataset and that they should download the new version.
Great suggestion. I will add the release and description.
Description
This PR integrates a new version of the dataset with GPUDrive.
TODOs