Remove padding agents using interface entities #103

Merged
merged 51 commits into from
Sep 21, 2024

aaravpandya
Collaborator

@aaravpandya aaravpandya commented May 5, 2024

This is a big PR because removing the padding agents impacts almost everything.

We introduce new archetypes, AgentInterface and RoadInterface, that hold the components we want to export. To fill in the observations, we use new structs, AgentInterfaceEntity and RoadInterfaceEntity, that store a reference to these interface entities. While collecting observations, we get the corresponding arrays and fill in the values, ignoring the padded entries.
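A minimal sketch of the handle indirection described above, written in plain C++ rather than Madrona's ECS API. `AgentInterface`, `AgentInterfaceEntity` and `Agent` are simplified stand-ins for the PR's types; the `speed`/`done` fields and `fillObservation` are hypothetical, and an array index stands in for an entity ID. Each real agent writes only to the slot its handle names, so padded slots keep their default values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative size; the PR takes this from consts::kMaxAgentCount.
constexpr int32_t kMaxAgentCount = 4;

// Hypothetical "export components": the values copied into observation tensors.
struct AgentInterface {
    float speed = 0.f;
    int32_t done = 0;
};

// Hypothetical handle stored on each real agent; an index stands in for an entity ID.
struct AgentInterfaceEntity {
    int32_t idx;
};

// Hypothetical simulated agent that knows which interface slot it exports to.
struct Agent {
    float speed;
    bool reachedGoal;
    AgentInterfaceEntity iface;
};

using InterfaceTable = std::array<AgentInterface, kMaxAgentCount>;

// Copy one agent's exported values through its handle. Slots not referenced
// by any agent (the padding) are never written and keep their defaults.
inline void fillObservation(const Agent &agent, InterfaceTable &table) {
    assert(agent.iface.idx >= 0 && agent.iface.idx < kMaxAgentCount);
    AgentInterface &out = table[agent.iface.idx];
    out.speed = agent.speed;
    out.done = agent.reachedGoal ? 1 : 0;
}
```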

Although we create as many entities of these new archetypes as the constants in consts specify, they have no physics components and are not registered with the BVH. These entities are also not iterated over in the task graph. As a result, the task graph runs only for the actual number of agents in the Sim, and we simply reference the "export components" by ID to fill in the observations.
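To make the cost argument concrete, the following sketch builds a fixed-shape `[numWorlds * kMaxAgentCount]` Done buffer while looping only over the agents that actually exist in each world. `World`, `exportDone` and the value of `kMaxAgentCount` are hypothetical; padded slots are pre-filled with a caller-chosen value and never visited, so the work scales with the real agent count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative value, not the PR's constant.
constexpr int32_t kMaxAgentCount = 8;

// Hypothetical per-world state: entries exist only for real agents.
struct World {
    std::vector<int32_t> agentDone;  // Done flag of each real agent
    std::vector<int32_t> ifaceIdx;   // handle: agent i -> interface slot
};

// Build the world-major Done export. Every slot starts at padValue; only
// slots referenced by a real agent are overwritten.
inline std::vector<int32_t> exportDone(const std::vector<World> &worlds,
                                       int32_t padValue) {
    std::vector<int32_t> out(worlds.size() * kMaxAgentCount, padValue);
    for (std::size_t w = 0; w < worlds.size(); ++w) {
        const World &world = worlds[w];
        assert(world.agentDone.size() == world.ifaceIdx.size());
        for (std::size_t i = 0; i < world.agentDone.size(); ++i) {
            int32_t slot = world.ifaceIdx[i];
            assert(slot >= 0 && slot < kMaxAgentCount);
            out[w * kMaxAgentCount + slot] = world.agentDone[i];
        }
    }
    return out;
}
```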

@aaravpandya aaravpandya marked this pull request as ready for review May 11, 2024 17:30
@aaravpandya
Collaborator Author

I am getting incorrect values on this branch when executing on CUDA. In CPU exec mode the values are correct at every index. I am unsure how to debug this.

What this PR tries to do: before this change, we only had Agent and PhysicsEntity, and we had to create padding agents and padding physics entities to make up for the varying number of entities per world in the dataset. To remove the padding agents, I introduced the AgentInterface and RoadInterface archetypes. Now we create only as many agents as each world needs, but always create consts::kMaxAgentCount AgentInterface entities. Each agent stores a reference to an AgentInterface through an AgentInterfaceEntity struct. We fill in the observations in the AgentInterface entities, which are then exported, while the padded AgentInterface entities keep zeros or ones depending on the semantics of each field. The design is similar to gpu_hideseek, but reversed: here the Agent stores the reference to the AgentInterface.
The relevant code for entity creation is in level_gen.cpp::343, and the archetypes are defined in types.hpp::241.
After the agents are created, we make sure there are consts::kMaxAgentCount AgentInterface entities (level_gen.cpp::299).
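The level-generation flow described above could be sketched as follows. This is not the code in level_gen.cpp: `createWorld`, `WorldState` and the `done`/`controlled` fields are hypothetical, and the padding values (done = 1, controlled = 0) are an assumption, chosen to match the trailing 1s in the CPU Done output posted later in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative size; the PR uses consts::kMaxAgentCount.
constexpr int32_t kMaxAgentCount = 4;

// Hypothetical exported fields of an interface slot.
struct AgentInterface {
    int32_t done;
    int32_t controlled;
};

// Hypothetical agent; ifaceIdx stands in for the AgentInterfaceEntity handle.
struct Agent {
    int32_t ifaceIdx;
};

struct WorldState {
    std::vector<Agent> agents;               // only the agents in the scene
    std::vector<AgentInterface> interfaces;  // always kMaxAgentCount entries
};

// One Agent per scene agent, each linked to its own interface slot, then the
// interfaces are topped up to kMaxAgentCount with inert padding values.
inline WorldState createWorld(int32_t numSceneAgents) {
    assert(numSceneAgents >= 0 && numSceneAgents <= kMaxAgentCount);
    WorldState world;
    for (int32_t i = 0; i < numSceneAgents; ++i) {
        world.interfaces.push_back(AgentInterface{0, 1});
        world.agents.push_back(Agent{i});
    }
    // Assumption: padded slots report done = 1 and controlled = 0.
    while (static_cast<int32_t>(world.interfaces.size()) < kMaxAgentCount) {
        world.interfaces.push_back(AgentInterface{1, 0});
    }
    return world;
}
```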

How to reproduce the error: run headless for 1 step, first on CPU and then on CUDA (./headless CPU 1, then ./headless CUDA 1), and print any tensor. I am printing the Done tensor here because it holds one scalar per agent and is easy to check by eye. The values output in CPU exec mode are correct; the values output in CUDA mode are not.

ctest can also be used by switching the exec mode: the tests pass on CPU but fail on CUDA.
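When CPU and CUDA disagree, one way to localize the problem is to dump both exported tensors as flat arrays and report the first differing (world, slot) pair. The helper below is a hypothetical debugging aid, not part of the repository; it assumes both dumps use the same world-major layout with `slotsPerWorld` entries per world (for example consts::kMaxAgentCount).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Location of a differing element; {-1, -1} means the buffers match.
struct Mismatch {
    int64_t world;
    int64_t slot;
};

// Scan two flat, world-major buffers and return the first (world, slot)
// where they differ.
inline Mismatch firstMismatch(const std::vector<int32_t> &cpu,
                              const std::vector<int32_t> &cuda,
                              int64_t slotsPerWorld) {
    assert(cpu.size() == cuda.size());
    assert(slotsPerWorld > 0);
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        if (cpu[i] != cuda[i]) {
            int64_t idx = static_cast<int64_t>(i);
            return {idx / slotsPerWorld, idx % slotsPerWorld};
        }
    }
    return {-1, -1};
}
```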

I am not sure how to debug this or where the error could even reside.

@eugenevinitsky

@aaravpandya
Collaborator Author

@shacklettbp Hi Brennan, would you mind taking a look? I am not sure how to proceed with this issue.

For reference, this is what the output looks like. I think the main issue is that CUDA exec mode returns different values than CPU mode; the CPU values are correct.

(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CPU 1
Done
[ 0 0 1 1 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
[ 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
FPS 3377.298340
Agent-Normalized FPS 178996.812500
(madrona) (base) aarav@emerge2-desktop:~/gpudrive/build$ ./headless CUDA 1
Initialization finished
Done
[ 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ]
FPS 1700.869263
Agent-Normalized FPS 0.000000

@shacklettbp
Collaborator

You need to sort the new interface entities when using the GPU backend. It doesn't look like you're doing that currently. I assume this could be done once during the initialization task graph.
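One way to picture why sorting matters: if interface rows created for many worlds end up interleaved in the archetype table, an export that reads rows in table order mixes values from different worlds, which could produce scrambled output like the CUDA run above. The sketch below models the table as a plain vector and groups rows by world with `std::stable_sort`; `InterfaceRow`, `sortByWorld` and `exportDone` are hypothetical, and this illustrates the idea rather than Madrona's GPU sort. `std::stable_sort` keeps the within-world creation order in this model; whether the GPU backend's sort does the same is not assumed here. In the PR the fix landed as the "Sort interfaces" commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row of the interface table: owning world plus exported data.
struct InterfaceRow {
    int32_t worldID;
    int32_t slot;  // index within the world, kept for inspection
    int32_t done;
};

// Group rows by world so each world's rows form one contiguous block.
// stable_sort preserves the relative order of rows within a world.
inline void sortByWorld(std::vector<InterfaceRow> &rows) {
    std::stable_sort(rows.begin(), rows.end(),
                     [](const InterfaceRow &a, const InterfaceRow &b) {
                         return a.worldID < b.worldID;
                     });
}

// Naive export that reads rows in table order; it yields a world-major
// tensor only if the rows have already been grouped by world.
inline std::vector<int32_t> exportDone(const std::vector<InterfaceRow> &rows) {
    std::vector<int32_t> out;
    out.reserve(rows.size());
    for (const InterfaceRow &r : rows) {
        out.push_back(r.done);
    }
    return out;
}
```

Since the interface entities are created once and never move afterwards, doing this sort a single time during initialization, as suggested, should be enough in this model.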

@eugenevinitsky
Contributor

Well that was quick, thanks!

@aaravpandya aaravpandya merged commit dc160be into main Sep 21, 2024
1 check passed
@aaravpandya aaravpandya deleted the ap_removePaddingAgents branch October 4, 2024 03:21
@aaravpandya aaravpandya linked an issue Oct 9, 2024 that may be closed by this pull request
aaravpandya added a commit that referenced this pull request Oct 23, 2024
* Init remove padding agents

* Testing

* Refactor out the controlled state

* Fix the tests

* Use by reference

* Temp fix for map export

* cleanup

* Fix merge issues

* Fix merge issues

* make consts 6000 to pass tests

* Use init counts for BVH

* Remove unused function

* Pass all tests

* Set controlledstate

* Zero out things

* Zero padded agents out

* Add a test.py for easy debugging

* Better check for map drawing.

* Set agents to 128

* Cycle through files in test.py

* Rename InterfaceEntity to AgentInterfaceEntity

* Only print dones

* Remove debug checks

* Separate out the observation systems

* Sort interfaces

* remove debug script

* Remove debug statements

* Minor improvements

* Info tensor interface matches main

* Correctly export response type and trajectory
Development

Successfully merging this pull request may close these issues.

Run exps and merge LIDAR + padding change
4 participants