Commit 70dbc0d: Merge branch 'main' into FixViewer
aaravpandya committed May 4, 2024
2 parents 7689d24 + c456e50
Showing 4 changed files with 604 additions and 126 deletions.
README.md: 200 changes (108 additions, 92 deletions)

GPUDrive
========

This is a batch RL environment simulator of Nocturne built on the [Madrona Engine](https://madrona-engine.github.io). It supports loading multiple worlds with multi-agent support. Python bindings are exposed for easy integration with RL algorithms.


Build Instructions
--------
First, make sure you have all the dependencies listed [here](https://github.com/shacklettbp/madrona#dependencies) (briefly: a recent version of Python and CMake, plus Xcode on macOS or Visual Studio on Windows).
On Linux, you will also need the X11 development packages:
```bash
sudo apt install libx11-dev libxrandr-dev libxinerama-dev libxcursor-dev libxi-dev
```

The built-in training functionality requires [PyTorch 2.0](https://pytorch.org/get-started/locally/) or later as well.

Now that you have the required dependencies, fetch the repo (don't forget `--recursive`!):
```bash
git clone --recursive https://github.com/Emerge-Lab/gpudrive.git
```

Now, set up the Python components of the repository with `pip`:
```bash
pip install -e . # Add -Cpackages.madrona_escape_room.ext-out-dir=PATH_TO_YOUR_BUILD_DIR on Windows
```

You can then view the environment by running:
```bash
./build/viewer
```


# Poetry install

### Conda
```bash
conda env create -f environment.yml
poetry install
```

# Running the simulator

## Dataset
The simulator supports initializing scenes from the Nocturne dataset. The simulator's input parameter `json_path` takes a path to a directory containing files in the Nocturne format. The directory must contain a `valid_files.json` with a list of the files to be initialized.

To control which files get initialized, use the input parameter `datasetInitOptions`. It supports the following options:

* `FirstN` - Initializes the first `num_worlds` files from `valid_files.json`. Ensure that `valid_files.json` contains at least `num_worlds` files.
* `RandomN` - Shuffles the files in `valid_files.json` and initializes `num_worlds` of them at random. Ensure that `valid_files.json` contains at least `num_worlds` files.
* `PadN` - Initializes as many files as are available, then pads the initialization list by repeating the first file until `num_worlds` files are reached.
* `ExactN` - Takes exactly `num_worlds` files from `valid_files.json`. Ensure that `valid_files.json` contains exactly `num_worlds` files.

## Run the sim
To check that the simulator compiled and installed correctly, change the location of the dataset in `scripts/test.py` and run it:

```bash
python scripts/test.py
```

Alternatively, to check that the simulator itself compiled correctly (even if the Python library did not install), run the headless program from the build directory. Remember to change the location of the data in `src/headless.cpp` and recompile before running it:

```bash
cd build
./headless CPU 1 1 # Run on CPU, 1 world, 1 step
```


The Environment and Learning Task
--------------

The codebase trains a shared policy that controls agents individually with direct engine inputs rather than pixel observations. Agents interact with the simulator as follows:

### Action Space:
* Acceleration: Continuous float values for acceleration applied to the agents.
* Steering angle: Continuous float values for steering angle applied according to the bicycle kinematic model.
* Heading angle (currently unused): Continuous float values for heading angle. This controls where the agent is looking.
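As a rough illustration of how the acceleration and steering actions would move an agent under a kinematic bicycle model, here is a generic textbook update step. This is not the simulator's exact integrator; the wheelbase and timestep values are made up:

```python
import math

def bicycle_step(x, y, heading, speed, accel, steer, wheelbase=2.5, dt=0.1):
    """One generic kinematic-bicycle update: state + (accel, steer) -> new state."""
    x += speed * math.cos(heading) * dt        # advance along current heading
    y += speed * math.sin(heading) * dt
    heading += (speed / wheelbase) * math.tan(steer) * dt  # steering turns the heading
    speed += accel * dt                        # acceleration changes speed
    return x, y, heading, speed
```

For example, an agent at the origin heading along +x at 10 m/s with zero inputs simply advances 1 m per 0.1 s step.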

### Observation Space:

**SelfObservation**

The `SelfObservation` tensor of shape `(6,)` for each agent provides information about the agent's own state. The respective values are

- `SelfObservation[0]`: The current *speed* of the agent.
- `SelfObservation[1:3]`: The *length* and *width* of the agent.
- `SelfObservation[3:5]`: The *coordinates (x, y)* of the goal relative to the agent.
- `SelfObservation[5]`: Whether the agent has collided. Values in `{0, 1}`.
**MapObservation**

The `MapObservation` tensor of shape `(4,)` for each agent provides the *absolute* position of map objects. The values are

- `MapObservation[0:2]`: The position of the `MapObject`.
- `MapObservation[2]`: The heading angle of the `MapObject`.
- `MapObservation[3]`: The type of the `MapObject`.
**PartnerObservation**

The `PartnerObservation` tensor of shape `(num_agents-1, 7)` for each agent provides information about other agents within `params.observationRadius`. All the values in this tensor are *relative to the ego agent*. The respective values for each `PartnerObservation` are

- `PartnerObservation[0]`: The *speed* of the observed neighbouring agent.
- `PartnerObservation[1:3]`: The *position (x, y)* of the observed neighbouring agent.
- `PartnerObservation[3]`: The *orientation* of the neighbouring agent.
- `PartnerObservation[4:6]`: The *length* and *width* of the neighbouring agent.
- `PartnerObservation[6]`: The type of the neighbouring agent.
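To make the layout concrete, here is how one might unpack a single agent's self-observation row into named fields. The example vector is made up, and the helper name `unpack_self_obs` is ours; the real tensors come out of the simulator for consumption by learning code:

```python
def unpack_self_obs(obs):
    """Split one SelfObservation row into named fields (indices as documented above)."""
    return {
        "speed": obs[0],
        "length": obs[1],
        "width": obs[2],
        "goal_xy": (obs[3], obs[4]),
        "collided": bool(obs[5]),
    }

# Hypothetical row: speed 3.0, 4.5 m long, 1.8 m wide, goal at (10, -2), no collision.
example = [3.0, 4.5, 1.8, 10.0, -2.0, 0.0]
```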
**AgentMapObservations**

The `AgentMapObservations` tensor of shape `(num_road_objs, 4)` for each agent provides information about the road objects within `params.observationRadius`. All the values in this tensor are *relative to the ego agent*. The respective values for each `AgentMapObservations` are

- `AgentMapObservations[0:2]`: The position coordinates of the road object.
- `AgentMapObservations[2]`: The relative orientation of the road object.
- `AgentMapObservations[3]`: The road object type.
## Configuring the Sim

The `SimManager` constructor takes in a `params` object that can be used to configure various aspects of the sim.

### Rewards

* `RewardType`: There are three types of rewards that can be exported from the sim.
    - `DistanceBased` - Exports the distance of the agent to the goal.
    - `OnGoalAchieved` - Exports 1 if the agent has reached the goal, else 0.
    - `Dense` - Exports the distance of the agent from its expert trajectory specified in the dataset.
* `distanceToGoalThreshold`: This threshold is used to determine whether the agent has reached the goal. `Default: 0.0`
* `distanceToExpertThreshold`: This threshold is used to determine whether the agent is following the expert trajectory. `Default: 0.0`
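A minimal sketch of what the three exported reward types could compute. This is our own illustration of the descriptions above; the function and argument names are not the simulator's API:

```python
import math

def reward(reward_type, agent_xy, goal_xy, expert_xy,
           distance_to_goal_threshold=0.0):
    """Illustrative versions of the three RewardType exports."""
    dist_to_goal = math.dist(agent_xy, goal_xy)
    if reward_type == "DistanceBased":
        return dist_to_goal
    if reward_type == "OnGoalAchieved":
        return 1.0 if dist_to_goal <= distance_to_goal_threshold else 0.0
    if reward_type == "Dense":
        # Distance from the expert-trajectory position at the current step.
        return math.dist(agent_xy, expert_xy)
    raise ValueError(f"unknown reward type: {reward_type}")
```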
### Road reduction algorithm

To manage performance and simplify the observation space of the simulator, we apply a polyline reduction algorithm to the road edges, lines, and lanes. We use the [Visvalingam-Whyatt algorithm](https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm).

* `polylineReductionThreshold`: This threshold determines how much reduction is applied to the road lines. It ranges from `0` to `+inf`. If set to `0`, no reduction is applied. `Default: 1.0`
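The Visvalingam-Whyatt idea is to repeatedly drop the vertex whose triangle (formed with its two neighbours) has the smallest area, until every remaining interior vertex's triangle area reaches the threshold. A compact reference implementation of the idea (ours, not the simulator's actual code):

```python
def visvalingam_whyatt(points, threshold):
    """Drop vertices whose effective triangle area is below `threshold`."""
    pts = list(points)

    def tri_area(a, b, c):
        # Half the absolute cross product of (b - a) and (c - a).
        return abs((b[0] - a[0]) * (c[1] - a[1])
                   - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

    while len(pts) > 2:
        # Triangle area for each interior vertex and its two neighbours.
        areas = [tri_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= threshold:
            break
        del pts[smallest + 1]  # offset: areas[i] belongs to pts[i + 1]
    return pts
```

With a threshold of `0` every area comparison passes immediately, so no vertices are removed, matching the `polylineReductionThreshold` behaviour described above.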
### Collision Behaviour

For easy troubleshooting and learning of various policies, the behaviour of the agents on collision can be configured.

* `AgentStop`: Agents involved in a collision stop at the point of collision. No further actions are applied to these agents.
* `AgentRemoved`: Agents involved in a collision are removed from the scene.
* `Ignore`: Agents involved in a collision still report that they collided, but they continue to move as if they had not collided.
### Misc params

* `ObservationRadius`: Defines the radius of the circle within which an agent can observe its surroundings. Observations of objects outside the `ObservationRadius` are set to the invalid type and zeroed out.
* `MaxNumControlledVehicles`: Controls the maximum number of agents that can be controlled in the sim. We try to initialize as many controlled agents as possible; however, a particular file may have fewer valid agents, in which case some worlds will have fewer controlled agents. We pick the first `MaxNumControlledVehicles` **valid** agents to control, and the rest are driven by their expert trajectories.
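The `ObservationRadius` cutoff can be pictured as a mask over per-partner observation rows, with out-of-range rows zeroed. This is a schematic using plain Python lists, not the engine's actual code:

```python
import math

def mask_by_radius(partner_rows, radius):
    """Zero out partner-observation rows beyond `radius` of the ego agent.

    Each row is [speed, rel_x, rel_y, orientation, length, width, type];
    rel_x/rel_y are already relative to the ego agent.
    """
    masked = []
    for row in partner_rows:
        dist = math.hypot(row[1], row[2])  # planar distance to the ego agent
        masked.append(row[:] if dist <= radius else [0.0] * len(row))
    return masked
```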
### Types of objects

* Road types:
    - `RoadEdge`
    - `RoadLine`
    - `RoadLane`
    - `CrossWalk`
    - `SpeedBump`
    - `StopSign`
* Agent types:
    - `Vehicle`
    - `Pedestrian`
    - `Cyclist`
* `Padding` type - A special type used to pad entities across different worlds so that output shapes stay consistent.
* `None` type - A special type used to mark entities as invalid; such entities should not be considered for learning. It can arise when an entity is outside the `ObservationRadius`, or when an entity collided and the collision behaviour is set to `AgentRemoved`.
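The `Padding` type exists so that every world exports tensors of the same shape even when worlds contain different numbers of entities. The idea can be sketched as padding each world's entity rows to a fixed count; `PADDING_TYPE` here is a made-up sentinel, not the engine's actual value:

```python
PADDING_TYPE = -1  # made-up sentinel standing in for the engine's Padding type

def pad_entities(rows, max_rows, row_width):
    """Pad one world's entity rows to `max_rows` so all worlds share one shape."""
    pad_row = [0.0] * (row_width - 1) + [PADDING_TYPE]  # type sits in the last slot
    return rows + [pad_row] * (max_rows - len(rows))
```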
## Testing

To run the tests, simply run the following:

```bash
cd build
ctest
```
Citation
--------
If you use Madrona in a research project, please cite the SIGGRAPH paper. This batch simulator is built thanks to the Madrona Engine.