[Feature] Terminated/truncated support and Gymnasium wrapper #143

LukasSchaefer · 2024-09-17T17:01:32Z

The current VMAS implementation supports the OpenAI Gym interface but not the newer, actively maintained Gymnasium interface. This was previously raised in issue #61.

This PR adds both a wrapper that implements the Gymnasium interface for VMAS and a native Gymnasium interface for the VMAS environment via the legacy_gym=False argument. By default, the previous Gym interface is kept for backwards compatibility.

It also adds a small quality-of-life change: the make_env function can now receive the wrapper name (Gymnasium, Gym, RLLib) as a string argument instead of only a wrapper object.
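
For illustration, string-based wrapper selection could be resolved roughly as follows. This is a minimal sketch: the `Wrapper` enum below is a hypothetical stand-in for `vmas.Wrapper`, and `resolve_wrapper` is an assumed helper name, not the code in this PR.

```python
from enum import Enum
from typing import Optional, Union


class Wrapper(Enum):
    # Hypothetical stand-in for vmas.Wrapper; member values are illustrative.
    RLLIB = "rllib"
    GYM = "gym"
    GYMNASIUM = "gymnasium"


def resolve_wrapper(wrapper: Union[Wrapper, str, None]) -> Optional[Wrapper]:
    """Accept None, an enum member, or a case-insensitive name string."""
    if wrapper is None or isinstance(wrapper, Wrapper):
        return wrapper
    try:
        return Wrapper(wrapper.lower())
    except ValueError:
        valid = ", ".join(w.value for w in Wrapper)
        raise ValueError(
            f"Unknown wrapper {wrapper!r}; expected one of: {valid}"
        ) from None
```

With this shape, `resolve_wrapper("Gymnasium")` and `resolve_wrapper(Wrapper.GYMNASIUM)` give the same member, so existing callers that pass the enum keep working.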

I have tested the interactive environment interface and ensured that (by default with legacy_gym=True) VMAS training of BenchMARL still runs as documented.

I'm happy to do any further changes as requested to make sure all works fine so let me know if you have any feedback!

fixes #61

bc-breaking changes: env.unwrapped() -> env.unwrapped in gym wrapper

matteobettini

Hey Lukas!!

Thanks a million for this, it is very cool to have it and I think it will help many users. It is about time VMAS allowed the option of truncated/terminated.

A few high-level tenets we should keep in mind:

- VMAS currently depends on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion.
- I would like to keep the VMAS environment separated from the gym/gymnasium way of handling things. The only change I think we need in the VMAS env interface is the terminated/truncated one; I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default.
- It would be cool if we could support vectorization in the gymnasium wrapper (maybe using numpy). Do they have no way of doing this?

matteobettini · 2024-09-18T08:10:59Z

requirements.txt

@@ -2,4 +2,5 @@ numpy
 torch
 pyglet<=1.5.27
 gym
+gymnasium


Suggested change

-gymnasium

matteobettini · 2024-09-18T08:16:12Z

vmas/simulator/environment/environment.py

@@ -45,12 +44,12 @@ def __init__(
        multidiscrete_actions: bool = False,
        clamp_actions: bool = False,
        grad_enabled: bool = False,
+        legacy_gym: bool = True,
+        render_mode: str = "human",


Suggested change

-        render_mode: str = "human",

matteobettini · 2024-09-18T08:18:04Z

vmas/simulator/environment/environment.py

+        if not self.legacy_gym:
+            # for gymnasium compatibility, return info
+            return_info = True


Suggested change

-        if not self.legacy_gym:
-            # for gymnasium compatibility, return info
-            return_info = True

The functionality of vmas reset should remain the same; if users want the info, they can request it via the args. The gymnasium wrapper can do this (torchrl also does this).

matteobettini · 2024-09-18T08:19:26Z

vmas/simulator/environment/environment.py

@@ -77,6 +80,7 @@ def __init__(
        self.headless = None
        self.visible_display = None
        self.text_lines = None
+        self.render_mode = render_mode


Suggested change

-        self.render_mode = render_mode

The render mode of vmas envs is decided dynamically at each render call

matteobettini · 2024-09-18T08:19:57Z

vmas/simulator/environment/environment.py

+        if self.legacy_gym:
+            observations = self.reset(seed=seed)
+        else:
+            observations, _ = self.reset(seed=seed)


Suggested change

-        if self.legacy_gym:
-            observations = self.reset(seed=seed)
-        else:
-            observations, _ = self.reset(seed=seed)
+        observations = self.reset(seed=seed)

matteobettini · 2024-09-18T08:35:12Z

README.md

@@ -154,6 +153,8 @@ Here is an example:
 ```
 A further example that you can run is contained in `use_vmas_env.py` in the `examples` directory.

+To use an environment with the Gymnasium interface, give the additional `legacy_gym=False` argument.


We can explain above what the terminated_truncated option does

matteobettini · 2024-09-18T08:35:26Z

README.md

@@ -133,8 +133,7 @@ pip install pytest pyyaml pytest-instafail tqdm

 To use the simulator, simply create an environment by passing the name of the scenario
 you want (from the `scenarios` folder) to the `make_env` function.
-The function arguments are explained in the documentation. The function returns an environment
-object with the OpenAI gym interface:
+The function arguments are explained in the documentation. The function returns an environment object with the OpenAI Gym interface:


Suggested change

-The function arguments are explained in the documentation. The function returns an environment object with the OpenAI Gym interface:
+The function arguments are explained in the documentation. The function returns an environment object with the VMAS interface:

matteobettini · 2024-09-18T08:36:22Z

README.md

@@ -143,7 +142,7 @@ Here is an example:
        num_envs=32,
        device="cpu", # Or "cuda" for GPU
        continuous_actions=True,
-        wrapper=None,  # One of: None, vmas.Wrapper.RLLIB, and vmas.Wrapper.GYM
+        wrapper=None,  # One of: None, vmas.Wrapper.RLLIB or "rllib", and vmas.Wrapper.GYM or "gym", and vmas.Wrapper.GYMNASIUM or "gymnasium"


Suggested change

-        wrapper=None,  # One of: None, vmas.Wrapper.RLLIB or "rllib", and vmas.Wrapper.GYM or "gym", and vmas.Wrapper.GYMNASIUM or "gymnasium"
+        wrapper=None,  # One of: None, "rllib", "gym", "gymnasium"

matteobettini · 2024-09-18T08:37:01Z

README.md

@@ -176,7 +177,7 @@ on how to run MAPPO-IPPO-MADDPG-QMIX-VDN using the [VMAS wrapper](https://github

 ### Input and output spaces

-VMAS uses gym spaces for input and output spaces. 
+VMAS uses gym (or gymnasium if `legacy_gym=False`) spaces for input and output spaces. 


Suggested change

-VMAS uses gym (or gymnasium if `legacy_gym=False`) spaces for input and output spaces.
+VMAS uses gym spaces for input and output spaces.

matteobettini · 2024-09-18T08:37:36Z

setup.py

@@ -29,6 +29,6 @@ def get_version():
    author="Matteo Bettini",
    author_email="[email protected]",
    packages=find_packages(),
-    install_requires=["numpy", "torch", "pyglet<=1.5.27", "gym", "six"],
+    install_requires=["numpy", "torch", "pyglet<=1.5.27", "gym", "gymnasium", "six"],


we can instead add options for vmas[gymnasium], vmas[rllib] and so on
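
A rough sketch of what such optional extras could look like. The extra names and the version pin are illustrative assumptions, not the final contents of `setup.py`:

```python
# Hypothetical extras for setup.py; package names and pins are illustrative.
extras_require = {
    "gymnasium": ["gymnasium"],
    "rllib": ["ray[rllib]<=2.2"],
}
# Convenience extra that pulls in every optional dependency.
extras_require["all"] = sorted(
    {dep for deps in extras_require.values() for dep in deps}
)

# In setup.py this dict would be passed as setup(..., extras_require=extras_require),
# so that users can run `pip install vmas[gymnasium]`.
```

The core `install_requires` list then stays unchanged, and Gymnasium is only installed for users who ask for it.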

matteobettini · 2024-09-18T08:54:53Z

Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

maybe we can use this? or have 2 wrappers "gymnasium" and "gymnasium_vector"?

LukasSchaefer · 2024-09-18T09:36:50Z

Hi Matteo,

Thanks for coming back quickly with comments. Some follow-ups so I understand how this should look:

> vmas is currently depending on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion

Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

> The only change i think we need in the vmas env interface is the terminated/truncated one, I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

> Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

matteobettini · 2024-09-18T09:44:48Z

> Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

exactly

> So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Nono, I would like to handle it the way you have already done it in the PR. The only difference from the PR I am suggesting is that the legacy_gym flag is renamed to terminated_truncated (with swapped values ofc) and that it affects just the way the dones are returned (no effect on specs, resets, rendering, and so on).

The way you implemented step and get_from_scenario is pristine and bc-compatible. I am just suggesting a renaming there.

> Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

Yes, that is what I am referring to: wrap a vmas env (which has multiple subenvs) into a gymnasium vector and then just call .numpy() on the tensors.

If we implement it as a vector of single vmas envs, I would personally resign from my PhD ahaahahah

The question here is if we should still have the single gymnasium env wrapper or not

matteobettini · 2024-09-18T09:53:31Z

> So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Just a further clarification on this. I would like exactly the opposite:
the only change in the vmas environment class is a flag (false by default) that lets users get terminated and truncated instead of done from get_from_scenario and step. The rest of the class should be unchanged, as it already provides the other functionality gymnasium users could desire.
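
To make the intended behaviour concrete, here is a minimal sketch on a toy batched environment. `ToyEnv` is a hypothetical illustration, not the VMAS `Environment` class; only the flag name `terminated_truncated` and its default come from this thread.

```python
from typing import Dict


class ToyEnv:
    """Toy batched environment; NOT the VMAS Environment class."""

    def __init__(self, num_envs: int, max_steps: int, terminated_truncated: bool = False):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.terminated_truncated = terminated_truncated
        self.steps = 0

    def done(self):
        # Terminated: the task itself ended (never, in this toy example).
        terminated = [False] * self.num_envs
        # Truncated: the step budget ran out.
        truncated = [self.steps >= self.max_steps] * self.num_envs
        if self.terminated_truncated:
            return terminated, truncated
        # Default behaviour: a single done flag per sub-environment.
        return [te or tr for te, tr in zip(terminated, truncated)]

    def step(self, actions):
        self.steps += 1
        obs = [0.0] * self.num_envs
        rews = [0.0] * self.num_envs
        info: Dict = {}
        if self.terminated_truncated:
            terminated, truncated = self.done()
            return obs, rews, terminated, truncated, info
        return obs, rews, self.done(), info
```

With the flag left at its default, `step` keeps returning four values, so existing code is unaffected; only callers that opt in see the five-value form.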

LukasSchaefer · 2024-09-18T10:06:58Z

Gotcha, I think I understood what you mean 👍

I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the difference in done/terminated/truncated, and to revert the change in the reset function interface (the new flag does not affect the reset function, which only returns observations unless other flags are specified in its arguments).

I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

matteobettini · 2024-09-18T10:09:22Z

> Gotcha, I think I understood what you mean 👍
>
> I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the difference in done/ terminated/ truncated and revert the change in the reset function interface (the new flag does not affect the reset function which only returns observations unless other flags are specified in its arguments).
>
> I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

Exactly, and then the gymnasium wrapper can call reset with return_info=True (and can also keep self.render_mode and other gymnasium things).
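
A minimal sketch of that division of responsibilities. Both classes are hypothetical toys, assuming only that the base `reset` accepts `return_info` as discussed above:

```python
class ToyBaseEnv:
    """Stand-in for the base environment: reset returns observations only
    unless return_info=True is requested."""

    def reset(self, seed=None, return_info=False):
        obs = [0.0, 0.0]
        info = {"seed": seed}
        return (obs, info) if return_info else obs


class ToyGymnasiumWrapper:
    """Hypothetical wrapper: always exposes the (obs, info) reset contract."""

    def __init__(self, env, render_mode="human"):
        self._env = env
        # Gymnasium-specific state lives in the wrapper, not in the base env.
        self.render_mode = render_mode

    def reset(self, seed=None, options=None):
        # Ask the base env for info explicitly instead of changing its default.
        obs, info = self._env.reset(seed=seed, return_info=True)
        return obs, info
```

The base environment keeps its original return value for callers that do not ask for info, while the wrapper provides the two-value form Gymnasium users expect.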

All good on all fronts

matteobettini · 2024-09-18T11:29:13Z

cc @Giovannibriglia since we wanted to implement a StableBaselines3 wrapper, maybe the Gymnasium Vector we will work on here will make it easier to bootstrap the SB3 one

- base VMAS environment uses OpenAI gym spaces
- base VMAS environment has new flag `terminated_truncated` (default: False) that determines whether `done()` and `step()` return the default `done` value or separate values for `terminated` and `truncated`
- update `gymnasium` wrapper to convert gym spaces of base environment to gymnasium spaces
- add `gymnasium_vec` wrapper that can wrap vectorized VMAS environment as gymnasium environment
- add new installation options of VMAS for optional dependencies (used for features like rllib, torchrl, gymnasium, rendering, testing)
- add `return_numpy` flag in gymnasium wrappers (default: True) to determine whether to convert torch tensors to numpy --> passed through by `make_env` function
- add `render_mode` flag in gymnasium wrappers (default: "human") to determine mode to render --> passed through by `make_env` function

LukasSchaefer · 2024-09-18T13:10:06Z

@matteobettini I pushed the updated integration including a vectorized Gymnasium wrapper. I tested things via the provided pytest tests, made sure the BenchMARL integration still works and that shapes behave as anticipated.

Please let me know if there are any further changes you would like to see!

As a note, I slightly modified the make_env function to pass through the return_numpy and render_mode flags I introduced in the Gymnasium wrappers. The latter is to comply with the standard Gymnasium render mode handling, and the former is to allow for returning torch tensors instead of numpy (the default is numpy, to comply with the standard interface of other Gymnasium envs). Alternatively, I could pass these arguments through as part of the kwargs, but then they would also be fed through to the VMAS environment, which might not be desirable. I considered this the cleanest solution but am happy to adjust if you think differently.

matteobettini

Love this and so cool to see the vec wrapper!

I left some comments on the vmas stuff, once all that is settled i will read and test the new wrappers

matteobettini · 2024-09-18T13:19:25Z

README.md

+# install wandb logging dependencies
+pip install vmas[wandb]
+
+# install torchrl dependencies for training with BenchMARL
+pip install vmas[torchrl]
+


Suggested change

-# install wandb logging dependencies
-pip install vmas[wandb]
-
-# install torchrl dependencies for training with BenchMARL
-pip install vmas[torchrl]

I think we can remove these as they are not actual dependencies, same in setup.py

matteobettini · 2024-09-18T13:21:38Z

requirements.txt

@@ -3,3 +3,4 @@ torch
 pyglet<=1.5.27
 gym
 six
+cvxpylayers


Suggested change

-cvxpylayers

I definitely do not want to depend on this library, let's not add it for now, I'll fix the navigation heuristic later. Same in setup

vmas/interactive_rendering.py

matteobettini · 2024-09-18T13:24:27Z

vmas/make_env.py

+    return_numpy: bool = False,
+    render_mode: str = "human",


We should not have them as individual args otherwise in a future with 10+ wrappers we will go crazy. What we can consider is wrapper_kwargs: Optional[Dict] = None

matteobettini · 2024-09-18T13:28:31Z

vmas/simulator/environment/gym.py


    def unwrapped(self) -> Environment:
        return self._env

+    def _ensure_obs_type(self, obs):
+        return obs.detach().cpu().numpy() if self.return_numpy else obs


You can use

VectorizedMultiAgentSimulator/vmas/simulator/utils.py

Line 195 in 26ceb42

def to_numpy(data: Union[Tensor, Dict[str, Tensor], List[Tensor]]):

but here .item() should be fine and better

EDIT maybe not actually cause the obs and the info are arrays, ok then the first thing above should do it

.item() would not work here afaik since these values might not be scalars but tensors.
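
For context, a conversion helper over nested containers can be sketched as below. `map_nested` is a hypothetical name used for illustration and is not the VMAS `to_numpy` utility; with torch installed, the leaf function could be something like `lambda t: t.detach().cpu().numpy()`, as in the diff above.

```python
from typing import Any, Callable


def map_nested(data: Any, leaf_fn: Callable[[Any], Any]) -> Any:
    """Apply leaf_fn to every leaf of nested dicts, lists and tuples."""
    if isinstance(data, dict):
        return {key: map_nested(value, leaf_fn) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type around the converted items.
        return type(data)(map_nested(item, leaf_fn) for item in data)
    return leaf_fn(data)
```

Because the helper recurses, the same call can handle a list of per-agent observations and a dict of per-agent info entries.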

matteobettini · 2024-09-18T13:36:00Z

vmas/simulator/environment/gymnasium.py

+from typing import List, Optional
+
+import gym
+import gymnasium


Suggested change

-import gymnasium
+_has_gymnasium = importlib.util.find_spec("gymnasium") is not None
+if _has_gymnasium:
+    import gymnasium

maybe we need to do this also in the rllib file

matteobettini · 2024-09-18T13:36:36Z

vmas/simulator/environment/gymnasium.py

+from vmas.simulator.utils import extract_nested_with_index
+
+
+def _convert_space(space: gym.Space) -> gymnasium.Space:


I really would like to avoid maintaining this function, does gymnasium not have a conversion tool in their library?

conversion

matteobettini · 2024-09-18T14:49:58Z

vmas/make_env.py

    max_steps: Optional[int] = None,
    seed: Optional[int] = None,
    dict_spaces: bool = False,
    multidiscrete_actions: bool = False,
    clamp_actions: bool = False,
    grad_enabled: bool = False,
+    terminated_truncated: bool = False,
+    wrapper_kwargs: dict = {},


Suggested change

-    wrapper_kwargs: dict = {},
+    wrapper_kwargs: Optional[Dict] = None,

let's not use mutables as defaults, as in python they cause a lot of trouble
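
A small demonstration of the pitfall. The function names are hypothetical and unrelated to the real `make_env`:

```python
from typing import Dict, Optional


def make_env_bad(wrapper_kwargs: dict = {}):
    # The default dict is created once and shared by every call.
    wrapper_kwargs.setdefault("calls", 0)
    wrapper_kwargs["calls"] += 1
    return wrapper_kwargs


def make_env_good(wrapper_kwargs: Optional[Dict] = None):
    # A fresh dict is created on every call; caller-supplied dicts are copied.
    wrapper_kwargs = {} if wrapper_kwargs is None else dict(wrapper_kwargs)
    wrapper_kwargs.setdefault("calls", 0)
    wrapper_kwargs["calls"] += 1
    return wrapper_kwargs
```

Calling `make_env_bad()` twice leaves state from the first call visible in the second, whereas `make_env_good()` starts from an empty dict each time.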

matteobettini · 2024-09-18T14:51:11Z

vmas/simulator/environment/gym.py

@@ -17,18 +17,25 @@ class GymWrapper(gym.Env):
    def __init__(
        self,
        env: Environment,
+        return_numpy: bool = False,
+        **kwargs,


Suggested change

-        **kwargs,

Here and in the other wrappers it is better to consume all args so that it results in error if users pass the wrong args
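
To illustrate the difference, here are two toy wrappers (hypothetical classes, not the VMAS ones), with a deliberately misspelled keyword in the usage described below:

```python
class LenientWrapper:
    def __init__(self, env, return_numpy=True, **kwargs):
        # A misspelled option lands in kwargs and is silently ignored.
        self.env = env
        self.return_numpy = return_numpy


class StrictWrapper:
    def __init__(self, env, return_numpy=True):
        # No **kwargs: unknown keyword arguments raise TypeError.
        self.env = env
        self.return_numpy = return_numpy
```

`LenientWrapper(env, return_nunpy=False)` quietly keeps `return_numpy=True`, while `StrictWrapper(env, return_nunpy=False)` fails immediately with a `TypeError`, which is the behaviour asked for here.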

matteobettini · 2024-09-18T14:51:57Z

vmas/simulator/environment/gym.py

@@ -17,18 +17,25 @@ class GymWrapper(gym.Env):
    def __init__(
        self,
        env: Environment,
+        return_numpy: bool = False,


Suggested change

-        return_numpy: bool = False,
+        return_numpy: bool = True,

I think we can change this to true, I know it is slightly bc-breaking but it might be better aligned with gym, wdyt?

matteobettini · 2024-09-18T14:52:09Z

vmas/simulator/environment/gym.py

    ):
        assert (
            env.num_envs == 1
        ), f"GymEnv wrapper is not vectorised, got env.num_envs: {env.num_envs}"

        self._env = env
+        assert not self._env.terminated_truncated, "GymWrapper is not only compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."


Suggested change

-        assert not self._env.terminated_truncated, "GymWrapper is not only compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."
+        assert not self._env.terminated_truncated, "GymWrapper is not compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."

matteobettini · 2024-09-18T14:52:31Z

vmas/simulator/environment/gym.py


    def unwrapped(self) -> Environment:
        return self._env

+    def _ensure_obs_type(self, obs):


should we do it for info too? they are also tensors

matteobettini · 2024-09-18T14:52:59Z

vmas/simulator/environment/gymnasium.py

+        return self._env
+
+    def _ensure_obs_type(self, obs):
+        return obs.detach().cpu().numpy() if self.return_numpy else obs


use vmas util

matteobettini · 2024-09-18T14:54:58Z

vmas/simulator/environment/gymnasium.py

+    def _action_list_to_tensor(self, list_in: List) -> List:
+        assert (
+            len(list_in) == self._env.n_agents
+        ), f"Expecting actions for {self._env.n_agents} agents, got {len(list_in)} actions"
+        actions = []
+        for agent in self._env.agents:
+            actions.append(
+                torch.zeros(
+                    1,
+                    self._env.get_agent_action_size(agent),
+                    device=self._env.device,
+                    dtype=torch.float32,
+                )
+            )
+
+        for i in range(self._env.n_agents):
+            act = torch.tensor(list_in[i], dtype=torch.float32, device=self._env.device)
+            if len(act.shape) == 0:
+                assert (
+                    self._env.get_agent_action_size(self._env.agents[i]) == 1
+                ), f"Action of agent {i} is supposed to be an scalar int"
+            else:
+                assert len(act.shape) == 1 and act.shape[
+                    0
+                ] == self._env.get_agent_action_size(self._env.agents[i]), (
+                    f"Action of agent {i} hase wrong shape: "
+                    f"expected {self._env.get_agent_action_size(self._env.agents[i])}, got {act.shape[0]}"
+                )
+            actions[i][0] = act
+        return actions


For the shared functions, would it make sense to write them once?

- add base VMAS wrapper class for type conversion between tensors and np for singleton and vectorized envs
- change default of gym wrapper to return np data
- update interactive rendering to be compatible with non gym wrapper class (to preserve tensor types)
- add error messages for gymnasium and rllib wrappers without installing first

LukasSchaefer · 2024-09-18T16:42:26Z

@matteobettini Added a new base VMAS wrapper class from which the gym, gymnasium, and vectorized gymnasium wrappers inherit that implements a lot of shared functionality including type conversions before and after feeding data to the environment.
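
The general shape of such a shared base class can be sketched as follows. All class and method names are hypothetical, and plain lists stand in for tensors to keep the example dependency-free:

```python
from typing import List, Sequence


class BaseToyWrapper:
    """Hypothetical base class holding logic shared by several wrappers."""

    def __init__(self, n_agents: int, action_size: int):
        self.n_agents = n_agents
        self.action_size = action_size

    def _check_actions(self, actions: Sequence[Sequence[float]]) -> List[List[float]]:
        # Validate once here instead of repeating the checks in each wrapper.
        if len(actions) != self.n_agents:
            raise ValueError(
                f"Expecting actions for {self.n_agents} agents, got {len(actions)} actions"
            )
        for i, act in enumerate(actions):
            if len(act) != self.action_size:
                raise ValueError(
                    f"Action of agent {i} has wrong shape: "
                    f"expected {self.action_size}, got {len(act)}"
                )
        return [list(map(float, act)) for act in actions]


class ToyGymWrapper(BaseToyWrapper):
    def step(self, actions):
        return self._check_actions(actions)


class ToyGymnasiumVecWrapper(BaseToyWrapper):
    def step(self, actions):
        return self._check_actions(actions)
```

Each concrete wrapper then only implements what differs between interfaces, and fixes to the shared validation apply to all of them.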

Also made other smaller modifications as per your suggestions (gymnasium/rllib import warnings, removing kwargs of wrappers, and making wrapper kwargs optional to avoid the mutable {} default value).

matteobettini

Really like this, I think we are almost done

matteobettini · 2024-09-18T16:58:44Z

vmas/simulator/environment/rllib.py

+    from ray.rllib.utils.typing import EnvActionType, EnvInfoDict, EnvObsType
+else:
+    raise ImportError(
+        "RLLib is not installed. Please install it with `pip install ray[rllib]`."


Suggested change

-        "RLLib is not installed. Please install it with `pip install ray[rllib]`."
+        "RLLib is not installed. Please install it with `pip install ray[rllib]<=2.2`."

matteobettini · 2024-09-18T17:16:57Z