Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Terminated/truncated support and Gymnasium wrapper #143

Merged
merged 19 commits into from
Sep 20, 2024

Conversation

LukasSchaefer
Copy link
Contributor

@LukasSchaefer LukasSchaefer commented Sep 17, 2024

The current VMAS implementation supports the OpenAI Gym interface but not the new and still maintained Gymnasium interface. This was already raised in an issue #61 before.

This PR adds both a wrapper that implements the Gymnasium interface for VMAS, and native Gymnasium interface for the VMAS environment via the legacy_gym=False argument. By default, the default and previous Gym interface is maintained for backwards compatibility.

Small quality of life function to allow the make_env function to receive the wrapper name (Gymnasium, Gym, RLLib) as a string argument instead of a wrapper object only.

I have tested the interactive environment interface and ensured that (by default with legacy_gym=True) VMAS training of BenchMARL still runs as documented.

I'm happy to do any further changes as requested to make sure all works fine so let me know if you have any feedback!

fixes #61

bc-breaking changes: env.unwrapped() -> env.unwrapped in gym wrapper

Copy link
Member

@matteobettini matteobettini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Lukas!!

Thanks a million for this, it is very cool to have it and i think it will help many users. it is about time vmas allowed the option of truncated/terminated

A few high-level tenets we should keep in mind:

  • vmas is currently depending on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion
  • i would like to keep the vmas environment separated from the gym/gymnasium way of handling things. The only change i think we need in the vmas env interface is the terminated/truncated one, I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default
  • it would be cool if we could support vectorization in the gymnasium wrapper (maybe using numpy) do they have no way of doing this?

requirements.txt Outdated
@@ -2,4 +2,5 @@ numpy
torch
pyglet<=1.5.27
gym
gymnasium
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gymnasium

@@ -45,12 +44,12 @@ def __init__(
multidiscrete_actions: bool = False,
clamp_actions: bool = False,
grad_enabled: bool = False,
legacy_gym: bool = True,
render_mode: str = "human",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
render_mode: str = "human",

Comment on lines 98 to 100
if not self.legacy_gym:
# for gymnasium compatibility, return info
return_info = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not self.legacy_gym:
# for gymnasium compatibility, return info
return_info = True

The functionality of vmas reset should remian the same, if users want the info they can request it from the args. The gymnasium wrapper can do this (torchrl also does this)

@@ -77,6 +80,7 @@ def __init__(
self.headless = None
self.visible_display = None
self.text_lines = None
self.render_mode = render_mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.render_mode = render_mode

The render mode of vmas envs is decided dynamically at each render call

Comment on lines 68 to 71
if self.legacy_gym:
observations = self.reset(seed=seed)
else:
observations, _ = self.reset(seed=seed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.legacy_gym:
observations = self.reset(seed=seed)
else:
observations, _ = self.reset(seed=seed)
observations = self.reset(seed=seed)

README.md Outdated
@@ -154,6 +153,8 @@ Here is an example:
```
A further example that you can run is contained in `use_vmas_env.py` in the `examples` directory.

To use an environment with the Gymnasium interface, give the additional `legacy_gym=False` argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can explain above what the terminated_truncated option does

README.md Outdated
@@ -133,8 +133,7 @@ pip install pytest pyyaml pytest-instafail tqdm

To use the simulator, simply create an environment by passing the name of the scenario
you want (from the `scenarios` folder) to the `make_env` function.
The function arguments are explained in the documentation. The function returns an environment
object with the OpenAI gym interface:
The function arguments are explained in the documentation. The function returns an environment object with the OpenAI Gym interface:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The function arguments are explained in the documentation. The function returns an environment object with the OpenAI Gym interface:
The function arguments are explained in the documentation. The function returns an environment object with the VMAS interface:

README.md Outdated
@@ -143,7 +142,7 @@ Here is an example:
num_envs=32,
device="cpu", # Or "cuda" for GPU
continuous_actions=True,
wrapper=None, # One of: None, vmas.Wrapper.RLLIB, and vmas.Wrapper.GYM
wrapper=None, # One of: None, vmas.Wrapper.RLLIB or "rllib", and vmas.Wrapper.GYM or "gym", and vmas.Wrapper.GYMNASIUM or "gymnasium"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wrapper=None, # One of: None, vmas.Wrapper.RLLIB or "rllib", and vmas.Wrapper.GYM or "gym", and vmas.Wrapper.GYMNASIUM or "gymnasium"
wrapper=None, # One of: None, "rllib", "gym", "gymnasium"

README.md Outdated
@@ -176,7 +177,7 @@ on how to run MAPPO-IPPO-MADDPG-QMIX-VDN using the [VMAS wrapper](https://github

### Input and output spaces

VMAS uses gym spaces for input and output spaces.
VMAS uses gym (or gymnasium if `legacy_gym=False`) spaces for input and output spaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VMAS uses gym (or gymnasium if `legacy_gym=False`) spaces for input and output spaces.
VMAS uses gym spaces for input and output spaces.

setup.py Outdated
@@ -29,6 +29,6 @@ def get_version():
author="Matteo Bettini",
author_email="[email protected]",
packages=find_packages(),
install_requires=["numpy", "torch", "pyglet<=1.5.27", "gym", "six"],
install_requires=["numpy", "torch", "pyglet<=1.5.27", "gym", "gymnasium", "six"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can instead add options for vmas[gymnasium], vmas[rllib] and so on

@matteobettini matteobettini added the enhancement New feature or request label Sep 18, 2024
@matteobettini matteobettini changed the title Gymnasium support [Feature] Terminated/truncated support and Gymnasium wrapper Sep 18, 2024
@matteobettini
Copy link
Member

Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

maybe we can use this? or have 2 wrappers "gymnasium" and "gymnasium_vector"?

@LukasSchaefer
Copy link
Contributor Author

Hi Matteo,

Thanks for coming back quickly with comments. Some follow-ups so I best understand how this should look like:

vmas is currently depending on gym only for the specs. I think those are fine even if the library is unmaintained. I would like to avoid adding a core gymnasium dependency and keep the old specs. Gymnasium can be an optional dependency and its wrapper can handle the spec conversion

Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

The only change i think we need in the vmas env interface is the terminated/truncated one, I would keep the rest as it was. The flag to get terminated and truncated instead of done can be called terminated_truncated instead of legacy_gym and be false by default

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Just poking around it seems they support a vector env https://gymnasium.farama.org/api/vector/ interface

Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

@matteobettini
Copy link
Member

matteobettini commented Sep 18, 2024

Currently, VMAS uses gym for action/ observation spaces in its underlying environment. Would you like to keep this then and only "convert" those to Gymnasium spaces within the Gymnasium wrapper?

exactly

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Nono, I would like to handle it like you have already done it in the PR. The only differences from the PR I am suggesting is that the legacy_gym flag is renamed to terminated_truncated (with swapped values ofc) and it affects just the way the dones are returned (no effect on specs, resets, rendering, and so on).

The way you implemented step and get_from_scenario is pristine and bc-compatible. I am just suggesting a renaming there.

Yes, Gymnasium has wrappers to run multiple instances of environments in vectorised fashion, either synchronously or asynchronously. However, I think it might even be more efficient to write a vectorised gymnasium wrapper that uses a vectorised VMAS environment instance underneath and converts things to numpy arrays e.g. instead of having multiple gymnasium environments each of which holds a VMAS instance with a single environment only underneath. The latter would likely be notably slower. Let me know what you think and I'll have a go at this later today/ this week!

Yes that is what i am referring to: wrap a vmas env (which has multiple subenvs) into a gymnasium vector and then just call .numpy() on the tensors.

If we implement it as a vector of single vmas envs, I would personally resign from my PhD ahaahahah

The question here is if we should still have the single gymnasium env wrapper or not

@matteobettini
Copy link
Member

matteobettini commented Sep 18, 2024

So you'd want the done function to always return two values terminated and truncated? Similarly for the step function and get_from_scenario functions? Should those also return terminated and truncated instead of the previous done? Also, in Gymnasium the reset function returns both the observations and info dictionary. I'm asking since such changes in the interface might require changes in any code using VMAS atm which is why I originally was hesitant to do so. Very happy to go ahead with this though if that's your preference, I agree it's a cleaner solution than merging both interfaces within the environment function.

Just a further clarification on this. I would like excatly the opposite:
the only change in the vmas environment class is a flag (by default false) that allows to get terminated and truncated instead of done from the get_from_scenario and step. All the rest of the class should be unchanged as it already possesses the rest of the functionalities gymnasium users could desire

@LukasSchaefer
Copy link
Contributor Author

Gotcha, I think I understood what you mean 👍

I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the different in done/ terminated/ truncated and revert the change in the reset function interface (the new flag does not affect the reset function which only returns observations unless other flags are specified in its arguments).

I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

@matteobettini
Copy link
Member

matteobettini commented Sep 18, 2024

Gotcha, I think I understood what you mean 👍

I'll rename the environment argument flag as suggested. Just to make sure, for the base VMAS environment class, you'd want the only change induced by the flag to be the different in done/ terminated/ truncated and revert the change in the reset function interface (the new flag does not affect the reset function which only returns observations unless other flags are specified in its arguments).

I'll also have a go at the Gymnasium wrapper. Since gymnasium separates singleton and vectorised environments, I'm tempted to keep these things separate here as well and have separate wrappers for a singleton Gymnasium and vectorised Gymnasium environments.

Exactly, and then the gymnasium wrapper can call reset with return_info=True. (and can also keep self.render_mode and other gymansium things)

All good on all fronts

@matteobettini
Copy link
Member

cc @Giovannibriglia since we wanted to implement a StableBaselines3 wrapper, maybe the Gymnasium Vector we will work on here will make it easier to bootstrap the SB3 one

- base VMAS environment uses OpenAI gym spaces
- base VMAS environment has new flag `terminated_truncated` (default:
  False) that determines whether `done()` and `step()` return the
  default `done` value or separate values for `terminated` and
  `truncated`
- update `gymnasium` wrapper to convert gym spaces of base environment
  to gymnasium spaces
- add `gymnasium_vec` wrapper that can wrap vectorized VMAS environment
  as gymnasium environment
- add new installation options of VMAS for optional dependencies (used
  for features like rllib, torchrl, gymnasium, rendering, testing)
- add `return_numpy` flag in gymnasium wrappers (default: True) to
  determine whether to convert torch tensors to numpy --> passed through by `make_env` function
- add `render_mode` flag in gymnasium wrappers (default: "human") to
  determine mode to render --> passed through by `make_env` function
@LukasSchaefer
Copy link
Contributor Author

@matteobettini I pushed the updated integration including a vectorized Gymnasium wrapper. I tested things via the provided pytest tests, made sure the BenchMARL integration still works and that shapes behave as anticipated.

Please let me know if there are any further changes you would like to see!

As a note, I slightly modified the make_env function to pass through the return_numpy and render_mode flags I introduced in the Gymnasium wrappers. The latter is to comply with the standard Gymnasium render mode handling, and the former is to allow for returning torch tensors instead of numpy (but default is numpy to comply with standard environment interface of other Gymnasium envs). Alternatively, I could also pass through these arguments as part of the kwargs, but then they would also be fed through to the VMAS environment which might not be desirable. I considered this the cleanest the solution but happy to adjust if you think differently.

Copy link
Member

@matteobettini matteobettini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this and so cool to see the vec wrapper!

I left some comments on the vmas stuff, once all that is settled i will read and test the new wrappers

README.md Outdated
Comment on lines 130 to 135
# install wandb logging dependencies
pip install vmas[wandb]

# install torchrl dependencies for training with BenchMARL
pip install vmas[torchrl]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# install wandb logging dependencies
pip install vmas[wandb]
# install torchrl dependencies for training with BenchMARL
pip install vmas[torchrl]

I think we can remove these as they are not actual dependencies, same in setup.py

requirements.txt Outdated
@@ -3,3 +3,4 @@ torch
pyglet<=1.5.27
gym
six
cvxpylayers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cvxpylayers

I defiitely do not want to depend on this library, let's not add it for now, I'll fix the navigation heuristic later. Same in setup

vmas/interactive_rendering.py Show resolved Hide resolved
vmas/make_env.py Outdated
Comment on lines 28 to 29
return_numpy: bool = False,
render_mode: str = "human",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have them as individual args otherwise in a future with 10+ wrappers we will go crazy. What we can consider is wrapper_kwargs: Optional[Dict] = None


def unwrapped(self) -> Environment:
return self._env

def _ensure_obs_type(self, obs):
return obs.detach().cpu().numpy() if self.return_numpy else obs
Copy link
Member

@matteobettini matteobettini Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use

def to_numpy(data: Union[Tensor, Dict[str, Tensor], List[Tensor]]):

but here .item() should be fine and better

EDIT maybe not actually cause the obs and the info are arrays, ok then the first thing above should do it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.item() would not work here afaik since these values might not be scalars but tensors.

from typing import List, Optional

import gym
import gymnasium
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import gymnasium
_has_gymnasium = importlib.util.find_spec("gymnasium") is not None
if _has_gymnasium:
import gymnasium

maybe we need to do this also in the rllib file

from vmas.simulator.utils import extract_nested_with_index


def _convert_space(space: gym.Space) -> gymnasium.Space:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would like to avoid mainitaining this function, does gymnasium not have a conversion tool in their library?

vmas/make_env.py Outdated
max_steps: Optional[int] = None,
seed: Optional[int] = None,
dict_spaces: bool = False,
multidiscrete_actions: bool = False,
clamp_actions: bool = False,
grad_enabled: bool = False,
terminated_truncated: bool = False,
wrapper_kwargs: dict = {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wrapper_kwargs: dict = {},
wrapper_kwargs: Optional[Dict] = None,

let's not use mutables as defaults as in python they casue a lot of trouble

@@ -17,18 +17,25 @@ class GymWrapper(gym.Env):
def __init__(
self,
env: Environment,
return_numpy: bool = False,
**kwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**kwargs,

Here and in the other wrappers it is better to consume all args so that it results in error if users pass the wrong args

@@ -17,18 +17,25 @@ class GymWrapper(gym.Env):
def __init__(
self,
env: Environment,
return_numpy: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return_numpy: bool = False,
return_numpy: bool = True,

I think we can change this to true, I know it is slightly bc-breaking but it might be better aligned with gym, wdyt?

):
assert (
env.num_envs == 1
), f"GymEnv wrapper is not vectorised, got env.num_envs: {env.num_envs}"

self._env = env
assert not self._env.terminated_truncated, "GymWrapper is not only compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert not self._env.terminated_truncated, "GymWrapper is not only compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."
assert not self._env.terminated_truncated, "GymWrapper is not compatible with termination and truncation flags. Please set `terminated_truncated=False` in the VMAS environment."


def unwrapped(self) -> Environment:
return self._env

def _ensure_obs_type(self, obs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do it for info too? they are also tensors

return self._env

def _ensure_obs_type(self, obs):
return obs.detach().cpu().numpy() if self.return_numpy else obs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use vmas util

Comment on lines 118 to 147
def _action_list_to_tensor(self, list_in: List) -> List:
assert (
len(list_in) == self._env.n_agents
), f"Expecting actions for {self._env.n_agents} agents, got {len(list_in)} actions"
actions = []
for agent in self._env.agents:
actions.append(
torch.zeros(
1,
self._env.get_agent_action_size(agent),
device=self._env.device,
dtype=torch.float32,
)
)

for i in range(self._env.n_agents):
act = torch.tensor(list_in[i], dtype=torch.float32, device=self._env.device)
if len(act.shape) == 0:
assert (
self._env.get_agent_action_size(self._env.agents[i]) == 1
), f"Action of agent {i} is supposed to be an scalar int"
else:
assert len(act.shape) == 1 and act.shape[
0
] == self._env.get_agent_action_size(self._env.agents[i]), (
f"Action of agent {i} hase wrong shape: "
f"expected {self._env.get_agent_action_size(self._env.agents[i])}, got {act.shape[0]}"
)
actions[i][0] = act
return actions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the shared functions, would it make sense to write them once?

- add base VMAS wrapper class for type conversion between tensors and np
  for singleton and vectorized envs
- change default of gym wrapper to return np data
- update interactive rendering to be compatible with non gym wrapper
  class (to preserve tensor types)
- add error messages for gymnasium and rllib wrappers without installing
  first
@LukasSchaefer
Copy link
Contributor Author

@matteobettini Added a new base VMAS wrapper class from which the gym, gymnasium, and vectorized gymnasium wrappers inherit that implements a lot of shared functionality including type conversions before and after feeding data to the environment.

Also made other smaller notifications as per your suggestions (gymnasium/ rllib import warnings, removing kwargs of wrappers and making wrapper kwargs optional to avoid mutable {} default value)

Copy link
Member

@matteobettini matteobettini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like this, I think we are almost done

from ray.rllib.utils.typing import EnvActionType, EnvInfoDict, EnvObsType
else:
raise ImportError(
"RLLib is not installed. Please install it with `pip install ray[rllib]`."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"RLLib is not installed. Please install it with `pip install ray[rllib]`."
"RLLib is not installed. Please install it with `pip install ray[rllib]<=2.2`."

terminated[0].cpu().item()
if not self.vectorized
else self._ensure_tensor_type(terminated)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we have all of these be the same function?

 def _convert_output(self, data, item: bool):
        if not self.vectorized:
            data = extract_nested_with_index(data, index=0)
            if item:
                return data.item()
        return self._ensure_tensor_type(data)

def env(self):
return self._env

def _ensure_tensor_type(self, tensor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to _maybe_to_numpy?

Comment on lines 107 to 115
for agent in self._env.agents:
actions.append(
torch.zeros(
self._env.num_envs,
self._env.get_agent_action_size(agent),
device=self._env.device,
dtype=torch.float32,
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not need to do this in case of vectorized

1
] == self._env.get_agent_action_size(self._env.agents[i]), (
f"Action of agent {i} hase wrong shape: "
f"expected {self._env.get_agent_action_size(self._env.agents[i])}, got {act.shape[0]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"expected {self._env.get_agent_action_size(self._env.agents[i])}, got {act.shape[0]}"
f"expected {self._env.get_agent_action_size(self._env.agents[i])}, got {act.shape[1]}"

if len(act.shape) == 1:
assert (
self._env.get_agent_action_size(self._env.agents[i]) == 1
), f"Action of agent {i} is supposed to be an vector of shape ({self.n_num_envs},)."
Copy link
Member

@matteobettini matteobettini Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
), f"Action of agent {i} is supposed to be an vector of shape ({self.n_num_envs},)."
), f"Action of agent {i} is supposed to be an vector of shape ({self._env.num_envs},)."

else:
assert (
act.shape[0] == self._env.num_envs
), f"Action of agent {i} is supposed to be a vector of shape ({self._num_envs}, ...)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
), f"Action of agent {i} is supposed to be a vector of shape ({self._num_envs}, ...)"
), f"Action of agent {i} is supposed to be a vector of shape ({self._env.n_num_envs}, ...)"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is mostly the base of gym things, what do you think of creating a folder

gym
    base.py
    gym.py
    gymansium.py
    gymnasium_vec.py

and renaming the base to BaseGymWrapper

for act, agent in zip(action_list, self.env.agents)
]

obs, rew, done, info = self.env.step(action_tensors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the removal of the wrapper you need to add the vmas nested indexing util here for it to run i think

@LukasSchaefer
Copy link
Contributor Author

@matteobettini Updated the base VMAS wrapper class for gym-style wrappers with simplified shared functions, and added unit tests for all gym-style wrappers now :)

Copy link
Member

@matteobettini matteobettini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks so much for the tests. I never had a user add them without asking and they look amazing.

I left a few comments

from vmas.simulator.environment import Environment


def scenario_names():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this function already defined somwhere else?

let s avoid redefining cause it makes maintainability hard

assert o.shape == shape, f"Expected shape {shape}, got {o.shape}"


@pytest.mark.parametrize("scenario", scenario_names())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests should not be too different between scenarios, maybe select a few representative scenarios and use only those to lighten the burden on github CI

@@ -38,7 +40,7 @@ class InteractiveEnv:

def __init__(
self,
env: GymWrapper,
env: Environment,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that his change now is not needed

vmas/interactive_rendering.py Show resolved Hide resolved
vmas/make_env.py Outdated
Wrapper
] = None, # One of: None, vmas.Wrapper.RLLIB, and vmas.Wrapper.GYM
Union[Wrapper, str]
] = None, # One of: None, vmas.Wrapper.RLLIB, vmas.Wrapper.GYM, vmas.Wrapper.Gymnasium
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add str examples

)

for base_key, info in zip(base_keys, values):
for k, v in info.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason behind this info compressing? This might be bc-breaking for the gymwrapper users

also, why not compressed_info[base_key]=info becase info might be a nested dict and we would be having a / for the first level and dicts at all other levels.

if the goal is to add agent names to infos then the above suggestion should do it and we can avoid /

), f"Expecting actions for {self._env.n_agents} agents, got {len(list_in)} actions"

return [
torch.tensor(act, device=self._env.device, dtype=torch.float32).reshape(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are actions only floats?

elif self is self.GYM:
from vmas.simulator.environment.gym import GymWrapper
from vmas.simulator.environment.gym.gym import GymWrapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to avoid changing the import structure for bc-compatibility

what we can do to makes sure
from vmas.simulator.environment.gym import GymWrapper still runs is:

add __init__.py to the gym folder containing:

from .gym inport GymWrapper
from .gymnasium import GymnasiumWrapper
...

that way from vmas.simulator.environment.gym import GymWrapper
still works
and also from vmas.simulator.environment.gym import GymnasiumWrapper will work

README.md Show resolved Hide resolved
README.md Outdated
max_steps=None, # Defines the horizon. None is infinite horizon.
seed=None, # Seed of the environment
dict_spaces=False, # By default tuple spaces are used with each element in the tuple being an agent.
# If dict_spaces=True, the spaces will become Dict with each key being the agent's name
grad_enabled=False, # If grad_enabled the simulator is differentiable and gradients can flow from output to input
terminated_truncated=False, # If terminated_truncated the simulator will return separate `terminated` and `truncated` flags in the `done()` and `step()` functions instead of a single `done` flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to update the docs on running with this info as well

@matteobettini
Copy link
Member

Also if you could pls follow this for pre-cmmit chores https://github.com/proroklab/VectorizedMultiAgentSimulator/blob/main/CONTRIBUTING.md

@matteobettini
Copy link
Member

for the tests we need to add gymansium and shimmy to install_dependencies.sh

- update github dependency installation
- unify get scenario test function and limit wrapper tests to fewer
  scenarios
- allow import of all gym wrappers from `vmas.simulator.environment.gym`
- consider env continuous_actions for action type conversion in wrappers
- compress info to single nested info if needed rather than combining
  keys
@LukasSchaefer
Copy link
Contributor Author

I followed the pre-commit chore besides updating the sphinx documentation and updated according to your comments. I want to wrap this up soonish since I've already spent more time on it than I originally intended.

For the documentation, I am not familiar with sphinx. Should I just modify the docs/source/usage/*.rst files directly within the repo or how should I proceed?

@matteobettini
Copy link
Member

Ok all good! I'll take it on from here and do a few commits to doc as well as solve an import problem

@LukasSchaefer
Copy link
Contributor Author

I'll take it on from here and do a few commits to doc as well as solve an import problem

Sounds great, thanks for taking it over. And let me know if there is a bigger issue that I introduced and you'd want help with!

@matteobettini
Copy link
Member

I think we should be good to go!

Last thing we need to sort out is that I added an mpe task to the tests and they are currently failing. Any idea of the cause?

@LukasSchaefer
Copy link
Contributor Author

Last thing we need to sort out is that I added an mpe task to the tests and they are currently failing. Any idea of the cause?

Just had a look and found the issue. When checking shapes in the case of dict spaces, I compared them in order of a list but the dict -> list conversion I did does not guarantee that those spaces are then in the right order so sometimes it would fail. I'm just writing a solution and will push in a bit

@LukasSchaefer
Copy link
Contributor Author

This commit seems to resolve the issue on my side. Please have a look but I think this should fix it!

@matteobettini matteobettini added the bc-breaking Something that will change the behaviour wrt previous versions label Sep 20, 2024
@matteobettini matteobettini merged commit 659c390 into proroklab:main Sep 20, 2024
4 checks passed
@matteobettini
Copy link
Member

Merged! Thanks a mil Lukas! I think that this will make a LOT of users happy. I owe you a beer when you come to Cambridge :)

@LukasSchaefer LukasSchaefer deleted the gymnasium_support branch September 20, 2024 13:41
@LukasSchaefer
Copy link
Contributor Author

I'm glad if it will turn out useful! I'll write a simple wrapper to integrate VMAS with its new gymnasium wrapper into EPyMARL, probably next week.

And I'll take you up on that offer once I'm properly moved!

@matteobettini
Copy link
Member

I'm glad if it will turn out useful! I'll write a simple wrapper to integrate VMAS with its new gymnasium wrapper into EPyMARL, probably next week

Oh very cool! I'll make a release in the meantime then. Do you think we will be able to use the vector env in EPyMARL and keep the data on the torch device?

@LukasSchaefer
Copy link
Contributor Author

LukasSchaefer commented Sep 20, 2024

Do you think we will be able to use the vector env in EPyMARL and keep the data on the torch device?

As I see it right now, it might be tricky to use that unfortunately since the parallel rollouts in EPyMARL use multithreading with a different interface than standard vectorised environments (see EPyMARL's parallel runners for more info). To be able to use the vectorised environment of VMAS as an alternative to this parallelisation/ vectorisation, I'd need to fundamentally change data collection in EPyMARL but I don't think that's a change I'd like to do as of right now.

While this will cost some performance, I think the loss won't be too bad when using CPUs anyway but might be more noticeable when using a GPU since we won't be able to keep all data constantly on the GPU (it will be moved back and forth). To get the most out of the latter case though, a JAX framework might be better suited in the first place judging by the current landscape but that would be a different discussion altogether.

@LukasSchaefer
Copy link
Contributor Author

@matteobettini Just wanted to ping you that I added a VMAS wrapper to EPyMARL now! See the docs here that integrates all VMAS tasks with the gymnasium wrapper into EPyMARL. As discussed, I only integrated the singleton environment for now but it already trains reasonably fast.

I tried only MAPPO training in balance and transport and with just 4 CPU cores (no GPUs), it took 6 1/2h to train 10M timesteps in transport and 10h in balance which seems reasonable, even though I'm sure it could be notably faster when using and keeping all on the GPU.

@matteobettini
Copy link
Member

Amazing! This will be so helpful as now users have the epymarl/benchmarl/rllib triad to triple check their results when in doubt. I love this so much.

PS If one day you will want to add a VectorRunner or smth like that that steps batched enviornments I would be happy to help.

@LukasSchaefer
Copy link
Contributor Author

Thanks for the offer! I'll ping you should I spend some time on this, even though I have to admit that it's unlikely I'll work on this (in the near future) given I start a new job soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bc-breaking Something that will change the behaviour wrt previous versions enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to Gymnasium
2 participants