ML-Agents Beta 0.3.0 (Pre-release)
Environments
To learn more about new and improved environments, see our Example Environments page.
New
- Soccer Twos - Multi-agent competitive and cooperative environment where agent behavior is shaped by the reward function. Used to demonstrate multi-brain training.
- Banana Collectors - Multi-agent resource collection environment where competitive or cooperative behavior comes about dynamically based on available resources. Used to demonstrate Imitation Learning.
- Hallway - Single agent environment in which an agent must explore a room, remember the object within the room, and use that information to navigate to the correct goal. Used to demonstrate LSTM models.
- Bouncer - Single agent environment provided as an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform, and attempt to collide with floating bananas.
Improved
- All environments have been visually refreshed with a consistent color palette and design language.
- Revamped GridWorld to only use visual observations and a 5x5 grid by default.
- Revamped Tennis to use continuous actions.
- Revamped Push Block to use local perception.
- Revamped Wall Jump to use local perception.
- Added a Hard version of 3DBall whose observations do not include velocity information.
New Features
- [Unity] On-Demand Decision-Making - It is now possible to have agents request decisions from their brains only when necessary, using `RequestDecision()` and `RequestAction()`. For more information, see here.
- [Unity] Added vector-observation stacking - The past n vector observations for each agent can now be stored and used as input to a Brain for decision making.
- [Python] Added Behavioral Cloning (Imitation Learning) algorithm - Train a neural network to imitate either player behavior or a hand-coded game bot using behavioral cloning. For more info, see here.
- [Python] Support for training multiple brains simultaneously - Two or more different brains can now be trained simultaneously using the provided PPO algorithm (see the Python sketch after this list).
- [Python] Added LSTM models - We now support training and embedding recurrent neural networks using the PPO algorithm. This allows for learning temporal dependencies between observations.
- [Unity] [Python] Added Docker Image for RL-training - We now provide a Docker image which allows users to train their brains in an isolated environment without the need to install Python, TensorFlow, and other dependencies. For more information, see here.
- [Python] Ability to provide a random seed to the training process and environment - Allows for reproducible experimentation. For more information, see here. (Note: Unity Physics is non-deterministic; as such, fully reproducible experiments are currently not possible when using physics-based interactions.)
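To make the multi-brain and random-seed additions above concrete, here is a minimal sketch of opening a built environment from Python. It assumes the `unityagents` package shipped with this release, that `UnityEnvironment` exposes the new seed through a `seed` keyword argument, and that `brain_names` lists the brains defined in the scene; the file name is a placeholder for your own build.

```python
from unityagents import UnityEnvironment

# Connect to a built environment with a fixed seed (keyword assumed per the
# random-seed feature above); "SoccerTwos" is a placeholder build name.
env = UnityEnvironment(file_name="SoccerTwos", seed=42)

# In a multi-brain scene, reset() returns one BrainInfo entry per brain,
# keyed by brain name, and step() takes actions keyed the same way.
info = env.reset(train_mode=True)
print("Brains found:", env.brain_names)

env.close()
```

Remember that, as noted above, a fixed seed does not make physics-based environments fully deterministic.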
Changes
- [Unity] Memory size has been removed as a user-facing brain parameter. It is now defined when creating models from `unitytrainers`.
- [Unity] [Python] The API, as well as the general semantics used throughout ML-Agents, has changed. See here for information on these changes and how to easily adjust current projects to be compatible with them.
- [Python] Training hyperparameters are now defined in a `.yaml` file instead of via command line arguments (a sketch of the layout follows this list).
- [Python] Training now takes place via `learn.py`, which launches trainers for multiple brains.
- [Python] Python 2 is no longer supported.
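As a rough illustration of the new `.yaml`-based configuration, the snippet below parses a small, hypothetical trainer configuration: a `default` section of hyperparameters plus a per-brain section that overrides it. The key names and values here are assumptions for illustration only, not the authoritative schema; consult the documentation for the options that `learn.py` actually reads.

```python
import yaml  # requires PyYAML

# Hypothetical trainer configuration in the new .yaml style: defaults plus a
# per-brain override section. Keys and values are illustrative, not exhaustive.
config_text = """
default:
    trainer: ppo
    batch_size: 1024
    learning_rate: 3.0e-4
    max_steps: 50000

StrikerBrain:
    max_steps: 100000
"""

config = yaml.safe_load(config_text)

# Per-brain settings are layered over the defaults (the merge here just
# illustrates the idea).
merged = {**config["default"], **config["StrikerBrain"]}
print(merged)
```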
Documentation
Documentation has been significantly re-written to include many new sections, in addition to updated tutorials and guides. Check it out here.
Fixes & Performance Improvements
- [Unity] Improved memory management - Reduced garbage collection memory usage by up to 5x when using External Brain.
- [Unity] Time.captureFramerate is now set by default to help sync Update and FixedUpdate frequencies.
- [Unity] Added tooltips to the Unity Inspector for relevant ML-Agents variables and functions.
- [Unity] It is now possible to instantiate and destroy GameObjects which are Agents.
- [Unity] Improved visual observation inference time by 3x.
- [Unity] [Python] Epsilon is now a built-in part of the PPO graph. It is no longer necessary to specify it separately in “Graph Placeholders” in Unity.
- [Python] Changed value bootstrapping in the PPO algorithm to properly calculate returns when an episode ends due to a time-out (illustrated in the sketch after this list).
- [Python] The neural network graph is now automatically saved as a `.bytes` file when training is interrupted.
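For context on the value-bootstrapping fix above, the sketch below shows the general idea: when an episode is cut off by a time-out rather than ending in a true terminal state, the discounted return for the segment is bootstrapped with the value estimate of the last observed state instead of being treated as zero. The function and its signature are made up for this example and are not the trainer's actual code.

```python
import numpy as np

def discounted_returns(rewards, gamma, timed_out, last_value_estimate):
    """Illustrative return calculation for one episode segment.

    If the episode ended because of a time-out (rather than a terminal state),
    bootstrap from the value estimate of the last observed state so the agent
    is not penalized for rewards it could still have collected.
    """
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = last_value_estimate if timed_out else 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 3-step segment cut off by a time-out, with the last state's value estimated as 0.5:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99, timed_out=True, last_value_estimate=0.5))
```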
Acknowledgements
Thanks to everyone at Unity who contributed to v0.3, as well as:
@asolano, @LionH, @MarcoMeter, @srcnalt, @wouterhardeman, @60days, @floAr, @Coac, @Zamaroht, @slightperturbation