ML-Agents Beta 0.3.0 (Pre-release)
Environments
To learn more about new and improved environments, see our Example Environments page.
New
- Soccer Twos - Multi-agent competitive and cooperative environment where agent behavior is shaped by the reward function. Used to demonstrate multi-brain training.
- Banana Collectors - Multi-agent resource collection environment where competitive or cooperative behavior comes about dynamically based on available resources. Used to demonstrate Imitation Learning.
- Hallway - Single agent environment in which an agent must explore a room, remember the object within the room, and use that information to navigate to the correct goal. Used to demonstrate LSTM models.
- Bouncer - Single agent environment provided as an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform, and attempt to collide with floating bananas.
Improved
- All environments have been visually refreshed with a consistent color palette and design language.
- Revamped GridWorld to only use visual observations and a 5x5 grid by default.
- Revamped Tennis to use continuous actions.
- Revamped Push Block to use local perception.
- Revamped Wall Jump to use local perception.
- Added a Hard version of 3DBall whose observations do not include velocity information.
New Features
- [Unity] On-Demand Decision-Making - It is now possible to have agents request decisions from their brains only when necessary, using `RequestDecision()` and `RequestAction()`. For more information, see here.
- [Unity] Added vector-observation stacking - The past n vector observations for each agent can now be stored and used as input to a Brain for decision making.
- [Python] Added Behavioral Cloning (Imitation Learning) algorithm - Train a neural network to imitate either player behavior or a hand-coded game bot using behavioral cloning. For more info, see here.
- [Python] Support for training multiple brains simultaneously - Two or more different brains can now be trained simultaneously using the provided PPO algorithm (see the Python sketch after this list).
- [Python] Added LSTM models - We now support training and embedding recurrent neural networks using the PPO algorithm. This allows for learning temporal dependencies between observations.
- [Unity] [Python] Added Docker Image for RL-training - We now provide a Docker image which allows users to train their brains in an isolated environment without the need to install Python, TensorFlow, and other dependencies. For more information, see here.
- [Python] Ability to provide a random seed to the training process and environment - Allows for reproducible experimentation. For more information, see here. (Note: Unity Physics is non-deterministic; as such, fully reproducible experiments are currently not possible when using physics-based interactions.)
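To make the multi-brain and random-seed additions above concrete, here is a minimal sketch of opening a built environment from Python. It assumes the `unityagents` package shipped with this release, that `UnityEnvironment` exposes the new seed through a `seed` keyword argument, and that `brain_names` lists the brains defined in the scene; the file name is a placeholder for your own build.

```python
from unityagents import UnityEnvironment

# Connect to a built environment with a fixed seed (keyword assumed per the
# random-seed feature above); "SoccerTwos" is a placeholder build name.
env = UnityEnvironment(file_name="SoccerTwos", seed=42)

# In a multi-brain scene, reset() returns one BrainInfo entry per brain,
# keyed by brain name, and step() takes actions keyed the same way.
info = env.reset(train_mode=True)
print("Brains found:", env.brain_names)

env.close()
```

Remember that, as noted above, a fixed seed does not make physics-based environments fully deterministic.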
Changes
- [Unity] Memory size has been removed as a user-facing brain parameter. It is now defined when creating models from `unitytrainers`.
- [Unity] [Python] The API, as well as the general semantics used throughout ML-Agents, has changed. See here for information on these changes and how to easily adjust current projects to be compatible with them.
- [Python] Training hyperparameters are now defined in a `.yaml` file instead of via command line arguments (a sketch of the layout follows this list).
- [Python] Training now takes place via `learn.py`, which launches trainers for multiple brains.
- [Python] Python 2 is no longer supported.
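As a rough illustration of the new `.yaml`-based configuration, the snippet below parses a small, hypothetical trainer configuration: a `default` section of hyperparameters plus a per-brain section that overrides it. The key names and values here are assumptions for illustration only, not the authoritative schema; consult the documentation for the options that `learn.py` actually reads.

```python
import yaml  # requires PyYAML

# Hypothetical trainer configuration in the new .yaml style: defaults plus a
# per-brain override section. Keys and values are illustrative, not exhaustive.
config_text = """
default:
    trainer: ppo
    batch_size: 1024
    learning_rate: 3.0e-4
    max_steps: 50000

StrikerBrain:
    max_steps: 100000
"""

config = yaml.safe_load(config_text)

# Per-brain settings are layered over the defaults (the merge here just
# illustrates the idea).
merged = {**config["default"], **config["StrikerBrain"]}
print(merged)
```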
Documentation
Documentation has been significantly re-written to include many new sections, in addition to updated tutorials and guides. Check it out here.
Fixes & Performance Improvements
- [Unity] Improved memory management - Reduced garbage collection memory usage by up to 5x when using External Brain.
- [Unity] Time.captureFramerate is now set by default to help sync Update and FixedUpdate frequencies.
- [Unity] Added tooltips to the Unity Inspector for relevant ML-Agents variables and functions.
- [Unity] It is now possible to instantiate and destroy GameObjects which are Agents.
- [Unity] Improved visual observation inference time by 3x.
- [Unity] [Python] Epsilon is now a built-in part of the PPO graph. It is no longer necessary to specify it separately in “Graph Placeholders” in Unity.
- [Python] Changed value bootstrapping in the PPO algorithm to properly calculate returns when an episode ends due to a time-out (illustrated in the sketch after this list).
- [Python] The neural network graph is now automatically saved as a `.bytes` file when training is interrupted.
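For context on the value-bootstrapping fix above, the sketch below shows the general idea: when an episode is cut off by a time-out rather than ending in a true terminal state, the discounted return for the segment is bootstrapped with the value estimate of the last observed state instead of being treated as zero. The function and its signature are made up for this example and are not the trainer's actual code.

```python
import numpy as np

def discounted_returns(rewards, gamma, timed_out, last_value_estimate):
    """Illustrative return calculation for one episode segment.

    If the episode ended because of a time-out (rather than a terminal state),
    bootstrap from the value estimate of the last observed state so the agent
    is not penalized for rewards it could still have collected.
    """
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = last_value_estimate if timed_out else 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 3-step segment cut off by a time-out, with the last state's value estimated as 0.5:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99, timed_out=True, last_value_estimate=0.5))
```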
Acknowledgements
Thanks to everyone at Unity who contributed to v0.3, as well as:
@asolano, @LionH, @MarcoMeter, @srcnalt, @wouterhardeman, @60days, @floAr, @Coac, @Zamaroht, @slightperturbation