
ML-Agents Beta 0.3.0

Pre-release
@vincentpierre released this 15 Mar 00:54 (commit bbbe2e7)

Environments

To learn more about new and improved environments, see our Example Environments page.

New

  • Soccer Twos - Multi-agent competitive and cooperative environment where behavior emerges from the reward function. Used to demonstrate multi-brain training.

  • Banana Collectors - Multi-agent resource collection environment where competitive or cooperative behavior comes about dynamically based on available resources. Used to demonstrate Imitation Learning.

  • Hallway - Single agent environment in which an agent must explore a room, remember the object within the room, and use that information to navigate to the correct goal. Used to demonstrate LSTM models.

  • Bouncer - Single agent environment provided as an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform, and attempt to collide with floating bananas.

Improved

  • All environments have been visually refreshed with a consistent color palette and design language.
  • Revamped GridWorld to use only visual observations and a 5x5 grid by default.
  • Revamped Tennis to use continuous actions.
  • Revamped Push Block to use local perception.
  • Revamped Wall Jump to use local perception.
  • Added a Hard version of 3DBall whose observations don’t contain velocity information.

New Features

  • [Unity] On Demand Decision Making - Agents can now request decisions from their brains only when necessary, using RequestDecision() and RequestAction(). For more information, see here.
  • [Unity] Added vector-observation stacking - The past n vector observations for each agent can now be stored and used as input to a Brain for decision making.
  • [Python] Added Behavioral Cloning (Imitation Learning) algorithm - Train a neural network to imitate either player behavior or a hand-coded game bot using behavioral cloning. For more info, see here.
  • [Python] Support for training multiple brains simultaneously - Two or more different brains can now be trained simultaneously using the provided PPO algorithm.
  • [Python] Added LSTM models - We now support training and embedding recurrent neural networks using the PPO algorithm. This allows for learning temporal dependencies between observations.
  • [Unity] [Python] Added Docker Image for RL-training - We now provide a Docker image which allows users to train their brains in an isolated environment without the need to install Python, TensorFlow, and other dependencies. For more information, see here.
  • [Python] Ability to provide a random seed to the training process and environment - Allows for reproducible experimentation. For more information, see here. (Note: Unity physics is non-deterministic, so fully reproducible experiments are currently not possible when using physics-based interactions.)
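
Conceptually, vector-observation stacking keeps a sliding window over an agent's last n vector observations and concatenates them into a single input vector. The Python sketch below illustrates only that idea; the feature itself is on the [Unity] side. The class name `VectorObservationStacker` is hypothetical, and zero-padding the window until n real observations have arrived is an assumption made so the input length stays fixed.

```python
from collections import deque
from typing import Deque, List, Sequence


class VectorObservationStacker:
    """Sliding window over the last `num_stacked` vector observations of one agent.

    Hypothetical helper for illustration only; it is not an ML-Agents class.
    """

    def __init__(self, obs_size: int, num_stacked: int) -> None:
        self.obs_size = obs_size
        self.num_stacked = num_stacked
        self.frames: Deque[List[float]] = self._blank_frames()

    def _blank_frames(self) -> Deque[List[float]]:
        # Assumption: pad with zeros until `num_stacked` real observations exist,
        # so the stacked input always has length obs_size * num_stacked.
        return deque(
            ([0.0] * self.obs_size for _ in range(self.num_stacked)),
            maxlen=self.num_stacked,
        )

    def add(self, observation: Sequence[float]) -> List[float]:
        if len(observation) != self.obs_size:
            raise ValueError(f"expected {self.obs_size} values, got {len(observation)}")
        # With maxlen set, appending to a full deque discards the oldest frame.
        self.frames.append(list(observation))
        return self.stacked()

    def stacked(self) -> List[float]:
        # Flatten oldest -> newest into a single input vector for the Brain.
        return [value for frame in self.frames for value in frame]

    def reset(self) -> None:
        # Forget the history, e.g. at the start of a new episode.
        self.frames = self._blank_frames()
```

For example, with `obs_size=2` and `num_stacked=3`, adding `[1.0, 2.0]` yields `[0.0, 0.0, 0.0, 0.0, 1.0, 2.0]`; after three more additions, only the three most recent observations remain in the window.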

Changes

  • [Unity] Memory size has been removed as a user-facing brain parameter. It is now defined when creating models from unitytrainers.
  • [Unity] [Python] The API as well as the general semantics used throughout ML-Agents has changed. See here for information on these changes, and how to easily adjust current projects to be compatible with these changes.
  • [Python] Training hyperparameters are now defined in a .yaml file instead of via command line arguments.
  • [Python] Training now takes place via learn.py, which launches trainers for multiple brains.
  • [Python] Python 2 is no longer supported.
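
With hyperparameters moved into a `.yaml` file and `learn.py` launching trainers for multiple brains, one plausible layout is a shared `default` section plus optional per-brain sections that override individual keys. The sketch below assumes that layout. The section and key names (`default`, `HallwayBrain`, `batch_size`, `use_recurrent`, ...) and the helper functions are illustrative and should be checked against the trainer configuration shipped with this release. To stay dependency-free, it works on an already-parsed dictionary, such as the one PyYAML's `yaml.safe_load` would return.

```python
from typing import Any, Dict, Iterable

# Parsed YAML: section name -> {hyperparameter name: value}.
Config = Dict[str, Dict[str, Any]]


def resolve_brain_config(config: Config, brain_name: str) -> Dict[str, Any]:
    """Hyperparameters for one brain: `default` values overridden by the brain's own section."""
    resolved = dict(config.get("default", {}))  # copy so the defaults are not mutated
    resolved.update(config.get(brain_name, {}))
    return resolved


def resolve_all(config: Config, brain_names: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    # One resolved hyperparameter set per brain, e.g. one trainer per brain.
    return {name: resolve_brain_config(config, name) for name in brain_names}


# Stand-in for a parsed config file; values are placeholders, not tuned settings.
example_config: Config = {
    "default": {"trainer": "ppo", "batch_size": 1024, "learning_rate": 3.0e-4, "use_recurrent": False},
    "HallwayBrain": {"use_recurrent": True},  # hypothetical brain name
}
```

Keeping the defaults in one section means a brain that needs, for instance, a recurrent model only has to list the keys that differ.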

Documentation

Documentation has been significantly rewritten to include many new sections, in addition to updated tutorials and guides. Check it out here.

Fixes & Performance Improvements

  • [Unity] Improved memory management - Reduced garbage collection memory usage by up to 5x when using External Brain.
  • [Unity] Time.captureFramerate is now set by default to help sync Update and FixedUpdate frequencies.
  • [Unity] Added tooltips to relevant inspector objects.
  • [Unity] It is now possible to instantiate and destroy GameObjects which are Agents.
  • [Unity] Improved visual observation inference time by 3x.
  • [Unity] Tooltips added to Unity Inspector for ML-Agents variables and functions.
  • [Unity] [Python] Epsilon is now a built-in part of the PPO graph. It no longer needs to be specified separately under “Graph Placeholders” in Unity.
  • [Python] Changed value bootstrapping in PPO algorithm to properly calculate returns on episode time-out.
  • [Python] The neural network graph is now automatically saved as a .bytes file when training is interrupted.
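
The value-bootstrapping fix concerns episodes that end because they hit a step limit rather than reaching a terminal state. The sketch below shows the distinction using plain discounted returns, G_t = r_t + gamma * G_{t+1}. It is not the ML-Agents PPO code: the trainer may use a different return or advantage estimator, and the function names are hypothetical. When an episode times out, the return is seeded with the critic's value estimate of the last state instead of zero.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float, bootstrap_value: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} backwards, seeded with G_T = bootstrap_value."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def episode_returns(
    rewards: Sequence[float], gamma: float, last_value: float, timed_out: bool
) -> List[float]:
    # A real terminal state has no future reward, so the tail contributes 0.
    # A time-out merely truncates the episode; the critic's estimate V(s_T)
    # stands in for the reward the agent would have kept collecting.
    bootstrap = last_value if timed_out else 0.0
    return discounted_returns(rewards, gamma, bootstrap)
```

With rewards `[1.0, 1.0]`, `gamma = 0.5` and V(s_T) = 4.0, a terminal episode gives `[1.5, 1.0]`, while a timed-out one gives `[2.5, 3.0]`. Treating a time-out as terminal would systematically underestimate returns near the step limit.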

Acknowledgements

Thanks to everyone at Unity who contributed to v0.3, as well as:

@asolano, @LionH, @MarcoMeter, @srcnalt, @wouterhardeman, @60days, @floAr, @Coac, @Zamaroht, @slightperturbation