Mlagents

Latest version: v1.1.0

Page 11 of 14

0.5.0a

Fixes and Improvements
* Fixes typo in the documentation.
* Removes unnecessary `gitignore` line.
* Fixes imitation learning scenes.
* Fixes `BananaCollector` environment.
* Enables `gym_unity` with multiple visual observations.

Acknowledgements
Thanks to everyone at Unity who contributed to v0.5.0a, as well as: Sohojoe, fengredrum, and xiaodi-faith.

0.4.0

Environments

To learn more about new and improved environments, see our [Example Environments page](../master/docs/Learning-Environment-Examples.md).

New

* **Walker** - Humanoid physics-based agent. The agent must move its body toward the goal direction as quickly as possible without falling.

* **Pyramids** - Sparse reward environment. The agent must press a button, then topple a pyramid of blocks to get the golden brick at the top. Used to demonstrate Curiosity.

Improved

* Revamped the Crawler environment

* Added visual observation based scenes for:
  * BananaCollector
  * PushBlock
  * Hallway
  * Pyramids

* Added Imitation Learning based scenes for:
  * Tennis
  * Bouncer
  * PushBlock
  * Hallway
  * Pyramids

New Features

* **[Unity]** In-Editor Training - It is now possible to train agents directly in the editor without building the scene. For more information, see [here](../master/docs/Basic-Guide.md#training-the-brain-with-reinforcement-learning).

* **[Training]** Curiosity-Driven Exploration - Adds a curiosity-based intrinsic reward signal when using PPO. Enable it by setting the `use_curiosity` brain training hyperparameter to `true`.

* **[Unity]** Support for providing player input using axes within the Player Brain.

* **[Unity]** TensorFlowSharp Plugin has been upgraded to version 1.7.1.
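
The `use_curiosity` hyperparameter mentioned above is set per brain. As a rough sketch only, assuming the training hyperparameters have been loaded into nested dictionaries keyed by section name, enabling the flag for one brain might look like the following. The `default` section, the `PyramidBrain` name, and the `enable_curiosity` helper are hypothetical and exist only for illustration; only `use_curiosity` comes from these notes.

```python
import copy


def enable_curiosity(trainer_config, brain_name):
    """Return a copy of the config with curiosity enabled for one brain.

    Assumption: trainer_config maps section names to hyperparameter dicts.
    """
    updated = copy.deepcopy(trainer_config)  # leave the caller's config untouched
    # Create the brain's section if it is missing, then switch the flag on.
    updated.setdefault(brain_name, {})["use_curiosity"] = True
    return updated


# Hypothetical configuration; only `use_curiosity` is named in the release notes.
config = {"default": {"use_curiosity": False}}
new_config = enable_curiosity(config, "PyramidBrain")
print(new_config["PyramidBrain"]["use_curiosity"])  # prints True
```

Returning a modified copy keeps the shared defaults unchanged, so other brains continue to train without the intrinsic reward.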

Changes
* Main ML-Agents code now within `MLAgents` namespace. Ensure that the `MLAgents` namespace is added to necessary project scripts such as Agent classes.
* ASCII art added to `learn.py` script.
* Communication now uses gRPC and Protobuf. JSON libraries removed.
* TensorBoard now reports the mean absolute loss instead of the total loss over the update loop.
* PPO algorithm now uses a wider Gaussian output for Continuous Control models (increasing performance).

Documentation
* Added Quick Start and FAQ sections to the documentation.
* Added documentation explaining how to use ML-Agents on Microsoft Azure.
* Added benchmark reward thresholds for example environments.

Fixes & Performance Improvements
* Episode length is now properly reported in TensorBoard in the first episode.
* Behavioral Cloning now works with LSTM models.

Known Issues
* Curiosity-driven exploration does not function with On-Demand Decision Making. Expect a fix in v0.4.0a.

Acknowledgements

Thanks to everyone at Unity who contributed to v0.4, as well as: sterlingcrispin, ChrisRisner, akmadian, animaleja32, LeighS, and 5665tm.

0.4.0preview

0.4.0b

Fixes & Performance Improvements
* Corrects observation space description for PushBlock environment.
* Fixes bug preventing the use of environments with Python multiprocessing.
* Fixes bug preventing agents from being initialized without a brain.

0.4.0a

Environments
* Changes to example environments for visual consistency.

Documentation
* Adjustments to Windows installation documentation.
* Updates documentation to refer to project as a toolkit.

Changes
* New Amazon Web Service AMI.
* Uses `swish` for continuous control activation function.
* Corrected version number in `setup.py`.
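
For readers unfamiliar with the `swish` activation named above, it is commonly defined as `x * sigmoid(x)`. The snippet below is a plain-Python sketch of that definition for illustration; it is not the toolkit's TensorFlow code, and the function name is chosen here.

```python
import math


def swish(x):
    """Swish activation, x * sigmoid(x), computed in a numerically stable way."""
    if x >= 0:
        # exp(-x) cannot overflow for non-negative x.
        return x / (1.0 + math.exp(-x))
    # For negative x, use exp(x) instead so the exponent stays non-positive.
    e = math.exp(x)
    return x * e / (1.0 + e)


for value in (-2.0, 0.0, 2.0):
    print(value, swish(value))
```

Unlike ReLU, swish is smooth and lets small negative values through, and for large positive inputs it approaches the identity.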

Fixes & Performance Improvements
* Fixes memory leak bug when using visual observations.
* Fixes use of behavioral cloning with visual observations.
* Fixes use of curiosity-driven exploration with on-demand decision making.
* Optimizes visual observations when using internal brain.

Acknowledgements
Thanks to everyone at Unity who contributed to v0.4.0a, as well as: tcmxx.

0.3.1

Features

* We have upgraded our Docker container, which now supports Brains that contain camera-based Visual Observations.

Documentation

* We have added a partial Chinese translation of our documentation. It is available [here](../master/docs/localized/zh-CN).

Fixes & Performance Improvements

* Missing component reference in BananaRL environment.
* Neural Network for multiple visual observations was not properly generated.
* Episode time-out value estimate bootstrapping used incorrect observation as input.
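
The time-out fix above concerns value bootstrapping: when an episode is cut off by a step limit rather than ending naturally, the discounted return is usually seeded with a value estimate for the state where the cut-off happened, instead of zero. The following is a minimal sketch of that general technique, not the toolkit's implementation. The function name is hypothetical, and `bootstrap_value` stands in for whatever the value network outputs for the observation the trainer bootstraps from.

```python
def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Compute discounted returns, seeded with a value estimate at the cut-off.

    Assumption: bootstrap_value is 0.0 for a true terminal state and the
    critic's estimate for the final observation when the episode timed out.
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each step adds its reward to the discounted future return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


print(discounted_returns([1.0, 1.0], gamma=0.5))                       # [1.5, 1.0]
print(discounted_returns([1.0, 1.0], gamma=0.5, bootstrap_value=4.0))  # [2.5, 3.0]
```

Because the bootstrap value feeds into every earlier return, evaluating the critic on the wrong observation skews the targets for the whole truncated trajectory.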

Acknowledgements

Thanks to everyone at Unity who contributed to v0.3.1, as well as to the following community contributors:

sterlingcrispin, andersonaddo, palomagr, imankgoyal, luchris429.

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.