Stable-baselines3

Latest version: v2.6.0

1.6.1

Breaking Changes:

- Fixed the issue that ``predict`` did not always return the action as an ``np.ndarray`` (qgallouedec)
- Upgraded to Stable-Baselines3 >= 1.6.1
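
The ``predict`` guarantee above can be illustrated with a minimal, hypothetical coercion helper (``as_action_array`` is illustrative, not an SB3 function):

```python
import numpy as np

def as_action_array(action):
    # Hypothetical helper: always hand the caller an np.ndarray,
    # mirroring the guarantee the predict() fix provides.
    return np.asarray(action)

# Scalar, list, and array inputs all come back as np.ndarray:
print(type(as_action_array(1)))            # <class 'numpy.ndarray'>
print(type(as_action_array([0.1, -0.2])))  # <class 'numpy.ndarray'>
```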

Bug Fixes:

- Fixed the issue of wrongly passing policy arguments when using CnnLstmPolicy or MultiInputLstmPolicy with ``RecurrentPPO`` (mlodel)
- Fixed a division by zero error when computing FPS after only a small amount of time has elapsed, on operating systems with low-precision timers
- Fixed calling child callbacks in MaskableEvalCallback (CppMaster)
- Fixed missing verbose parameter passing in the ``MaskableEvalCallback`` constructor (burakdmb)
- Fixed the issue that when updating the target network in QRDQN, TQC, the ``running_mean`` and ``running_var`` properties of batch norm layers are not updated (honglu2875)
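
The FPS fix can be sketched in plain Python: on platforms with low-precision timers, two consecutive clock reads can return the same value, so the elapsed time must be floored before dividing (a hedged sketch, not the library's actual code):

```python
import sys
import time

def compute_fps(num_timesteps, start_time):
    # Floor the denominator at the smallest positive float so that
    # a zero-resolution timer read can never trigger ZeroDivisionError.
    elapsed = max(time.time() - start_time, sys.float_info.epsilon)
    return num_timesteps / elapsed

fps = compute_fps(1000, time.time())  # large but finite, never raises
print(fps)
```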

Others:

- Changed the default buffer device from ``"cpu"`` to ``"auto"``

1.6.0

Breaking Changes:

- Upgraded to Stable-Baselines3 >= 1.6.0
- Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former ``register_policy`` helper and ``policy_base`` parameter in favor of ``policy_aliases`` static attributes (Gregwar)
- Renamed ``rollout/exploration rate`` key to ``rollout/exploration_rate`` for QRDQN (to be consistent with SB3 DQN)
- Upgraded to python 3.7+ syntax using ``pyupgrade``
- SB3 now requires PyTorch >= 1.11
- Changed the default network architecture when using ``CnnPolicy`` or ``MultiInputPolicy`` with TQC: ``share_features_extractor`` is now set to ``False`` by default, and ``net_arch=[256, 256]`` (instead of the previous ``net_arch=[]``)
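
The new alias mechanism can be sketched as a class-level lookup table: each algorithm maps alias strings to policy classes, replacing the old global ``register_policy`` registry (a simplified sketch; the class names are illustrative, not the actual SB3 classes):

```python
class MlpPolicy: ...
class CnnPolicy: ...

class Algorithm:
    # Class-level mapping from alias strings to policy classes,
    # in the spirit of SB3's policy_aliases static attribute.
    policy_aliases = {"MlpPolicy": MlpPolicy, "CnnPolicy": CnnPolicy}

    def __init__(self, policy):
        # Accept either an alias string or a policy class directly.
        if isinstance(policy, str):
            policy = self.policy_aliases[policy]
        self.policy_class = policy

algo = Algorithm("MlpPolicy")
print(algo.policy_class.__name__)  # MlpPolicy
```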

New Features:

- Added ``RecurrentPPO`` (aka PPO LSTM)
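
The core idea behind ``RecurrentPPO`` is that the policy carries an LSTM hidden state between steps and resets it at episode boundaries. A minimal numpy sketch of that state masking (names are illustrative, not the library's internals):

```python
import numpy as np

def mask_hidden_state(hidden, episode_start):
    # Zero the recurrent state for environments whose episode just
    # started; carry it forward unchanged for the others.
    return (1.0 - episode_start[:, None]) * hidden

hidden = np.ones((2, 4))                # (n_envs, hidden_dim)
episode_start = np.array([1.0, 0.0])    # env 0 just reset, env 1 did not
masked = mask_hidden_state(hidden, episode_start)
print(masked)  # first row zeroed, second row kept
```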

Bug Fixes:

- Fixed a bug in ``RecurrentPPO`` when calculating the masked loss functions (rnederstigt)
- Fixed a bug in ``TRPO`` where kl divergence was not implemented for ``MultiDiscrete`` space
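
For the ``MultiDiscrete`` fix, the required KL divergence factorizes: a MultiDiscrete distribution is a product of independent categoricals, so its KL is the sum of the per-dimension categorical KLs. A hedged numpy sketch of that identity:

```python
import numpy as np

def categorical_kl(p, q, eps=1e-12):
    # KL(p || q) for one categorical distribution over a finite set.
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def multidiscrete_kl(ps, qs):
    # Sub-actions are independent, so the total divergence is the
    # sum of the per-dimension categorical divergences.
    return sum(categorical_kl(p, q) for p, q in zip(ps, qs))

p = [[0.7, 0.3], [0.2, 0.5, 0.3]]
q = [[0.6, 0.4], [0.3, 0.4, 0.3]]
print(multidiscrete_kl(p, q))  # small positive number
```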

1.5.0

Breaking Changes:

- Switched minimum Gym version to 0.21.0.
- Upgraded to Stable-Baselines3 >= 1.5.0

New Features:

- Allow PPO to turn off advantage normalization (see [PR 61](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/61)) (vwxyzjn)
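
The step the new flag disables is per-batch standardization of the advantages. A minimal sketch of the toggle (the function name is illustrative; the real option is PPO's ``normalize_advantage`` argument):

```python
import numpy as np

def maybe_normalize(advantages, normalize_advantage=True, eps=1e-8):
    # Standardize advantages to zero mean and unit variance when
    # enabled; the new flag lets users skip this step entirely.
    if normalize_advantage and len(advantages) > 1:
        return (advantages - advantages.mean()) / (advantages.std() + eps)
    return advantages

adv = np.array([1.0, 2.0, 3.0, 4.0])
print(maybe_normalize(adv).mean())                        # ~0.0
print(maybe_normalize(adv, normalize_advantage=False))    # unchanged
```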

Bug Fixes:

- Removed explicit calls to the ``forward()`` method, as per PyTorch guidelines

1.4.0

Breaking Changes:

- Dropped python 3.6 support
- Upgraded to Stable-Baselines3 >= 1.4.0
- ``MaskablePPO`` was updated to match latest SB3 ``PPO`` version (timeout handling and new method for the policy object)

New Features:

- Added ``TRPO`` (cyprienc)
- Added experimental support to train off-policy algorithms with multiple envs (note: ``HerReplayBuffer`` currently not supported)
- Added Augmented Random Search (ARS) (sgillen)

Others:

- Improve test coverage for ``MaskablePPO``

1.3.0

**WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021).
We highly recommend you upgrade to Python >= 3.7.**

Breaking Changes:

- Removed ``sde_net_arch``
- Upgraded to Stable-Baselines3 >= 1.3.0

New Features:

- Added ``MaskablePPO`` algorithm (kronion)
- ``MaskablePPO`` Dictionary Observation support (glmcdona)
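
Action masking of the kind ``MaskablePPO`` performs can be sketched with numpy: the logits of invalid actions are pushed to a very large negative value before the softmax, so those actions receive (numerically) zero probability. An illustrative sketch, not the library's code:

```python
import numpy as np

def masked_probs(logits, action_mask):
    # Replace invalid-action logits with a huge negative value, then
    # apply a numerically stable softmax: masked actions get ~0 mass.
    logits = np.where(action_mask, logits, -1e8)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])
mask = np.array([True, False, True])  # action 1 is currently illegal
probs = masked_probs(logits, mask)
print(probs)  # probability of action 1 is ~0
```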

1.2.0

Breaking Changes:

- Upgraded to Stable-Baselines3 >= 1.2.0

Bug Fixes:

- QR-DQN and TQC updated so that their policies are switched between train and eval mode at the correct time (ayeright)

Others:

- Fixed type annotations
- Added Python 3.9 to CI
