Stable-Baselines3

Latest version: v2.5.0

2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13

New Features:

- Added Python 3.11 support
- Added Gymnasium 0.29 support (pseudo-rnd-thoughts)


[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed MaskablePPO ignoring ``stats_window_size`` argument
- Added Python 3.11 support

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Upgraded to Huggingface-SB3 >= 2.3
- Added Python 3.11 support

Bug Fixes:

- Relaxed a check in the logger that was causing issues on Windows with colorama
- Fixed off-policy algorithms with continuous float64 actions (see 1145) (tobirohrer)
- Fixed ``env_checker.py`` warning messages for out of bounds in complex observation spaces (Gabo-Tor)
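
As an illustration of the env checker mentioned above, here is a minimal sketch of running ``check_env`` so that warnings (such as the out-of-bounds observation messages) are reported; the environment name is just a placeholder:

```python
import gymnasium as gym

from stable_baselines3.common.env_checker import check_env

# Any (custom) env can be checked; issues such as out-of-bounds observations
# in complex observation spaces are reported as warnings.
env = gym.make("Pendulum-v1")
check_env(env, warn=True)
```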

Others:

- Updated GitHub issue templates
- Fix typo in gym patch error message (lukashass)
- Refactor ``test_spaces.py`` tests

Documentation:

- Fixed callback example (BertrandDecoster)
- Fixed policy network example (kyle-he)
- Added mobile-env as new community project (stefanbschneider)
- Added [DeepNetSlice](https://github.com/AlexPasqua/DeepNetSlice) to community projects (AlexPasqua)

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.0.0...v2.1.0

2.0.0

> [!WARNING]
> Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023).
> We highly recommend you upgrade to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx


To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the ``shimmy`` package (carlosluis, arjun-kg, tlpss)
- The deprecated ``online_sampling`` argument of ``HerReplayBuffer`` was removed
- Removed deprecated ``stack_observation_space`` method of ``StackedObservations``
- Renamed environment output observations in ``evaluate_policy`` to prevent shadowing the input observations during callbacks (npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the ``HumanOutputFormat`` file check: now it verifies if the object is an instance of ``io.TextIOBase`` instead of only checking for the presence of a ``write`` method.
- Because of the new Gym API (0.26+), the random seed passed to ``vec_env.seed(seed=seed)`` will only be effective after the ``env.reset()`` call (see the sketch below).
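
A minimal sketch of the new seeding behaviour (``make_vec_env`` is used purely for illustration): the seed is stored by ``seed()`` but only applied on the next ``reset()``.

```python
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=2)
vec_env.seed(seed=42)   # with the Gym 0.26+ API, the seed is only stored here...
obs = vec_env.reset()   # ...and becomes effective on this reset
```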

New Features:

- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the ``shimmy`` package)
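
A minimal sketch of training with a Gymnasium environment (Gym 0.21 and 0.26 envs go through ``shimmy`` instead):

```python
import gymnasium as gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")        # Gymnasium env, now the primary backend
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```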

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed QRDQN update interval for multi envs

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support

Bug Fixes:

- Fixed ``VecExtractDictObs`` not handling terminal observations (WeberSamuel)
- Set NumPy version to ``>=1.20`` due to use of ``numpy.typing`` (troiganto)
- Fixed loading DQN changes ``target_update_interval`` (tobirohrer)
- Fixed env checker to properly reset the env before calling ``step()`` when checking
for ``Inf`` and ``NaN`` (lutogniew)
- Fixed HER ``truncate_last_trajectory()`` (lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (JonathanKuelz)


Others:

- Fixed ``stable_baselines3/a2c/*.py`` type hints
- Fixed ``stable_baselines3/ppo/*.py`` type hints
- Fixed ``stable_baselines3/sac/*.py`` type hints
- Fixed ``stable_baselines3/td3/*.py`` type hints
- Fixed ``stable_baselines3/common/base_class.py`` type hints
- Fixed ``stable_baselines3/common/logger.py`` type hints
- Fixed ``stable_baselines3/common/envs/*.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/base_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_frame_stack.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/dummy_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/subproc_vec_env.py`` type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improve type annotation of wrappers
- Tests envs are now checked too
- Added render test for ``VecEnv`` and ``VecEnvWrapper``
- Update issue templates and env info saved with the model
- Changed ``seed()`` method return type from ``List`` to ``Sequence``
- Updated env checker doc and requirements for tuple spaces/goal envs

Documentation:

- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about ``VecEnv`` API vs Gym API
- Upgraded tutorials to Gymnasium API
- Make it more explicit when using ``VecEnv`` vs Gym env
- Added UAV_Navigation_DRL_AirSim to the project page (heleidsn)
- Added ``EvalCallback`` example (sidney-tio)
- Update custom env documentation
- Added `pink-noise-rl` to projects page
- Fix custom policy example, ``ortho_init`` was ignored
- Added SBX page

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v1.8.0...v2.0.0

1.8.0

> [!WARNING]
> Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
> Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
> You can find a migration guide [here](https://gymnasium.farama.org/content/migration-guide/).
> If you want to try the SB3 v2.0 alpha version, you can take a look at [PR 1327](https://github.com/DLR-RM/stable-baselines3/pull/1327).

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade



Breaking Changes:

- Removed shared layers in `mlp_extractor` (AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs, `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- As `HerReplayBuffer` was refactored to support multiprocessing, previously saved replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
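
A minimal sketch of the refactored `HerReplayBuffer`, using the built-in `BitFlippingEnv` test environment and omitting the removed `max_episode_length`/`online_sampling` arguments:

```python
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.envs import BitFlippingEnv

# Goal-conditioned toy env with a Dict observation space
env = BitFlippingEnv(n_bits=10, continuous=True, max_steps=10)
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    # `max_episode_length` and `online_sampling` no longer exist
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=1_000)
```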

New Features:

- Added `repeat_action_probability` argument in `AtariWrapper`.
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Added support for dict/tuple observation spaces for `VecCheckNan`; the check is now active in the `env_checker()` (DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (jonasreiher)
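
A short sketch of the new `stats_window_size` argument, which controls how many recent episodes the `rollout/ep_rew_mean` and `rollout/ep_len_mean` logs are averaged over (the previously fixed window of 100 remains the default):

```python
from stable_baselines3 import PPO

# Average episode statistics over the last 10 episodes instead of the default 100
model = PPO("MlpPolicy", "CartPole-v1", stats_window_size=10, verbose=1)
model.learn(total_timesteps=10_000)
```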

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (AlexPasqua)

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark/issues/7)
- Upgraded to the new `HerReplayBuffer` implementation that supports multiple envs
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Removed `use_auth_token` for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see <https://github.com/openai/gym/pull/1304>)
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer`

Bug Fixes:

- Fixed Atari wrapper that missed the reset condition (luizapozzobon)
- Added the argument `dtype` (defaulting to `float32`) to the noise for consistency with gym actions (sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (adamfrly)
- Fixed loading of normalized image-based environments
- Fixed `DictRolloutBuffer.add` with multidimensional action space (younik)

Deprecations:

Others:

- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hint
- Added tests for StackedObservations
- Removed Gitlab CI file
- Moved from `setup.cfg` to `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Upgraded AutoROM to latest version
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari Roms

Documentation:

- Renamed `load_parameters` to `set_parameters` (DavyMorgan)
- Clarified documentation about subproc multiprocessing for A2C (Bonifatius94)
- Fixed typo in `A2C` docstring (AlexPasqua)
- Renamed timesteps to episodes for `log_interval` description (theSquaredError)
- Removed note about gif creation for Atari games (harveybellini)
- Added information about default network architecture
- Update information about Gymnasium support

1.7.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade



> **Warning**
> Shared layers in MLP policy (``mlp_extractor``) are now deprecated for PPO, A2C and TRPO.
> This feature will be removed in SB3 v1.8.0 and the behavior of ``net_arch=[64, 64]``
> will create **separate** networks with the same architecture, to be consistent with the off-policy algorithms.
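
A short sketch of the behaviour described above (from v1.8.0 on, no shared layers are created):

```python
from stable_baselines3 import PPO

# Builds *separate* policy and value networks,
# each with two hidden layers of 64 units
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=dict(net_arch=[64, 64]))
```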

> **Note**
> A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
> missing keys in the state dict when loaded with SB3 >= 1.7.0.
> To suppress the warning, simply save the model again.
> You can find more info in issue 1233.
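
A minimal sketch of the suggested workaround (the model file name is hypothetical):

```python
from stable_baselines3 import PPO

# Loading an A2C/PPO model saved with SB3 < 1.7.0 may warn about missing keys;
# saving it again with the current version makes the warning go away.
model = PPO.load("ppo_cartpole_pre_1_7")  # hypothetical file saved with SB3 < 1.7.0
model.save("ppo_cartpole_pre_1_7")
```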



Breaking Changes:

- Removed deprecated ``create_eval_env``, ``eval_env``, ``eval_log_path``, ``n_eval_episodes`` and ``eval_freq`` parameters,
please use an ``EvalCallback`` instead (see the example after this list)
- Removed deprecated ``sde_net_arch`` parameter
- Removed ``ret`` attributes in ``VecNormalize``, please use ``returns`` instead
- ``VecNormalize`` now updates the observation space when normalizing images
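
A sketch of the replacement for the removed evaluation parameters, using ``EvalCallback`` (the paths and frequencies below are placeholders):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env

# Evaluation is now configured on a callback instead of `eval_env`/`eval_freq`/...
eval_env = make_vec_env("CartPole-v1", n_envs=1)
eval_callback = EvalCallback(
    eval_env,
    eval_freq=10_000,
    n_eval_episodes=5,
    best_model_save_path="./logs/",  # placeholder path
)
model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=50_000, callback=eval_callback)
```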

New Features:

- Introduced mypy type checking
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (AlexPasqua)
- Added ``with_bias`` argument to ``create_mlp``
- Added support for multidimensional ``spaces.MultiBinary`` observations
- Features extractors now properly support unnormalized image-like observations (3D tensor)
when passing ``normalize_images=False``
- Added ``normalized_image`` parameter to ``NatureCNN`` and ``CombinedExtractor``
- Added support for Python 3.10

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed a bug in ``RecurrentPPO`` where the LSTM states were incorrectly reshaped for ``n_lstm_layers > 1`` (thanks kolbytn)
- Fixed ``RuntimeError: rnn: hx is not contiguous`` while predicting terminal values for ``RecurrentPPO`` when ``n_lstm_layers > 1``

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Added support for using a Python file for configuration
- Added ``monitor_kwargs`` parameter

Bug Fixes:

- Fixed ``ProgressBarCallback`` under-reporting (dominicgkerr)
- Fixed return type of ``evaluate_actions`` in ``ActorCriticPolicy`` to reflect that entropy is an optional tensor (Rocamonde)
- Fixed type annotation of ``policy`` in ``BaseAlgorithm`` and ``OffPolicyAlgorithm``
- Allowed model trained with Python 3.7 to be loaded with Python 3.8+ without the ``custom_objects`` workaround
- Raise an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment. (Rocamonde)
- Fix type annotation of ``model`` in ``evaluate_policy``
- Fixed ``Self`` return type using ``TypeVar``
- Fixed the env checker, the key was not passed when checking images from Dict observation space
- Fixed ``normalize_images`` which was not passed to parent class in some cases
- Fixed ``load_from_vector`` that was broken with newer PyTorch version when passing PyTorch tensor

Deprecations:

- You should now explicitly pass a ``features_extractor`` parameter when calling ``extract_features()``
- Deprecated shared layers in ``MlpExtractor`` (AlexPasqua)

Others:

- Used issue forms instead of issue templates
- Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
- Fixed flake8 config to be compatible with flake8 6+
- Goal-conditioned environments are now characterized by the availability of the ``compute_reward`` method, rather than by inheritance from ``gym.GoalEnv``
- Replaced ``CartPole-v0`` by ``CartPole-v1`` in tests
- Fixed ``tests/test_distributions.py`` type hints
- Fixed ``stable_baselines3/common/type_aliases.py`` type hints
- Fixed ``stable_baselines3/common/torch_layers.py`` type hints
- Fixed ``stable_baselines3/common/env_util.py`` type hints
- Fixed ``stable_baselines3/common/preprocessing.py`` type hints
- Fixed ``stable_baselines3/common/atari_wrappers.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_check_nan.py`` type hints
- Exposed modules in ``__init__.py`` with the ``__all__`` attribute (ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device (~8% speed boost on GPU)
- Monkey-patched ``np.bool = bool`` so gym 0.21 is compatible with NumPy 1.24+
- Standardized the use of ``from gym import spaces``
- Modified ``get_system_info`` to avoid issue linked to copy-pasting on GitHub issue

Documentation:

- Updated Hugging Face Integration page (simoninithomas)
- Changed ``env`` to ``vec_env`` when environment is vectorized
- Updated custom policy docs to better explain the ``mlp_extractor``'s dimensions (AlexPasqua)
- Updated custom policy documentation (athatheo)
- Improved tensorboard callback doc
- Clarify doc when using image-like input
- Added RLeXplore to the project page (yuanmingqi)

1.6.2

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3: https://github.com/DLR-RM/rl-baselines3-zoo


New Features:

- Added ``progress_bar`` argument in the ``learn()`` method, displayed using TQDM and rich packages
- Added progress bar callback
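
A minimal sketch of the new progress bar (it requires the optional ``tqdm`` and ``rich`` packages):

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# Displays a progress bar (via tqdm/rich) during training
model.learn(total_timesteps=10_000, progress_bar=True)
```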

[RL Zoo3](https://github.com/DLR-RM/rl-baselines3-zoo)

- The [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) can now be installed as a package (``pip install rl_zoo3``)

Bug Fixes:

- Fixed an issue where ``self.num_timesteps`` was only initialized properly after the first call to ``on_step()`` for callbacks
- Set importlib-metadata version to ``~=4.13`` to be compatible with ``gym=0.21``

Deprecations:

- Added deprecation warning if parameters ``eval_env``, ``eval_freq`` or ``create_eval_env`` are used (see 925) (tobirohrer)

Others:

- Fixed type hint of the ``env_id`` parameter in ``make_vec_env`` and ``make_atari_env`` (AlexPasqua)

Documentation:

- Extended docstring of the ``wrapper_class`` parameter in ``make_vec_env`` (AlexPasqua)

1.6.1

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib


Breaking Changes:

- Switched minimum tensorboard version to 2.9.1

New Features:

- Support logging hyperparameters to tensorboard (timothe-chaumont)
- Added checkpoints for replay buffer and ``VecNormalize`` statistics (anand-bala), see the sketch after this list
- Added option for ``Monitor`` to append to existing file instead of overriding (sidney-tio)
- The env checker now raises an error when using dict observation spaces and observation keys don't match observation space keys
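
A sketch of the checkpointing feature mentioned above; the exact flag names (``save_replay_buffer``, ``save_vecnormalize``) are assumptions used for illustration, check the ``CheckpointCallback`` docs for the final API:

```python
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import CheckpointCallback

# Periodically save the model together with its replay buffer and
# VecNormalize statistics (flag names assumed, see note above).
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./checkpoints/",
    save_replay_buffer=True,
    save_vecnormalize=True,
)
model = SAC("MlpPolicy", "Pendulum-v1")
model.learn(total_timesteps=50_000, callback=checkpoint_callback)
```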

SB3-Contrib

- Fixed the issue of wrongly passing policy arguments when using ``CnnLstmPolicy`` or ``MultiInputLstmPolicy`` with ``RecurrentPPO`` (mlodel)

Bug Fixes:

- Fixed issue where ``PPO`` gives NaN if rollout buffer provides a batch of size 1 (hughperkins)
- Fixed the issue that ``predict`` does not always return action as ``np.ndarray`` (qgallouedec)
- Fixed a division by zero error when computing FPS when only a small amount of time has elapsed, on operating systems with low-precision timers
- Added multidimensional action space support (qgallouedec)
- Fixed missing verbose parameter passing in the ``EvalCallback`` constructor (burakdmb)
- Fixed the issue that when updating the target network in DQN, SAC, TD3, the ``running_mean`` and ``running_var`` properties of batch norm layers are not updated (honglu2875)
- Fixed incorrect type annotation of the replay_buffer_class argument in ``common.OffPolicyAlgorithm`` initializer, where an instance instead of a class was required (Rocamonde)
- Fixed loading a saved model with a different number of environments
- Removed ``forward()`` abstract method declaration from ``common.policies.BaseModel`` (already defined in ``torch.nn.Module``) to fix type errors in subclasses (Rocamonde)
- Fixed the return type of ``.load()`` and ``.learn()`` methods in ``BaseAlgorithm`` so that they now use ``TypeVar`` (Rocamonde)
- Fixed an issue where keys with different tags but the same key raised an error in ``common.logger.HumanOutputFormat`` (Rocamonde and AdamGleave)

Others:

- Fixed ``DictReplayBuffer.next_observations`` typing (qgallouedec)
- Added support for ``device="auto"`` in buffers and made it default (qgallouedec)
- Updated ``ResultsWriter`` (used internally by ``Monitor`` wrapper) to automatically create missing directories when ``filename`` is a path (dominicgkerr)

Documentation:

- Added an example of callback that logs hyperparameters to tensorboard. (timothe-chaumont)
- Fixed typo in docstring "nature" -> "Nature" (Melanol)
- Added info on splitting tensorboard logs (Melanol)
- Fixed typo in ppo doc (francescoluciano)
- Fixed typo in install doc (jlp-ue)
- Clarified and standardized verbosity documentation
- Added link to a GitHub issue in the custom policy documentation (AlexPasqua)
- Fixed typos (Akhilez)
