stable-baselines3 Changelog

2.3.2

Bug fixes
* Reverted ``torch.load()`` to be called with ``weights_only=False``, as the previous setting caused loading issues with old versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913
* Cast learning_rate to float lambda for pickle safety when doing model.load by markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901

Documentation
* Fix typo in changelog by araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882
* Fixed broken link in ppo.rst by chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884
* Adding ER-MRL to community project by corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904
* Fix tensorboard video slow numpy->torch conversion by NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910

New Contributors
* chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884
* markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901
* NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2

2.3.0

model = QRDQN("MlpPolicy", env, learning_starts=100)

New Features:

- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to MaskablePPO
- Log success rate ``rollout/success_rate`` when available for on-policy algorithms

Others:

- Fixed ``train_freq`` type annotation for tqc and qrdqn (Armandpl)
- Fixed ``sb3_contrib/common/maskable/*.py`` type annotations
- Fixed ``sb3_contrib/ppo_mask/ppo_mask.py`` type annotations
- Fixed ``sb3_contrib/common/vec_env/async_eval.py`` type annotations

Documentation:

- Add some additional notes about ``MaskablePPO`` (evaluation and multi-process) (icheered)

**Full Changelog**: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.2.1...v2.3.0

2.2.1

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes:

- Upgraded to Stable-Baselines3 >= 2.2.1
- Switched to ``ruff`` for sorting imports (isort is no longer needed); black and ruff now require a minimum version
- Dropped ``x is False`` in favor of ``not x``, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (iwishiwasaneagle)
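
The effect of the ``x is False`` to ``not x`` change can be illustrated in plain Python, without SB3. This is a minimal sketch: ``should_stop_old``, ``should_stop_new`` and ``buggy_on_step`` are hypothetical names used only for illustration, not library functions. They mirror the two styles of check applied to a callback's return value.

```python
def should_stop_old(callback_result):
    # Old-style check: only an explicit False stops training.
    return callback_result is False


def should_stop_new(callback_result):
    # New-style check: any falsy value (False, None, 0, ...) stops training.
    return not callback_result


def buggy_on_step():
    # Hypothetical callback body that forgets to return a boolean,
    # so it implicitly returns None.
    pass


result = buggy_on_step()
print(should_stop_old(result))  # False: training would have continued
print(should_stop_new(result))  # True: training now stops
```

Under this reading, a callback should return ``True`` explicitly whenever training is meant to continue.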

New Features:

- Added ``set_options`` for ``AsyncEval``
- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to TRPO

Others:

- Fixed ``ActorCriticPolicy.extract_features()`` signature by adding an optional ``features_extractor`` argument
- Update dependencies (accept newer Shimmy/Sphinx version and remove ``sphinx_autodoc_typehints``)

2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes:

- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13
- Upgraded to Stable-Baselines3 >= 2.1.0

New Features:

- Added Python 3.11 support

Bug Fixes:

- Fixed MaskablePPO ignoring ``stats_window_size`` argument

**Full Changelog**: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.0.0...v2.1.0

2.0.0

> **Warning**
> Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023).
> We highly recommend upgrading to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes

- Switched to Gymnasium as primary backend; Gym 0.21 and 0.26 are still supported via the ``shimmy`` package (carlosluis, arjun-kg, tlpss)
- Upgraded to Stable-Baselines3 >= 2.0.0
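
For custom environments, the most visible difference between the Gym 0.21 API and the Gymnasium API is the return value of ``step()``: a single ``done`` flag is replaced by separate ``terminated`` and ``truncated`` flags. The sketch below shows one plausible way to map the old tuple to the new one. ``step_gym_to_gymnasium`` is a hypothetical helper written for illustration and is not the ``shimmy`` implementation; it assumes the old environment signals a time limit through the ``TimeLimit.truncated`` key of ``info``.

```python
def step_gym_to_gymnasium(old_step_result):
    """Convert a Gym 0.21-style step tuple to a Gymnasium-style 5-tuple."""
    obs, reward, done, info = old_step_result
    # Assumption: a time-limit cutoff is flagged in the info dict.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # The episode terminated "for real" only if it ended without truncation.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


# Episode ended because the task finished.
print(step_gym_to_gymnasium((0, 1.0, True, {})))
# Episode ended because of a time limit.
print(step_gym_to_gymnasium((0, 1.0, True, {"TimeLimit.truncated": True})))
```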

Bug fixes

- Fixed QRDQN update interval for multi envs

Others

- Fixed ``sb3_contrib/tqc/*.py`` type hints
- Fixed ``sb3_contrib/trpo/*.py`` type hints
- Fixed ``sb3_contrib/common/envs/invalid_actions_env.py`` type hints

**Full Changelog**: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v1.8.0...v2.0.0

1.8.0

> **Warning**
> Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
> Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
> You can find a migration guide [here](https://gymnasium.farama.org/content/migration-guide/).
> If you want to try the SB3 v2.0 alpha version, you can take a look at [PR 1327](https://github.com/DLR-RM/stable-baselines3/pull/1327).

RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes:

- Removed shared layers in `mlp_extractor` (AlexPasqua)
- Upgraded to Stable-Baselines3 >= 1.8.0

New Features:

- Added `stats_window_size` argument to control smoothing in rollout logging (jonasreiher)
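
To give an intuition for what the window size controls, here is a minimal sketch in plain Python. It assumes the logged rollout statistic is a mean over the most recent episodes; ``smoothed_means`` is a hypothetical helper for illustration, not part of the library.

```python
from collections import deque
from statistics import mean


def smoothed_means(episode_returns, stats_window_size):
    """Mean of the last `stats_window_size` returns, after each episode."""
    # A bounded deque drops the oldest entry once the window is full.
    window = deque(maxlen=stats_window_size)
    means = []
    for episode_return in episode_returns:
        window.append(episode_return)
        means.append(mean(window))
    return means


returns = [0.0, 10.0, 20.0, 30.0]
print(smoothed_means(returns, stats_window_size=2))  # [0.0, 5.0, 15.0, 25.0]
print(smoothed_means(returns, stats_window_size=4))  # [0.0, 5.0, 10.0, 15.0]
```

A smaller window tracks recent episodes more closely, while a larger one gives a smoother but slower-moving curve.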

Bug Fixes:

Deprecations:

Others:

- Moved to pyproject.toml
- Added github issue forms
- Fixed Atari Roms download in CI
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Switched from `flake8` to `ruff`

Documentation:

- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (AlexPasqua)
