Stable-Baselines3

Latest version: v2.5.0

2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13

New Features:

- Added Python 3.11 support
- Added Gymnasium 0.29 support (pseudo-rnd-thoughts)


[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed MaskablePPO ignoring ``stats_window_size`` argument
- Added Python 3.11 support

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Upgraded to Huggingface-SB3 >= 2.3
- Added Python 3.11 support

Bug Fixes:

- Relaxed a check in the logger that was causing issues on Windows with colorama
- Fixed off-policy algorithms with continuous float64 actions (see 1145) (tobirohrer)
- Fixed ``env_checker.py`` warning messages for out of bounds in complex observation spaces (Gabo-Tor)
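
As an illustration of the env checker mentioned above, here is a minimal sketch of running ``check_env`` so that warnings (such as the out-of-bounds observation messages) are reported; the environment name is just a placeholder:

```python
import gymnasium as gym

from stable_baselines3.common.env_checker import check_env

# Any (custom) env can be checked; issues such as out-of-bounds observations
# in complex observation spaces are reported as warnings.
env = gym.make("Pendulum-v1")
check_env(env, warn=True)
```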

Others:

- Updated GitHub issue templates
- Fix typo in gym patch error message (lukashass)
- Refactor ``test_spaces.py`` tests

Documentation:

- Fixed callback example (BertrandDecoster)
- Fixed policy network example (kyle-he)
- Added mobile-env as new community project (stefanbschneider)
- Added [DeepNetSlice](https://github.com/AlexPasqua/DeepNetSlice) to community projects (AlexPasqua)

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.0.0...v2.1.0

2.0.0

> [!WARNING]
> Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023).
> We highly recommend you upgrade to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx


To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the ``shimmy`` package (carlosluis, arjun-kg, tlpss)
- The deprecated ``online_sampling`` argument of ``HerReplayBuffer`` was removed
- Removed deprecated ``stack_observation_space`` method of ``StackedObservations``
- Renamed environment output observations in ``evaluate_policy`` to prevent shadowing the input observations during callbacks (npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the ``HumanOutputFormat`` file check: now it verifies if the object is an instance of ``io.TextIOBase`` instead of only checking for the presence of a ``write`` method.
- Because of the new Gym API (0.26+), the random seed passed to ``vec_env.seed(seed=seed)`` will only be effective after the ``env.reset()`` call (see the sketch below).
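
A minimal sketch of the new seeding behaviour (``make_vec_env`` is used purely for illustration): the seed is stored by ``seed()`` but only applied on the next ``reset()``.

```python
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=2)
vec_env.seed(seed=42)   # with the Gym 0.26+ API, the seed is only stored here...
obs = vec_env.reset()   # ...and becomes effective on this reset
```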

New Features:

- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the ``shimmy`` package)
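
A minimal sketch of training with a Gymnasium environment (Gym 0.21 and 0.26 envs go through ``shimmy`` instead):

```python
import gymnasium as gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")        # Gymnasium env, now the primary backend
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```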

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed QRDQN update interval for multi envs

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support

Bug Fixes:

- Fixed ``VecExtractDictObs`` not handling terminal observations (WeberSamuel)
- Set NumPy version to ``>=1.20`` due to use of ``numpy.typing`` (troiganto)
- Fixed loading DQN changes ``target_update_interval`` (tobirohrer)
- Fixed env checker to properly reset the env before calling ``step()`` when checking
for ``Inf`` and ``NaN`` (lutogniew)
- Fixed HER ``truncate_last_trajectory()`` (lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (JonathanKuelz)


Others:

- Fixed ``stable_baselines3/a2c/*.py`` type hints
- Fixed ``stable_baselines3/ppo/*.py`` type hints
- Fixed ``stable_baselines3/sac/*.py`` type hints
- Fixed ``stable_baselines3/td3/*.py`` type hints
- Fixed ``stable_baselines3/common/base_class.py`` type hints
- Fixed ``stable_baselines3/common/logger.py`` type hints
- Fixed ``stable_baselines3/common/envs/*.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/base_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_frame_stack.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/dummy_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/subproc_vec_env.py`` type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improve type annotation of wrappers
- Tests envs are now checked too
- Added render test for ``VecEnv`` and ``VecEnvWrapper``
- Update issue templates and env info saved with the model
- Changed ``seed()`` method return type from ``List`` to ``Sequence``
- Updated env checker doc and requirements for tuple spaces/goal envs

Documentation:

- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about ``VecEnv`` API vs Gym API
- Upgraded tutorials to Gymnasium API
- Make it more explicit when using ``VecEnv`` vs Gym env
- Added UAV_Navigation_DRL_AirSim to the project page (heleidsn)
- Added ``EvalCallback`` example (sidney-tio)
- Update custom env documentation
- Added `pink-noise-rl` to projects page
- Fix custom policy example, ``ortho_init`` was ignored
- Added SBX page

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v1.8.0...v2.0.0

1.8.0

> [!WARNING]
> Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
> Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
> You can find a migration guide [here](https://gymnasium.farama.org/content/migration-guide/).
> If you want to try the SB3 v2.0 alpha version, you can take a look at [PR 1327](https://github.com/DLR-RM/stable-baselines3/pull/1327).

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade



Breaking Changes:

- Removed shared layers in `mlp_extractor` (AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs, `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- As `HerReplayBuffer` was refactored to support multiprocessing, previously saved replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
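
A minimal sketch of the refactored `HerReplayBuffer`, using the built-in `BitFlippingEnv` test environment and omitting the removed `max_episode_length`/`online_sampling` arguments:

```python
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.envs import BitFlippingEnv

# Goal-conditioned toy env with a Dict observation space
env = BitFlippingEnv(n_bits=10, continuous=True, max_steps=10)
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    # `max_episode_length` and `online_sampling` no longer exist
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=1_000)
```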

New Features:

- Added `repeat_action_probability` argument in `AtariWrapper`.
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Added support for dict/tuple observation spaces for `VecCheckNan`; the check is now active in the `env_checker()` (DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (jonasreiher)
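
A short sketch of the new `stats_window_size` argument, which controls how many recent episodes the `rollout/ep_rew_mean` and `rollout/ep_len_mean` logs are averaged over (the previously fixed window of 100 remains the default):

```python
from stable_baselines3 import PPO

# Average episode statistics over the last 10 episodes instead of the default 100
model = PPO("MlpPolicy", "CartPole-v1", stats_window_size=10, verbose=1)
model.learn(total_timesteps=10_000)
```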

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (AlexPasqua)

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark/issues/7)
- Upgraded to the new `HerReplayBuffer` implementation that supports multiple envs
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Removed `use_auth_token` for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see <https://github.com/openai/gym/pull/1304>)
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer`

Bug Fixes:

- Fixed Atari wrapper that missed the reset condition (luizapozzobon)
- Added the argument `dtype` (defaulting to `float32`) to the noise for consistency with gym actions (sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (adamfrly)
- Fixed loading of normalized image-based environments
- Fixed `DictRolloutBuffer.add` with multidimensional action space (younik)

Deprecations:

Others:

- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hint
- Added tests for StackedObservations
- Removed Gitlab CI file
- Moved from `setup.cfg` to `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Upgraded AutoROM to latest version
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari Roms

Documentation:

- Renamed `load_parameters` to `set_parameters` (DavyMorgan)
- Clarified documentation about subproc multiprocessing for A2C (Bonifatius94)
- Fixed typo in `A2C` docstring (AlexPasqua)
- Renamed timesteps to episodes for `log_interval` description (theSquaredError)
- Removed note about gif creation for Atari games (harveybellini)
- Added information about default network architecture
- Update information about Gymnasium support

1.7.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade



> **Warning**
> Shared layers in MLP policy (``mlp_extractor``) are now deprecated for PPO, A2C and TRPO.
> This feature will be removed in SB3 v1.8.0 and the behavior of ``net_arch=[64, 64]``
> will create **separate** networks with the same architecture, to be consistent with the off-policy algorithms.
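
A short sketch of the behaviour described above (from v1.8.0 on, no shared layers are created):

```python
from stable_baselines3 import PPO

# Builds *separate* policy and value networks,
# each with two hidden layers of 64 units
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=dict(net_arch=[64, 64]))
```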

> **Note**
> A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
> missing keys in the state dict when loaded with SB3 >= 1.7.0.
> To suppress the warning, simply save the model again.
> You can find more info in issue 1233.
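
A minimal sketch of the suggested workaround (the model file name is hypothetical):

```python
from stable_baselines3 import PPO

# Loading an A2C/PPO model saved with SB3 < 1.7.0 may warn about missing keys;
# saving it again with the current version makes the warning go away.
model = PPO.load("ppo_cartpole_pre_1_7")  # hypothetical file saved with SB3 < 1.7.0
model.save("ppo_cartpole_pre_1_7")
```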



Breaking Changes:

- Removed deprecated ``create_eval_env``, ``eval_env``, ``eval_log_path``, ``n_eval_episodes`` and ``eval_freq`` parameters,
please use an ``EvalCallback`` instead (see the example after this list)
- Removed deprecated ``sde_net_arch`` parameter
- Removed ``ret`` attributes in ``VecNormalize``, please use ``returns`` instead
- ``VecNormalize`` now updates the observation space when normalizing images
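
A sketch of the replacement for the removed evaluation parameters, using ``EvalCallback`` (the paths and frequencies below are placeholders):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env

# Evaluation is now configured on a callback instead of `eval_env`/`eval_freq`/...
eval_env = make_vec_env("CartPole-v1", n_envs=1)
eval_callback = EvalCallback(
    eval_env,
    eval_freq=10_000,
    n_eval_episodes=5,
    best_model_save_path="./logs/",  # placeholder path
)
model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=50_000, callback=eval_callback)
```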

New Features:

- Introduced mypy type checking
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (AlexPasqua)
- Added ``with_bias`` argument to ``create_mlp``
- Added support for multidimensional ``spaces.MultiBinary`` observations
- Features extractors now properly support unnormalized image-like observations (3D tensor)
when passing ``normalize_images=False``
- Added ``normalized_image`` parameter to ``NatureCNN`` and ``CombinedExtractor``
- Added support for Python 3.10

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed a bug in ``RecurrentPPO`` where the LSTM states were incorrectly reshaped for ``n_lstm_layers > 1`` (thanks kolbytn)
- Fixed ``RuntimeError: rnn: hx is not contiguous`` while predicting terminal values for ``RecurrentPPO`` when ``n_lstm_layers > 1``

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Added support for using a Python file for configuration
- Added ``monitor_kwargs`` parameter

Bug Fixes:

- Fixed ``ProgressBarCallback`` under-reporting (dominicgkerr)
- Fixed return type of ``evaluate_actions`` in ``ActorCriticPolicy`` to reflect that entropy is an optional tensor (Rocamonde)
- Fixed type annotation of ``policy`` in ``BaseAlgorithm`` and ``OffPolicyAlgorithm``
- Allowed model trained with Python 3.7 to be loaded with Python 3.8+ without the ``custom_objects`` workaround
- Raise an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment. (Rocamonde)
- Fix type annotation of ``model`` in ``evaluate_policy``
- Fixed ``Self`` return type using ``TypeVar``
- Fixed the env checker, the key was not passed when checking images from Dict observation space
- Fixed ``normalize_images`` which was not passed to parent class in some cases
- Fixed ``load_from_vector`` that was broken with newer PyTorch version when passing PyTorch tensor

Deprecations:

- You should now explicitly pass a ``features_extractor`` parameter when calling ``extract_features()``
- Deprecated shared layers in ``MlpExtractor`` (AlexPasqua)

Others:

- Used issue forms instead of issue templates
- Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
- Fixed flake8 config to be compatible with flake8 6+
- Goal-conditioned environments are now characterized by the availability of the ``compute_reward`` method, rather than by inheritance from ``gym.GoalEnv``
- Replaced ``CartPole-v0`` by ``CartPole-v1`` in tests
- Fixed ``tests/test_distributions.py`` type hints
- Fixed ``stable_baselines3/common/type_aliases.py`` type hints
- Fixed ``stable_baselines3/common/torch_layers.py`` type hints
- Fixed ``stable_baselines3/common/env_util.py`` type hints
- Fixed ``stable_baselines3/common/preprocessing.py`` type hints
- Fixed ``stable_baselines3/common/atari_wrappers.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_check_nan.py`` type hints
- Exposed modules in ``__init__.py`` with the ``__all__`` attribute (ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device (~8% speed boost on GPU)
- Monkey-patched ``np.bool = bool`` so gym 0.21 is compatible with NumPy 1.24+
- Standardized the use of ``from gym import spaces``
- Modified ``get_system_info`` to avoid issue linked to copy-pasting on GitHub issue

Documentation:

- Updated Hugging Face Integration page (simoninithomas)
- Changed ``env`` to ``vec_env`` when environment is vectorized
- Updated custom policy docs to better explain the ``mlp_extractor``'s dimensions (AlexPasqua)
- Updated custom policy documentation (athatheo)
- Improved tensorboard callback doc
- Clarify doc when using image-like input
- Added RLeXplore to the project page (yuanmingqi)

1.6.2

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3: https://github.com/DLR-RM/rl-baselines3-zoo


New Features:

- Added ``progress_bar`` argument in the ``learn()`` method, displayed using TQDM and rich packages
- Added progress bar callback
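
A minimal sketch of the new progress bar (it requires the optional ``tqdm`` and ``rich`` packages):

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# Displays a progress bar (via tqdm/rich) during training
model.learn(total_timesteps=10_000, progress_bar=True)
```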

[RL Zoo3](https://github.com/DLR-RM/rl-baselines3-zoo)

- The [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) can now be installed as a package (``pip install rl_zoo3``)

Bug Fixes:

- Fixed an issue where ``self.num_timesteps`` was only initialized properly after the first call to ``on_step()`` for callbacks
- Set importlib-metadata version to ``~=4.13`` to be compatible with ``gym=0.21``

Deprecations:

- Added deprecation warning if parameters ``eval_env``, ``eval_freq`` or ``create_eval_env`` are used (see 925) (tobirohrer)

Others:

- Fixed type hint of the ``env_id`` parameter in ``make_vec_env`` and ``make_atari_env`` (AlexPasqua)

Documentation:

- Extended docstring of the ``wrapper_class`` parameter in ``make_vec_env`` (AlexPasqua)

1.6.1

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib


Breaking Changes:

- Switched minimum tensorboard version to 2.9.1

New Features:

- Support logging hyperparameters to tensorboard (timothe-chaumont)
- Added checkpoints for replay buffer and ``VecNormalize`` statistics (anand-bala), see the sketch after this list
- Added option for ``Monitor`` to append to existing file instead of overriding (sidney-tio)
- The env checker now raises an error when using dict observation spaces and observation keys don't match observation space keys
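
A sketch of the checkpointing feature mentioned above; the exact flag names (``save_replay_buffer``, ``save_vecnormalize``) are assumptions used for illustration, check the ``CheckpointCallback`` docs for the final API:

```python
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import CheckpointCallback

# Periodically save the model together with its replay buffer and
# VecNormalize statistics (flag names assumed, see note above).
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./checkpoints/",
    save_replay_buffer=True,
    save_vecnormalize=True,
)
model = SAC("MlpPolicy", "Pendulum-v1")
model.learn(total_timesteps=50_000, callback=checkpoint_callback)
```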

SB3-Contrib

- Fixed the issue of wrongly passing policy arguments when using ``CnnLstmPolicy`` or ``MultiInputLstmPolicy`` with ``RecurrentPPO`` (mlodel)

Bug Fixes:

- Fixed issue where ``PPO`` gives NaN if rollout buffer provides a batch of size 1 (hughperkins)
- Fixed the issue that ``predict`` does not always return action as ``np.ndarray`` (qgallouedec)
- Fixed a division by zero error when computing FPS when only a small amount of time has elapsed, on operating systems with low-precision timers
- Added multidimensional action space support (qgallouedec)
- Fixed missing verbose parameter passing in the ``EvalCallback`` constructor (burakdmb)
- Fixed the issue that when updating the target network in DQN, SAC, TD3, the ``running_mean`` and ``running_var`` properties of batch norm layers are not updated (honglu2875)
- Fixed incorrect type annotation of the replay_buffer_class argument in ``common.OffPolicyAlgorithm`` initializer, where an instance instead of a class was required (Rocamonde)
- Fixed loading a saved model with a different number of environments
- Removed ``forward()`` abstract method declaration from ``common.policies.BaseModel`` (already defined in ``torch.nn.Module``) to fix type errors in subclasses (Rocamonde)
- Fixed the return type of ``.load()`` and ``.learn()`` methods in ``BaseAlgorithm`` so that they now use ``TypeVar`` (Rocamonde)
- Fixed an issue where keys with different tags but the same key raised an error in ``common.logger.HumanOutputFormat`` (Rocamonde and AdamGleave)

Others:

- Fixed ``DictReplayBuffer.next_observations`` typing (qgallouedec)
- Added support for ``device="auto"`` in buffers and made it default (qgallouedec)
- Updated ``ResultsWriter`` (used internally by ``Monitor`` wrapper) to automatically create missing directories when ``filename`` is a path (dominicgkerr)

Documentation:

- Added an example of callback that logs hyperparameters to tensorboard. (timothe-chaumont)
- Fixed typo in docstring "nature" -> "Nature" (Melanol)
- Added info on splitting tensorboard logs (Melanol)
- Fixed typo in ppo doc (francescoluciano)
- Fixed typo in install doc (jlp-ue)
- Clarified and standardized verbosity documentation
- Added link to a GitHub issue in the custom policy documentation (AlexPasqua)
- Fixed typos (Akhilez)
