Stable-baselines3

Latest version: v2.4.0


2.4.0

> [!WARNING]
> Stable-Baselines3 (SB3) v2.4.0 will be the last one supporting Python 3.8 (end of life in October 2024)
> and PyTorch < 2.3.
> We highly recommend upgrading to Python >= 3.9 and PyTorch >= 2.3 (compatible with NumPy v2).


SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade


> [!NOTE]
> DQN (and QR-DQN) models saved with SB3 < 2.4.0 will show a warning about truncation of optimizer state when loaded with SB3 >= 2.4.0.
> To suppress the warning, simply save the model again.
> More information can be found in PR 1963.

Breaking Changes:
- Increased minimum required version of Gymnasium to 0.29.1

New Features:
- Added support for ``pre_linear_modules`` and ``post_linear_modules`` in ``create_mlp`` (useful for adding normalization layers, like in DroQ or CrossQ)
- Enabled logging of ``np.ndarray`` values as histograms in ``TensorBoardOutputFormat`` (see GH1634) (iwishwasaneagle)
- Updated env checker to warn users when using a multi-dimensional array to define `MultiDiscrete` spaces
- Added support for Gymnasium v1.0

Bug Fixes:
- Fixed memory leak when loading a learner from storage: ``set_parameters()`` no longer tries to load the object data
and only loads the PyTorch parameters (peteole)
- Cast type in the GAE computation to avoid an error when using ``torch.compile`` (amjames)
- ``CallbackList`` now sets the ``.parent`` attribute of child callbacks to its own ``.parent``. (will-maclean)
- Fixed error when loading a model that has ``net_arch`` manually set to ``None`` (jak3122)
- Set requirement numpy<2.0 until PyTorch is compatible (https://github.com/pytorch/pytorch/issues/107302)
- Updated DQN optimizer input to only include q_network parameters, removing the target_q_network ones (corentinlger)
- Fixed ``test_buffers.py::test_device`` which was not actually checking the device of tensors (rhaps0dy)


[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added ``CrossQ`` algorithm, from "Batch Normalization in Deep Reinforcement Learning" paper (danielpalen)
- Added ``BatchRenorm`` PyTorch layer used in ``CrossQ`` (danielpalen)
- Updated QR-DQN optimizer input to only include quantile_net parameters (corentinlger)
- Fixed loading QR-DQN changing `target_update_interval` (jak3122)


[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)
- Updated default hyperparameters for TQC/SAC for Swimmer-v4 (decreased gamma for more consistent results)

[SBX (SB3 + Jax)](https://github.com/araffin/sbx)
- Added CNN support for DQN
- Bug fix for SAC and related algorithms: the log of the entropy coefficient is now optimized, to be consistent with SB3

Others:
- Fixed various typos (cschindlbeck)
- Removed unnecessary SDE noise resampling in PPO update (brn-dev)
- Updated PyTorch version on CI to 2.3.1
- Added a warning to recommend using CPU with on-policy algorithms (A2C/PPO) and ``MlpPolicy``
- Switched to uv to download packages faster on GitHub CI
- Updated dependencies for Read the Docs
- Removed unnecessary ``copy_obs_dict`` method for ``SubprocVecEnv``, removed the use of ordered dict and renamed ``flatten_obs`` to ``stack_obs``

Documentation:
- Updated PPO doc to recommend using CPU with ``MlpPolicy``
- Clarified documentation about planned features and citing software
- Added a note explaining that SAC optimizes the log of the entropy coefficient
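
To illustrate why SAC optimizes the log of the entropy coefficient, here is a standalone PyTorch sketch modeled on the entropy-coefficient loss used in SB3's SAC. The target entropy, learning rate and the ``ent_coef_step`` helper are illustrative assumptions. Because the optimizer updates ``log_ent_coef``, the coefficient ``exp(log_ent_coef)`` stays positive whatever the gradient step:

```python
import torch as th

# Illustrative setup: 1-D action space, so target entropy = -1 (heuristic -dim(A))
target_entropy = -1.0
log_ent_coef = th.zeros(1, requires_grad=True)  # log(alpha) = 0, i.e. alpha = 1
optimizer = th.optim.Adam([log_ent_coef], lr=3e-4)


def ent_coef_step(log_prob: th.Tensor) -> float:
    """Hypothetical helper: one gradient step on log(alpha), returns the new alpha."""
    # If the policy entropy (about -log_prob) is below the target, the gradient
    # pushes log_ent_coef up, which increases alpha and encourages exploration.
    ent_coef_loss = -(log_ent_coef * (log_prob + target_entropy).detach()).mean()
    optimizer.zero_grad()
    ent_coef_loss.backward()
    optimizer.step()
    return log_ent_coef.exp().item()


# Low-entropy batch (high log-probabilities): alpha should increase
alpha = ent_coef_step(th.full((256,), 2.0))
```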


New Contributors
* amjames made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1922
* cschindlbeck made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1926
* peteole made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1908
* jak3122 made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1937
* will-maclean made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1939
* brn-dev made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1933
* chsahit made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1962
* Dev1nW made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2017

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.2...v2.4.0

2.3.2

Bug fixes
* Reverted ``torch.load()`` to be called with ``weights_only=False``, as it caused loading issues with old versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913
* Cast learning_rate to float lambda for pickle safety when doing model.load by markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901

Documentation
* Fix typo in changelog by araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882
* Fixed broken link in ppo.rst by chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884
* Adding ER-MRL to community project by corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904
* Fix tensorboard video slow numpy->torch conversion by NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910

New Contributors
* chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884
* markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901
* NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2

2.3.0

> [!WARNING]
> Because of ``weights_only=True``, this release breaks loading of policies when using PyTorch 1.13.
> Please upgrade to PyTorch >= 2.0 or upgrade SB3 (the change was reverted in SB3 2.3.2).

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade



Breaking Changes:

- The default hyperparameters of ``TD3`` and ``DDPG`` have been changed to be more consistent with ``SAC``

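```python
from stable_baselines3 import TD3

env = "Pendulum-v1"  # any Gymnasium env with a continuous action space

# SB3 < 2.3.0 default hyperparameters:
model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0 default hyperparameters:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
```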


> [!NOTE]
> Two inconsistencies remain: the default network architecture for ``TD3/DDPG`` is ``[400, 300]`` instead of ``[256, 256]`` for SAC (for backward compatibility reasons, see [report on the influence of the network size](https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3)), and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see [W&B report on the influence of the lr](https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx)).


- The default ``learning_starts`` parameter of ``DQN`` has been changed to be consistent with the other off-policy algorithms


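```python
from stable_baselines3 import DQN

env = "CartPole-v1"  # any Gymnasium env with a discrete action space

# SB3 < 2.3.0 default hyperparameters (50_000 corresponded to the Atari defaults):
model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0 default hyperparameters:
model = DQN("MlpPolicy", env, learning_starts=100)
```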


- For safety, ``torch.load()`` is now called with ``weights_only=True`` when loading torch tensors;
policy ``load()`` still uses ``weights_only=False``, as Gymnasium imports are required for it to work
- When using ``huggingface_sb3``, you will now need to set ``TRUST_REMOTE_CODE=True`` when downloading models from the hub, as ``pickle.load`` is not safe.
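
A minimal sketch of the safer loading mode in plain PyTorch (not SB3's internal code): tensors saved with ``torch.save()`` can be read back with ``weights_only=True``, which restricts unpickling to tensors and basic containers. The in-memory buffer and tensor values are illustrative only:

```python
import io

import torch as th

# Save a dict of tensors to an in-memory buffer
buffer = io.BytesIO()
th.save({"weight": th.ones(2, 2)}, buffer)
buffer.seek(0)

# weights_only=True refuses arbitrary pickled objects, which matters
# when the file comes from an untrusted source
state = th.load(buffer, weights_only=True)
print(state["weight"].sum().item())  # 4.0
```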


New Features:

- Log success rate ``rollout/success_rate`` when available for on-policy algorithms (corentinlger)

Bug Fixes:

- Fixed ``monitor_wrapper`` argument that was not passed to the parent class, and ``dones`` argument that wasn't passed to ``_update_info_buffer`` (corentinlger)

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to MaskablePPO
- Fixed ``train_freq`` type annotation for TQC and QR-DQN (Armandpl)
- Fixed ``sb3_contrib/common/maskable/*.py`` type annotations
- Fixed ``sb3_contrib/ppo_mask/ppo_mask.py`` type annotations
- Fixed ``sb3_contrib/common/vec_env/async_eval.py`` type annotations
- Added some additional notes about ``MaskablePPO`` (evaluation and multi-process) (icheered)


[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Added test dependencies to `setup.py` (power-edge)
- Simplified dependencies of `requirements.txt` (removed duplicates from `setup.py`)

[SBX (SB3 + Jax)](https://github.com/araffin/sbx)

- Added support for ``MultiDiscrete`` and ``MultiBinary`` action spaces to PPO
- Added support for large values for gradient_steps to SAC, TD3, and TQC
- Fixed ``train()`` signature and updated type hints
- Fixed replay buffer device at load time
- Fix replay buffer device at load time
- Added flatten layer
- Added ``CrossQ``

Others:

- Updated black from v23 to v24
- Updated ruff to >= v0.3.1
- Updated env checker for (multi)discrete spaces with non-zero start.

Documentation:

- Added a paragraph on modifying vectorized environment parameters via setters (fracapuano)
- Updated callback code example
- Updated export to ONNX documentation, it is now much simpler to export SB3 models with newer ONNX Opset!
- Added video link to "Practical Tips for Reliable Reinforcement Learning" video
- Added ``render_mode="human"`` in the README example (marekm4)
- Fixed docstring signature for sum_independent_dims (stagoverflow)
- Updated docstring description for ``log_interval`` in the base class (rushitnshah).



**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0

2.2.1

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade


> [!NOTE]
> Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in [GH1751](https://github.com/DLR-RM/stable-baselines3/issues/1751).
> Please use SB3 v2.2.1 and not v2.2.0.



Breaking Changes:

- Switched to ``ruff`` for sorting imports (isort is no longer needed); black and ruff now require a minimum version
- Dropped ``x is False`` in favor of ``not x``, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (iwishiwasaneagle)

New Features:

- Improved error message of the ``env_checker`` for env wrongly detected as GoalEnv (``compute_reward()`` is defined)
- Improved error message when mixing Gym API with VecEnv API (see GH1694)
- Added support for setting ``options`` at reset with VecEnv via the ``set_options()`` method. As with seeds, options are reset at the end of an episode (ReHoss)
- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to on-policy algorithms (A2C and PPO)

Bug Fixes:

- Prevented using ``squash_output`` without ``use_sde`` in ``ActorCriticPolicy`` (PatrickHelm)
- Actions are now unscaled in ``collect_rollouts()`` in ``OnPolicyAlgorithm`` (PatrickHelm)
- Moved ``VectorizedActionNoise`` into ``_setup_learn()`` in ``OffPolicyAlgorithm`` (PatrickHelm)
- Prevented out-of-bounds error on Windows if no seed is passed (PatrickHelm)
- Called ``callback.update_locals()`` before ``callback.on_rollout_end()`` in ``OnPolicyAlgorithm`` (PatrickHelm)
- Fixed replay buffer device after loading in OffPolicyAlgorithm (PatrickHelm)
- Fixed ``render_mode`` which was not properly loaded when using ``VecNormalize.load()``
- Fixed success reward dtype in ``SimpleMultiObsEnv`` (NixGD)
- Fixed check_env for Sequence observation space (corentinlger)
- Prevented instantiating ``BitFlippingEnv`` with conflicting observation spaces (kylesayrs)
- Fixed ``ResourceWarning`` when loading and saving models (files were not closed). Note that only paths are closed automatically;
the behavior stays the same for tempfiles (they need to be closed manually).
The behavior is now consistent when loading/saving the replay buffer

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added ``set_options`` for ``AsyncEval``
- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to TRPO

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Removed `gym` dependency; the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (Quentin18)
- Added `ppo_lstm` to hyperparams_opt.py (technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
- Updated docker image, removed support for X server
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)`

[SBX (SB3 + Jax)](https://github.com/araffin/sbx)

- Added ``DDPG`` and ``TD3`` algorithms

Others:

- Fixed ``stable_baselines3/common/callbacks.py`` type hints
- Fixed ``stable_baselines3/common/utils.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_transpose.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_video_recorder.py`` type hints
- Fixed ``stable_baselines3/common/save_util.py`` type hints
- Updated docker images to Ubuntu Jammy using micromamba 1.5
- Fixed ``stable_baselines3/common/buffers.py`` type hints
- Fixed ``stable_baselines3/her/her_replay_buffer.py`` type hints
- Buffers do not call an additional ``.copy()`` when storing new transitions
- Fixed ``ActorCriticPolicy.extract_features()`` signature by adding an optional ``features_extractor`` argument
- Updated dependencies (accept newer Shimmy/Sphinx versions and removed ``sphinx_autodoc_typehints``)
- Fixed ``stable_baselines3/common/off_policy_algorithm.py`` type hints
- Fixed ``stable_baselines3/common/distributions.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_normalize.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/__init__.py`` type hints
- Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
- Fixed ``stable_baselines3/common/policies.py`` type hints
- Switched to ``mypy`` only for checking types
- Added tests to check consistency when saving/loading files

Documentation:

- Updated RL Tips and Tricks (include recommendation for evaluation, added links to DroQ, ARS and SBX).
- Fixed various typos and grammar mistakes

Full changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.1.0...v2.2.1

2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13

New Features:

- Added Python 3.11 support
- Added Gymnasium 0.29 support (pseudo-rnd-thoughts)


[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed MaskablePPO ignoring ``stats_window_size`` argument
- Added Python 3.11 support

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Upgraded to Huggingface-SB3 >= 2.3
- Added Python 3.11 support

Bug Fixes:

- Relaxed a check in the logger that was causing issues on Windows with colorama
- Fixed off-policy algorithms with continuous float64 actions (see 1145) (tobirohrer)
- Fixed ``env_checker.py`` warning messages for out of bounds in complex observation spaces (Gabo-Tor)

Others:

- Updated GitHub issue templates
- Fixed typo in gym patch error message (lukashass)
- Refactored ``test_spaces.py`` tests

Documentation:

- Fixed callback example (BertrandDecoster)
- Fixed policy network example (kyle-he)
- Added mobile-env as new community project (stefanbschneider)
- Added [DeepNetSlice](https://github.com/AlexPasqua/DeepNetSlice) to community projects (AlexPasqua)

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v2.0.0...v2.1.0

2.0.0

> [!WARNING]
> Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023).
> We highly recommend upgrading to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx


To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade


Breaking Changes:

- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the ``shimmy`` package (carlosluis, arjun-kg, tlpss)
- The deprecated ``online_sampling`` argument of ``HerReplayBuffer`` was removed
- Removed deprecated ``stack_observation_space`` method of ``StackedObservations``
- Renamed environment output observations in ``evaluate_policy`` to prevent shadowing the input observations during callbacks (npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the ``HumanOutputFormat`` file check: now it verifies if the object is an instance of ``io.TextIOBase`` instead of only checking for the presence of a ``write`` method.
- Because of the new Gym API (0.26+), the random seed passed to ``vec_env.seed(seed=seed)`` will only be effective after the next ``env.reset()`` call.

New Features:

- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the ``shimmy`` package)

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Fixed QRDQN update interval for multiple envs

[RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support

Bug Fixes:

- Fixed ``VecExtractDictObs`` not handling terminal observations (WeberSamuel)
- Set NumPy version to ``>=1.20`` due to use of ``numpy.typing`` (troiganto)
- Fixed loading DQN changing ``target_update_interval`` (tobirohrer)
- Fixed env checker to properly reset the env before calling ``step()`` when checking
for ``Inf`` and ``NaN`` (lutogniew)
- Fixed HER ``truncate_last_trajectory()`` (lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (JonathanKuelz)


Others:

- Fixed ``stable_baselines3/a2c/*.py`` type hints
- Fixed ``stable_baselines3/ppo/*.py`` type hints
- Fixed ``stable_baselines3/sac/*.py`` type hints
- Fixed ``stable_baselines3/td3/*.py`` type hints
- Fixed ``stable_baselines3/common/base_class.py`` type hints
- Fixed ``stable_baselines3/common/logger.py`` type hints
- Fixed ``stable_baselines3/common/envs/*.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/base_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/vec_frame_stack.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/dummy_vec_env.py`` type hints
- Fixed ``stable_baselines3/common/vec_env/subproc_vec_env.py`` type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improved type annotation of wrappers
- Test envs are now checked too
- Added render test for ``VecEnv`` and ``VecEnvWrapper``
- Updated issue templates and env info saved with the model
- Changed ``seed()`` method return type from ``List`` to ``Sequence``
- Updated env checker doc and requirements for tuple spaces/goal envs

Documentation:

- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about ``VecEnv`` API vs Gym API
- Upgraded tutorials to Gymnasium API
- Made it more explicit when using ``VecEnv`` vs Gym env
- Added UAV_Navigation_DRL_AirSim to the project page (heleidsn)
- Added ``EvalCallback`` example (sidney-tio)
- Updated custom env documentation
- Added `pink-noise-rl` to projects page
- Fixed custom policy example, ``ortho_init`` was ignored
- Added SBX page

**Full Changelog**: https://github.com/DLR-RM/stable-baselines3/compare/v1.8.0...v2.0.0
