Stable-baselines3


1.4.0

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Breaking Changes:

- Dropped python 3.6 support (as announced in previous release)
- Renamed ``mask`` argument of the ``predict()`` method to ``episode_start`` (used with RNN policies only; see the sketch after this list)
- Local variables ``action``, ``done`` and ``reward`` were renamed to their plural forms for off-policy algorithms (``actions``, ``dones``, ``rewards``),
which may affect custom callbacks.
- Removed ``episode_reward`` field from ``RolloutReturn()`` type
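
The renamed argument only changes how callers pass the reset mask to ``predict()``. A minimal sketch, assuming a PPO model on CartPole for illustration; the recurrent state handling only matters for the RNN policies available in SB3-Contrib:

```python
import numpy as np

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
env = model.get_env()

obs = env.reset()
states = None  # recurrent hidden state, stays None for feed-forward policies
episode_starts = np.ones((env.num_envs,), dtype=bool)  # formerly the ``mask`` argument
for _ in range(10):
    # ``episode_start`` flags envs that just reset, so an RNN policy can reset its state
    actions, states = model.predict(obs, state=states, episode_start=episode_starts, deterministic=True)
    obs, rewards, dones, infos = env.step(actions)
    episode_starts = dones
```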


**Warning:**

An update to the ``HER`` algorithm is planned to support multi-env training and remove the max episode length constraint.
(see [PR 704](https://github.com/DLR-RM/stable-baselines3/pull/704))
This will be a backward incompatible change (models trained with previous versions of ``HER`` won't work with the new version).



New Features:

- Added ``norm_obs_keys`` param for ``VecNormalize`` wrapper to configure which observation keys to normalize (kachayev); see the example after this list
- Added experimental support to train off-policy algorithms with multiple envs (note: ``HerReplayBuffer`` currently not supported)
- Handle timeout termination properly for on-policy algorithms (when using ``TimeLimit``)
- Added ``skip`` option to ``VecTransposeImage`` to skip transforming the channel order when the heuristic is wrong
- Added ``copy()`` and ``combine()`` methods to ``RunningMeanStd``
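
A short sketch of ``norm_obs_keys``, using the bundled ``SimpleMultiObsEnv`` Dict-observation test env; the key name ``"vec"`` is an assumption for illustration, adapt it to the keys of your own observation space:

```python
from stable_baselines3.common.envs import SimpleMultiObsEnv
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

venv = DummyVecEnv([lambda: SimpleMultiObsEnv()])
# Only normalize the "vec" entry of the Dict observation, leave the image untouched
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, norm_obs_keys=["vec"])
```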

[SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

- Added Trust Region Policy Optimization (TRPO) (cyprienc)
- Added Augmented Random Search (ARS) (sgillen); a usage sketch for TRPO and ARS follows this list
- Coming soon: PPO LSTM, see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53
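
A minimal usage sketch for the two new contrib algorithms, assuming the separate sb3-contrib package is installed (``pip install sb3-contrib``); the env id and timestep budgets are illustrative, not tuned settings:

```python
from sb3_contrib import ARS, TRPO

model = TRPO("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000)

model = ARS("LinearPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000)
```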

Bug Fixes:

- Fixed a bug where ``set_env()`` with ``VecNormalize`` would result in an error with off-policy algorithms (thanks cleversonahum)
- FPS calculation is now based on the number of steps performed during the last ``learn`` call, even when ``reset_num_timesteps`` is set to ``False`` (kachayev)
- Fixed evaluation script for recurrent policies (experimental feature in SB3 contrib)
- Fixed a bug where the observation would be incorrectly detected as non-vectorized instead of throwing an error
- The env checker now properly checks and warns about potential issues for continuous action spaces when the boundaries are too small or when the dtype is not float32
- Fixed a bug in ``VecFrameStack`` with channel first image envs, where the terminal observation would be wrongly created.


Others:

- Added a warning in the env checker when not using ``np.float32`` for continuous actions (see the example after this list)
- Improved test coverage and error message when checking shape of observation
- Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows (Linux environments are not affected) (hsuehch)
- Fixed ``device`` argument inconsistency (qgallouedec)
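
A sketch of a continuous action space declaration that avoids the new env-checker warning: normalized bounds and an explicit ``np.float32`` dtype (the shape here is illustrative):

```python
import numpy as np
import gym

# check_env (stable_baselines3.common.env_checker) warns when continuous actions
# do not use np.float32 or when the bounds are very small; this declaration is fine.
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```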

Documentation:

- Add drivergym to projects page (theDebugger811)
- Add highway-env to projects page (eleurent)
- Add tactile-gym to projects page (ac-93)
- Fix indentation in the RL tips page (cove9988)
- Update GAE computation docstring
- Add documentation on exporting to TFLite/Coral
- Added JMLR paper and updated citation
- Added link to RL Tips and Tricks video
- Updated ``BaseAlgorithm.load`` docstring (Demetrio92)
- Added a note on ``load`` behavior in the examples (Demetrio92)
- Updated SB3 Contrib doc
- Fixed the guidance in the A2C doc and migration guide on how to set epsilon with ``RMSpropTFLike`` (thomasgubler)
- Fixed custom policy documentation (IperGiove)
- Added doc on Weights & Biases integration

1.3.0

**WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021).
We highly recommend upgrading to Python >= 3.7.**

SB3-Contrib changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/releases/tag/v1.3.0

Breaking Changes:

- ``sde_net_arch`` argument in policies is deprecated and will be removed in a future version.
- ``_get_latent`` (``ActorCriticPolicy``) was removed
- All logging keys now use underscores instead of spaces (timokau). Concretely this changes:

- ``time/total timesteps`` to ``time/total_timesteps`` for off-policy algorithms and the eval callback (on-policy algorithms such as PPO and A2C already used the underscored version),
- ``rollout/exploration rate`` to ``rollout/exploration_rate`` and
- ``rollout/success rate`` to ``rollout/success_rate``.

New Features:

- Added methods ``get_distribution`` and ``predict_values`` for ``ActorCriticPolicy`` for A2C/PPO/TRPO (cyprienc)
- Added methods ``forward_actor`` and ``forward_critic`` for ``MlpExtractor``
- Added ``sb3.get_system_info()`` helper function to gather version information relevant to SB3 (e.g., Python and PyTorch version)
- Saved models now store the system information of the machine the agent was trained on, and load functions have a ``print_system_info`` parameter to help debug load issues (see the example after this list).
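
A short sketch of the two debugging helpers; ``"ppo_cartpole"`` is a hypothetical save path:

```python
import stable_baselines3 as sb3
from stable_baselines3 import PPO

# Print Python / PyTorch / SB3 version information, useful for bug reports
sb3.get_system_info()

# print_system_info=True also prints the system information stored when the model was saved
model = PPO.load("ppo_cartpole", print_system_info=True)
```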

Bug Fixes:

- Fixed ``dtype`` of observations for ``SimpleMultiObsEnv``
- Allow ``VecNormalize`` to wrap discrete-observation environments to normalize reward
when observation normalization is disabled.
- Fixed a bug where ``DQN`` would throw an error when using ``Discrete`` observation and stochastic actions
- Fixed a bug where sub-classed observation spaces could not be used
- Added ``force_reset`` argument to ``load()`` and ``set_env()`` in order to be able to call ``learn(reset_num_timesteps=False)`` with a new environment
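
A minimal sketch of resuming training with a freshly supplied environment; ``"sac_pendulum"`` is a hypothetical save path:

```python
import gym

from stable_baselines3 import SAC

# force_reset (default True) forces a reset of the new environment before training
# resumes, so that learn(reset_num_timesteps=False) does not reuse a stale last observation
env = gym.make("Pendulum-v1")
model = SAC.load("sac_pendulum", env=env, force_reset=True)
model.learn(total_timesteps=5_000, reset_num_timesteps=False)
```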

Others:

- Cap gym max version to 0.19 to avoid issues with atari-py and other breaking changes
- Improved error message when using dict observation with the wrong policy
- Improved error message when using ``EvalCallback`` with two envs not wrapped the same way.
- Added additional info about supported Python versions for PyPI in ``setup.py``

Documentation:

- Add Rocket League Gym to list of supported projects (AechPro)
- Added gym-electric-motor to project page (wkirgsn)
- Added policy-distillation-baselines to project page (CUN-bjy)
- Added ONNX export instructions (batu)
- Updated the Read the Docs environment (fixed ``docutils`` issue)
- Fix PPO environment name (IljaAvadiev)
- Fix custom env doc and add env registration example
- Update algorithms from SB3 Contrib
- Use underscores for numeric literals in examples to improve clarity

1.2.0

Breaking Changes:

- SB3 now requires PyTorch >= 1.8.1
- ``VecNormalize`` ``ret`` attribute was renamed to ``returns``

Bug Fixes:

- Hotfix for ``VecNormalize`` where the observation filter was not updated at reset (thanks vwxyzjn)
- Fixed model predictions when using batch normalization and dropout layers by calling ``train()`` and ``eval()`` (davidblom603)
- Fixed model training for DQN, TD3 and SAC so that their target nets always remain in evaluation mode (ayeright)
- Passing ``gradient_steps=0`` to an off-policy algorithm now results in no gradient steps being taken (previously it meant as many gradient steps as steps done in the environment
during the rollout); see the example below
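
A minimal sketch of the new behaviour, assuming SAC on Pendulum for illustration; the env id and ``train_freq`` are not tuned settings:

```python
from stable_baselines3 import SAC

# gradient_steps=0 now means "collect transitions but take no gradient steps"
model = SAC("MlpPolicy", "Pendulum-v1", train_freq=1, gradient_steps=0)
model.learn(total_timesteps=1_000)
```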

Others:

- Enabled Python 3.9 in GitHub CI
- Fixed type annotations
- Refactored ``predict()`` by moving the preprocessing to ``obs_to_tensor()`` method

Documentation:

- Updated multiprocessing example
- Added example of ``VecEnvWrapper``
- Added a note about logging to tensorboard more often
- Added warning about simplicity of examples and link to RL zoo (MihaiAnca13)

1.1.0

Breaking Changes:

- All custom environments (e.g. the ``BitFlippingEnv`` or ``IdentityEnv``) were moved to the ``stable_baselines3.common.envs`` folder
- Refactored ``HER`` which is now the ``HerReplayBuffer`` class that can be passed to any off-policy algorithm
- Handle timeout termination properly for off-policy algorithms (when using ``TimeLimit``)
- Renamed ``_last_dones`` and ``dones`` to ``_last_episode_starts`` and ``episode_starts`` in ``RolloutBuffer``.
- Removed ``ObsDictWrapper`` as ``Dict`` observation spaces are now supported

```python
# ``env`` must be a goal-conditioned environment (e.g. a gym ``GoalEnv``)
her_kwargs = dict(n_sampled_goal=2, goal_selection_strategy="future", online_sampling=True)

# SB3 < 1.1.0
model = HER("MlpPolicy", env, model_class=SAC, **her_kwargs)

# SB3 >= 1.1.0
model = SAC("MultiInputPolicy", env, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=her_kwargs)
```


- Updated the KL Divergence estimator in the PPO algorithm to be positive definite and have lower variance (09tangriro)
- Updated the KL Divergence check in the PPO algorithm to be before the gradient update step rather than after end of epoch (09tangriro)
- Removed parameter ``channels_last`` from ``is_image_space`` as it can be inferred.
- The logger object is now an attribute ``model.logger`` that can be set by the user using ``model.set_logger()`` (see the example after this list)
- Changed the signature of ``logger.configure`` and ``utils.configure_logger``, they now return a ``Logger`` object
- Removed ``Logger.CURRENT`` and ``Logger.DEFAULT``
- Moved ``warn(), debug(), log(), info(), dump()`` methods to the ``Logger`` class
- ``.learn()`` now throws an import error when the user tries to log to tensorboard but the package is not installed
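
A minimal sketch of the new logger API; the log folder and output formats below are illustrative:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.logger import configure

# configure() now returns a Logger object (there is no global Logger.CURRENT anymore)
new_logger = configure("/tmp/sb3_log/", ["stdout", "csv"])

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.set_logger(new_logger)
model.learn(total_timesteps=10_000)
```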

New Features:

- Added support for single-level ``Dict`` observation space (JadenTravnik); see the example after this list
- Added ``DictRolloutBuffer`` and ``DictReplayBuffer`` to support dictionary observations (JadenTravnik)
- Added ``StackedObservations`` and ``StackedDictObservations`` that are used within ``VecFrameStack``
- Added simple 4x4 room Dict test environments
- ``HerReplayBuffer`` now supports ``VecNormalize`` when ``online_sampling=False``
- Added [VecMonitor](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_monitor.py) and [VecExtractDictObs](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_extract_dict_obs.py) wrappers to handle gym3-style vectorized environments (vwxyzjn)
- Ignored the terminal observation if it is not provided by the environment
(e.g. by gym3-style vectorized environments) (vwxyzjn)
- Added ``policy_base`` as input to the ``OnPolicyAlgorithm`` for more flexibility (09tangriro)
- Added support for image observation when using ``HER``
- Added ``replay_buffer_class`` and ``replay_buffer_kwargs`` arguments to off-policy algorithms
- Added ``kl_divergence`` helper for ``Distribution`` classes (09tangriro)
- Added support for vector environments with ``num_envs > 1`` (benblack769)
- Added ``wrapper_kwargs`` argument to ``make_vec_env`` (amy12xx)
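
A minimal sketch of training on a ``Dict`` observation space, using the bundled ``SimpleMultiObsEnv`` test environment; the timestep budget is illustrative:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.envs import SimpleMultiObsEnv

# "MultiInputPolicy" is the policy that handles Dict observation spaces
env = SimpleMultiObsEnv(random_start=False)
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=1_000)
```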

Bug Fixes:

- Fixed potential issue when calling off-policy algorithms with default arguments multiple times (the size of the replay buffer would be the same)
- Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (thanks Atlis)
- Fixed saving of ``A2C`` and ``PPO`` policy when using gSDE (thanks liusida)
- Fixed a bug where no output would be shown even if ``verbose>=1`` after passing ``verbose=0`` once
- Fixed observation buffers dtype in DictReplayBuffer (c-rizz)
- Fixed EvalCallback tensorboard logs being logged with the incorrect timestep. They are now written with the timestep at which they were recorded. (skandermoalla)


Others:

- Added ``flake8-bugbear`` to tests dependencies to find likely bugs
- Updated ``env_checker`` to reflect support of dict observation spaces
- Added Code of Conduct
- Added tests for GAE and lambda return computation
- Updated distribution entropy test (thanks 09tangriro)
- Added sanity check ``batch_size > 1`` in PPO to avoid NaN in advantage normalization

Documentation:

- Added gym pybullet drones project (JacopoPan)
- Added link to SuperSuit in projects (justinkterry)
- Fixed DQN example (thanks ltbd78)
- Clarified channel-first/channel-last recommendation
- Update sphinx environment installation instructions (tom-doerr)
- Clarified pip installation in Zsh (tom-doerr)
- Clarified return computation for on-policy algorithms (TD(lambda) estimate was used)
- Added example for using ``ProcgenEnv``
- Added note about advanced custom policy example for off-policy algorithms
- Fixed DQN unicode checkmarks
- Updated migration guide (juancroldan)
- Pinned ``docutils==0.16`` to avoid issue with rtd theme
- Clarified callback ``save_freq`` definition
- Added doc on how to pass a custom logger
- Remove recurrent policies from ``A2C`` docs (bstee615)

1.0

**First Major Version**

Blog post: https://araffin.github.io/post/sb3/

100+ pre-trained models in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo

Breaking Changes:

- Removed `stable_baselines3.common.cmd_util` (already deprecated), please use `env_util` instead

**Warning:**

A refactoring of the `HER` algorithm is planned together with support for dictionary observations (see [PR 243](https://github.com/DLR-RM/stable-baselines3/pull/243) and
[PR 351](https://github.com/DLR-RM/stable-baselines3/pull/351)).
This will be a backward incompatible change (models trained with previous versions of `HER` won't work with the new version).

New Features:

- Added support for `custom_objects` when loading models
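
A short sketch of loading with `custom_objects`; `"ppo_cartpole"` is a hypothetical save path, and the overridden values are illustrative:

```python
from stable_baselines3 import PPO

# Entries in custom_objects replace the corresponding saved attributes instead of
# deserializing them, which helps when loading a model saved with a different
# SB3 / Python version.
custom_objects = {"learning_rate": 0.0, "lr_schedule": lambda _: 0.0, "clip_range": lambda _: 0.0}
model = PPO.load("ppo_cartpole", custom_objects=custom_objects)
```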

Bug Fixes:

- Fixed a bug with `DQN` predict method when using `deterministic=False` with image space

Documentation:

- Fixed examples
- Added new project using SB3: rl_reach (PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment
- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with <https://excalidraw.com/>)
- Updated the custom policy section

1.0rc1

Second release candidate
