**Warning: breaking change when using custom policies**
- documentation update: fixed the result plotter example and improved the docs
- fixed logger issues when stdout lacks a `read` function
- fixed a bug in `common.dataset.Dataset` where shuffling was not disabled properly (this affected only PPO1 with recurrent policies)
- fixed the output layer name of the DDPG Q-function, used in Pop-Art normalization and L2 regularization of the critic
- added support for multi-env recording to `generate_expert_traj` (XMaster96)
- added support for LSTM model recording to `generate_expert_traj` (XMaster96)
- `GAIL`: removed the mandatory matplotlib dependency and refactored `GAIL` as a subclass of `TRPO` (kantneel and AdamGleave)
- added `get_attr()`, `env_method()` and `set_attr()` methods for all `VecEnv` classes.
  These methods accept an `indices` keyword argument to select a subset of environments.
  `set_attr` now returns `None` rather than a list of `None`. (kantneel)
- `GAIL`: `gail.dataset.ExpertDataset` supports loading expert data from memory rather than from file, and
  `gail.dataset.record_expert` supports returning the recorded trajectories in memory rather than saving them to file.
- added support in `VecEnvWrapper` for accessing attributes of arbitrarily deeply nested
  instances of `VecEnvWrapper` and `VecEnv`. This is allowed as long as the attribute belongs
  to exactly one of the nested instances, i.e. it must be unambiguous. (kantneel)
- fixed bug where result plotter would crash on very short runs (Pastafarianist)
- added an option to not trim the result plotter output by number of timesteps (Pastafarianist)
- clarified the public interface of `BasePolicy` and `ActorCriticPolicy`. **Breaking change** when using custom policies: `masks_ph` has been renamed to `dones_ph`.
- added support for custom stateful policies.
- fixed episode length recording in `trpo_mpi.utils.traj_segment_generator` (GerardMaggiolino)
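The `indices` semantics of the new `get_attr()`, `set_attr()` and `env_method()` helpers can be illustrated with a minimal sketch. `MiniVecEnv` and `ToyEnv` below are hypothetical stand-ins, not the library's classes:

```python
class MiniVecEnv:
    """Hypothetical stand-in for a vectorized env holding several sub-envs;
    it only illustrates the `indices` selection semantics."""

    def __init__(self, envs):
        self.envs = envs

    def _get_indices(self, indices):
        # None selects every sub-env, an int selects a single one,
        # and an iterable selects an explicit subset.
        if indices is None:
            return list(range(len(self.envs)))
        if isinstance(indices, int):
            return [indices]
        return list(indices)

    def get_attr(self, attr_name, indices=None):
        # Collect the attribute value from each selected sub-env.
        return [getattr(self.envs[i], attr_name) for i in self._get_indices(indices)]

    def set_attr(self, attr_name, value, indices=None):
        # Set the attribute on each selected sub-env; returns None,
        # not a list of None values.
        for i in self._get_indices(indices):
            setattr(self.envs[i], attr_name, value)

    def env_method(self, method_name, *args, indices=None, **kwargs):
        # Call the named method on each selected sub-env and gather results.
        return [getattr(self.envs[i], method_name)(*args, **kwargs)
                for i in self._get_indices(indices)]


class ToyEnv:
    def __init__(self):
        self.difficulty = 1

    def double(self):
        return self.difficulty * 2


venv = MiniVecEnv([ToyEnv(), ToyEnv(), ToyEnv()])
venv.set_attr("difficulty", 5, indices=[0, 2])
print(venv.get_attr("difficulty"))           # [5, 1, 5]
print(venv.env_method("double", indices=1))  # [2]
```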
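The "unambiguous nested attribute" rule for `VecEnvWrapper` can also be sketched. `MiniWrapper` below assumes each layer stores the wrapped env in `self.venv`; it is not the library implementation, only an illustration of the lookup rule:

```python
class MiniWrapper:
    """Sketch of nested-attribute lookup across a wrapper chain: an attribute
    is resolved only if exactly one nested layer defines it."""

    def __init__(self, venv):
        self.venv = venv

    def _layers_defining(self, name):
        # Walk the chain outermost-to-innermost, collecting every layer
        # whose own __dict__ defines the attribute.
        found, layer = [], self
        while layer is not None:
            if name in vars(layer):
                found.append(layer)
            layer = vars(layer).get("venv")
        return found

    def __getattr__(self, name):
        # Reached only when normal lookup on this layer fails.
        owners = self._layers_defining(name)
        if len(owners) == 1:
            return vars(owners[0])[name]
        if not owners:
            raise AttributeError("no nested layer defines '%s'" % name)
        raise AttributeError("'%s' is ambiguous: %d nested layers define it"
                             % (name, len(owners)))


class InnerEnv:
    def __init__(self):
        self.num_envs = 4


wrapped = MiniWrapper(MiniWrapper(InnerEnv()))
print(wrapped.num_envs)  # 4, found unambiguously on the innermost layer
```

If two layers in the chain define the same attribute, the lookup raises instead of silently picking one, matching the "must be unambiguous" requirement.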
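The two `ExpertDataset` loading modes (in-memory dict vs. file) can be sketched as follows. This is a toy class with illustrative parameter names and a pickle file format, not the real `gail.dataset.ExpertDataset` API:

```python
import os
import pickle
import tempfile


class MiniExpertDataset:
    """Sketch of dual loading modes: pass trajectories in memory,
    or point at a file on disk."""

    def __init__(self, expert_path=None, traj_data=None):
        if (expert_path is None) == (traj_data is None):
            raise ValueError("provide exactly one of expert_path or traj_data")
        if traj_data is None:
            # File mode: load the trajectory dict from disk.
            with open(expert_path, "rb") as f:
                traj_data = pickle.load(f)
        # Memory mode: use the dict as-is, no file round-trip needed.
        self.observations = traj_data["obs"]
        self.actions = traj_data["actions"]


data = {"obs": [[0.0], [1.0]], "actions": [0, 1]}

# In-memory loading: pass the dict directly.
mem_dataset = MiniExpertDataset(traj_data=data)

# File loading: same result after a save/load round-trip.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    pickle.dump(data, f)
file_dataset = MiniExpertDataset(expert_path=path)
os.remove(path)

print(mem_dataset.actions == file_dataset.actions)  # True
```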