Sheeprl

Latest version: v0.5.7


0.3.2

In this release we have fixed the time logging of every algorithm. In particular:

* The `Time/sps_env_interaction` measures the steps-per-second of the agent's environment interaction, namely the forward pass to obtain the new action given the observation plus the execution of the environment's `step` method. This value is local to rank-0 and takes into account the `action_repeat` set through Hydra/CLI (a small sketch follows this list)
* The `Time/sps_train` measures the steps-per-second of the train function, which runs in a distributed manner, considering all the ranks that call it
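
As an illustration, here is a minimal sketch of how such a steps-per-second counter can be computed; the timer class and counter names are invented for the example and are not the SheepRL implementation:

```python
import time

# Minimal sketch of a steps-per-second (SPS) counter; names are illustrative
# and not taken from the SheepRL codebase.
class SpsTimer:
    def __init__(self) -> None:
        self.start = time.perf_counter()
        self.steps = 0

    def update(self, num_steps: int) -> None:
        self.steps += num_steps

    def compute(self) -> float:
        elapsed = time.perf_counter() - self.start
        return self.steps / elapsed if elapsed > 0 else 0.0

# Environment-interaction SPS on rank-0: each policy call advances the
# environment by `action_repeat` frames, so that factor is counted in.
action_repeat = 4
env_timer = SpsTimer()
env_timer.update(num_steps=1 * action_repeat)  # one policy call
print(env_timer.compute())
```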

0.3.1

In this release we have refactored some names inside every algorithm, in particular:

* we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into account the action repeat; i.e., it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m`
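
For instance, a quick sanity check of the relation above (the numbers are illustrative):

```python
# Illustrative numbers only: with 2 ranks and 4 environments per rank,
# every environment step corresponds to 2 * 4 = 8 policy steps.
n_ranks = 2
envs_per_rank = 4
policy_steps_per_env_step = n_ranks * envs_per_rank
assert policy_steps_per_env_step == 8
```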

We have also refactored the hydra configs, in particular:

* we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the hyperparameters shared by those objects across every algorithm
* the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since the `policy_step` variable is rarely exactly divisible by `metric.log_every`, logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, where `last_log` is set to `policy_step` every time something is logged (see the sketch after this list)
* the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works like `metric.log_every`, while `resume_from` specifies the experiment folder (which must contain the `.hydra` folder) to resume the training from. This is currently supported only by the Dreamer algorithms
* `num_envs` and `clip_reward` have been moved to the `env` config
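
A minimal sketch of the logging condition described above; the variable names are illustrative and not necessarily those used inside SheepRL:

```python
# Sketch of the log-every check: log as soon as at least `log_every`
# policy steps have passed since the last log. Names are illustrative.
log_every = 5000
last_log = 0

def maybe_log(policy_step: int, metrics: dict) -> None:
    global last_log
    if policy_step - last_log >= log_every:
        print(f"[policy_step={policy_step}] {metrics}")
        last_log = policy_step  # updated every time something is logged
```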

0.3.0

This new release introduces [hydra](https://hydra.cc) as the default configuration manager. In particular, it fixes #74 and, automatically, #75, since now the `cnn_keys` and `mlp_keys` can be specified separately for both the encoder and decoder.
The changes are mainly the following:

* Dreamer-V3 initialization directly follows Hafner's implementation (adapted from https://github.com/NM512/dreamerv3-torch/blob/main/tools.py)
* all the `args.py` files and the `HFArgumentParser` have been removed. Configs are now specified under the `sheeprl/configs` folder and hydra is the default configuration manager
* Every environment wrapper is instantiated directly through `hydra.utils.instantiate` inside the `make_env` or `make_dict_env` method: in this way one can easily customize the environment by passing arbitrary parameters to the wrapper. Every wrapper **must** take the `id` parameter as input, which **must** be specified in the relative config (a minimal sketch follows this list)
* Every optimizer is instantiated directly through `hydra.utils.instantiate` and can be modified from the CLI when running an experiment
* The `howto/configs.md` document has been added, which explains how the configs are organized inside the repo
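
As an illustration, here is a minimal, hypothetical config and the corresponding `hydra.utils.instantiate` calls; the actual layout of the files under `sheeprl/configs` may differ:

```python
import hydra
from omegaconf import OmegaConf

# Hypothetical env/optimizer configs; the real files under sheeprl/configs
# may be organized differently.
cfg = OmegaConf.create(
    {
        "env": {"_target_": "gymnasium.make", "id": "CartPole-v1"},
        "optimizer": {"_target_": "torch.optim.Adam", "lr": 3e-4, "_partial_": True},
    }
)

env = hydra.utils.instantiate(cfg.env)               # the wrapper receives its `id` from the config
make_optim = hydra.utils.instantiate(cfg.optimizer)  # `_partial_` defers passing the parameters
# optimizer = make_optim(model.parameters())         # completed once the model exists
```

With standard Hydra override syntax, such values can then be changed from the CLI, e.g. `optimizer.lr=1e-4` (the exact key paths depend on the real config tree).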

0.2.2

* Fixed the Dreamer-V3 test function: it now uses its own test function instead of the Dreamer-V2 one
* Added ruff to pre-commit and added pre-commit.ci

0.2.1

- Added Dreamer-V3 algorithm from https://arxiv.org/abs/2301.04104
- Added the `RestartOnException` wrapper, which recreates and restarts the environment whenever something bad happens during `step` or `reset`. This has been added only to the Dreamer-V3 algorithm (a sketch of the idea follows this list)
- Renamed classes and functions (in particular the `Player` classes for both Dreamer-V1/V2)
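
As an illustration, here is a minimal sketch of the restart-on-exception idea, assuming a gymnasium-style environment created by a factory function; it is not the SheepRL implementation:

```python
import gymnasium as gym

# Sketch of a restart-on-exception wrapper: recreate the underlying
# environment whenever `step` or `reset` raises. Not the SheepRL code.
class RestartOnExceptionSketch(gym.Wrapper):
    def __init__(self, env_fn):
        self._env_fn = env_fn  # factory that builds a fresh environment
        super().__init__(env_fn())

    def reset(self, **kwargs):
        try:
            return self.env.reset(**kwargs)
        except Exception:
            self.env = self._env_fn()  # recreate and retry
            return self.env.reset(**kwargs)

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            self.env = self._env_fn()
            obs, info = self.env.reset()
            # report a truncated episode so the caller can start a new one
            return obs, 0.0, False, True, info

# Usage sketch:
# env = RestartOnExceptionSketch(lambda: gym.make("CartPole-v1"))
```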

0.2

- Added DiambraWrapper
- Added multi-encoder/decoder to all the algorithms except DroQ, SAC and PPO Recurrent
- Added multi-discrete support to PPO, DreamerV1 and P2E-DV1
- Modified the `make_env` function so agents can be trained on environments that return both pixel-like and vector-like observations (a small sketch follows this list)
- Modified the ReplayBuffer class to handle multiple observations
- Updated howtos
- Fixed #66
- Logger creation is moved to `sheeprl.utils.logger`
- Env creation is moved to `sheeprl.utils.env`
- PPO algo is now a single-folder algorithm (removed the `ppo_pixel` and `ppo_continuous` folders)
- `sac_pixel` has been renamed to `sac_ae`
- Added support to `gymnasium==0.29.0`, `mujoco>=2.3.3` and `dm_control>=1.0.12`
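
As an illustration of what handling both pixel-like and vector-like observations means in practice, here is a tiny sketch with made-up keys and shapes (not the actual `ReplayBuffer` API):

```python
import numpy as np

# Illustrative dict observation mixing pixel-like and vector-like entries;
# key names and shapes are invented for the example.
step_obs = {
    "rgb": np.zeros((64, 64, 3), dtype=np.uint8),   # pixel-like -> CNN encoder
    "state": np.zeros((17,), dtype=np.float32),     # vector-like -> MLP encoder
}

# A buffer that handles multiple observations keeps one array per key.
buffer = {k: [] for k in step_obs}
for _ in range(8):
    for k, v in step_obs.items():
        buffer[k].append(v)
stacked = {k: np.stack(v) for k, v in buffer.items()}
print({k: a.shape for k, a in stacked.items()})  # {'rgb': (8, 64, 64, 3), 'state': (8, 17)}
```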
