Compared to v0.4.0, UtilsRL v0.5.0 provides easier-to-use and more powerful tools:
New Features
1. ``: Besides the normal MLP/CNN/RNN modules, UtilsRL also provides:
+ `EnsembleLinear` & `EnsembleMLP` for **network ensembling** and **efficient inference**
+ `NoisyLinear` for **efficient exploration**
+ `Attention` for self-attention network design
2. `UtilsRL.rl.buffer`: We **refactored the buffers** and added a **highly efficient (5x) Prioritized Experience Replay (PER)** implemented in C++ with pybind11.
3. ``&`UtilsRL.rl.critic`: Users can now **customize the output layers** of actors and critics. This makes it easy to modify network structures, for example by using ensembles or adding layer normalization or dropout.
4. `UtilsRL.logger`: We now support file, TensorBoard, and **WandB loggers**, and users can freely combine them with `CompositeLogger`.
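
For readers unfamiliar with PER (item 2), the sketch below shows the sum-tree structure that prioritized replay is commonly built on. It is a pure-Python illustration only, not UtilsRL's C++ implementation or its API: the names `SumTree`, `update`, `find`, and `sample_indices` are hypothetical and chosen for this example.

```python
import random


class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold child sums.

    Hypothetical helper for illustration; not part of UtilsRL.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Node 0 is the root; leaves occupy indices [capacity - 1, 2 * capacity - 2].
        self.nodes = [0.0] * (2 * capacity - 1)

    @property
    def total(self) -> float:
        """Sum of all priorities (stored at the root)."""
        return self.nodes[0]

    def update(self, index: int, priority: float) -> None:
        """Set the priority of slot `index` and propagate the change upward."""
        pos = index + self.capacity - 1
        delta = priority - self.nodes[pos]
        self.nodes[pos] = priority
        while pos > 0:
            pos = (pos - 1) // 2
            self.nodes[pos] += delta

    def find(self, value: float) -> int:
        """Return the slot whose cumulative-priority interval contains `value`."""
        pos = 0
        while pos < self.capacity - 1:  # descend until a leaf is reached
            left = 2 * pos + 1
            if value < self.nodes[left]:
                pos = left
            else:
                value -= self.nodes[left]
                pos = left + 1
        return pos - (self.capacity - 1)


def sample_indices(tree: SumTree, batch_size: int, rng: random.Random) -> list:
    """Stratified sampling: draw one value from each equal slice of [0, total)."""
    segment = tree.total / batch_size
    return [tree.find((i + rng.random()) * segment) for i in range(batch_size)]


# Usage: four transitions with priorities 1, 2, 3, 4.
tree = SumTree(4)
for slot, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(slot, priority)
batch = sample_indices(tree, batch_size=4, rng=random.Random(0))
```

Both `update` and `find` walk a single root-to-leaf path, so each costs O(log N) instead of the O(N) scan a flat priority list would need. This is the kind of hot loop that benefits from being moved into compiled code.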
New Examples
We added two examples that illustrate how to integrate UtilsRL into a training pipeline (see `examples/`):
+ PPO algorithm, MuJoCo (continuous observation space, continuous control)
+ Rainbow algorithm, Atari (image input, discrete control)
Old bugs (already fixed)
* Fix: return proper value for `random_batch` method of `ReplayPool` by mansicer in
* Fix argparse logic for float precision; fix interface error for ensemble linear by typoverflow in
New friends
* mansicer made their first contribution in
**Full Changelog**: