UtilsRL

Latest version: v0.6.10

0.6.0

New Features
+ Transformers for RL. We implement Transformer, GPT-2, and an experimental version of RWKV in `UtilsRL.net.attention`, bringing these highly expressive sequence-modeling techniques into reinforcement learning (a rough illustrative sketch follows this list).
+ DMControl env wrappers (#39)
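
As a rough illustration of how such a sequence model plugs into an RL pipeline, the sketch below encodes a trajectory of state-action features with a plain PyTorch `TransformerEncoder` under a causal mask. The class, dimensions, and masking choice are illustrative assumptions, not the `UtilsRL.net.attention` API.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Illustrative only: encode (batch, timestep) state-action features
    with a causally-masked Transformer encoder."""
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (B, T, obs_dim), act: (B, T, act_dim)
        x = self.embed(torch.cat([obs, act], dim=-1))             # (B, T, embed_dim)
        T = x.size(1)
        # additive causal mask: -inf above the diagonal blocks attention to the future
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.encoder(x, mask=causal_mask)                  # (B, T, embed_dim)

# usage: batch of 8 trajectories, 20 steps, 17-dim observations, 6-dim actions
enc = TrajectoryEncoder(obs_dim=17, act_dim=6)
features = enc(torch.randn(8, 20, 17), torch.randn(8, 20, 6))     # -> (8, 20, 128)
```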

What's Changed
+ Argument parsing. You can now use `--config /path/to/config` to specify the default config file from the CLI (see the sketch after this list).
+ Refactored loggers. We refactored the logger module, unified the interfaces, and made them more convenient to use out of the box.
+ Removal of redundant features. We removed some features, such as the Monitor module and the argument-parsing callback functions.
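
For concreteness, here is a minimal stdlib-only sketch of this kind of `--config` override; the JSON config format, file names, and `load_config` helper are assumptions for illustration, not UtilsRL's actual parsing code.

```python
# illustrative_cli.py -- mimics a `--config /path/to/config` style override
import argparse
import json

def load_config(default_path: str = "configs/default.json") -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default=default_path,
                        help="path to the default config file")
    # parse_known_args leaves room for extra per-run overrides on the CLI
    args, _ = parser.parse_known_args()
    with open(args.config) as f:
        return json.load(f)

if __name__ == "__main__":
    config = load_config()
    print(config)
```

Run, for example, `python illustrative_cli.py --config ./configs/ppo.json`.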

Bug Fixes
+ PER batch update (#31), by LyndonKong
+ (Urgent) Broken gradient computation for `SquashedGaussianActor` (#28), by typoverflow

0.5.0

Compared to v0.4.0, UtilsRL v0.5.0 provides easier-to-use and more powerful tools:

New Features
1. `UtilsRL.net`: Besides the standard MLP/CNN/RNN modules, UtilsRL also provides:
+ `EnsembleLinear` & `EnsembleMLP` for **network ensembles** and **efficient inference** (an illustrative sketch follows this list)
+ `NoisyLinear` for **efficient exploration**
+ `Attention` for self-attention network design
2. `UtilsRL.rl.buffer`: We **refactored the buffers** and provide a **highly efficient (~5x faster) Prioritized Experience Replay (PER)** implemented in C++ with pybind11.
3. `UtilsRL.rl.actor` & `UtilsRL.rl.critic`: Users can now **customize the output layers** of actors and critics. This is extremely useful for various modifications to network structures, for example, ensembling, adding layer normalization, or adding dropout.
4. `UtilsRL.logger`: We now support file, TensorBoard, and **WandB loggers**, and users can use `CompositeLogger` to freely combine different loggers.
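
To show the idea behind an ensemble linear layer (evaluating several independent linear heads with one batched matmul so ensemble inference stays cheap), here is a self-contained sketch; it borrows the `EnsembleLinear` name only for readability and is not UtilsRL's implementation.

```python
import torch
import torch.nn as nn

class EnsembleLinearSketch(nn.Module):
    """Illustrative only: E independent linear layers evaluated in one batched matmul."""
    def __init__(self, in_features: int, out_features: int, ensemble_size: int):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(ensemble_size, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, 1, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (E, B, in_features) -> (E, B, out_features); one matmul per ensemble member
        return torch.baddbmm(self.bias, x, self.weight)

# usage: 5 Q-network heads, batch of 32, 17-dim input, scalar output
layer = EnsembleLinearSketch(17, 1, ensemble_size=5)
q_values = layer(torch.randn(5, 32, 17))   # -> (5, 32, 1)
```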

New Examples
We added two examples to illustrate the pipeline of integrating UtilsRL (see `examples/`):
+ PPO algorithm, MuJoCo (continuous observation space, continuous control)
+ Rainbow algorithm, Atari (image input, discrete control)

Old bugs (already fixed)
* Fix: return proper value for `random_batch` method of `ReplayPool` by mansicer in https://github.com/typoverflow/UtilsRL/pull/17
* Fix argparse logic for float precision; fix interface error for ensemble linear by typoverflow in https://github.com/typoverflow/UtilsRL/pull/20

New friends
* mansicer made their first contribution in https://github.com/typoverflow/UtilsRL/pull/17



**Full Changelog**: https://github.com/typoverflow/UtilsRL/compare/v0.4.1...v0.5.0

0.4.0

What's New
+ Added RL network modules, including common policy network output heads, replay buffers, normalizers, and other components.
+ Added a TensorBoard event file parsing function (a generic sketch follows this list); the plotting module still needs polishing.
+ Added a snapshot feature.
+ Added a PPO test example on Gym-MuJoCo.
+ Updated the README.
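
As a point of reference, scalar curves can be pulled out of an event file with TensorBoard's own `EventAccumulator`; this generic sketch is not the parsing function shipped in UtilsRL, and the log directory and tag names are placeholders.

```python
# Generic TensorBoard event-file parsing (not UtilsRL's own parser).
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def read_scalars(logdir, tag):
    acc = EventAccumulator(logdir)
    acc.Reload()                                   # load event files from disk
    return [(e.step, e.value) for e in acc.Scalars(tag)]

# e.g. read_scalars("./tb_logs/ppo_run", "train/return")
```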

Bug Fixes
+ NameSpace class member variables were added in the wrong place (#1)
+ Several minor code adjustments.

Future Plans
The project is under continuous improvement; feature suggestions and bug reports are welcome!
+ Planned feature improvements are tracked in the Issues section.
+ Polish the framework documentation and completely rewrite the README.

**Full Changelog**: https://github.com/typoverflow/UtilsRL/compare/v0.3.13...v0.4.0

0.2.0

Features
+ Argument parsing utils
+ Training process monitor
+ Loggers
+ Device and seed management
