API Change
1. add new `BaseEnv` definition:
- remove `info` method
- add `random_action` method
- add `observation_space`, `action_space`, `reward_space` properties
- [Env English doc](https://di-engine-docs.readthedocs.io/en/latest/best_practice/ding_env.html) | [环境中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/best_practice/ding_env_zh.html)
2. change the return type of the `eval` method of the `InteractionSerialEvaluator` class from `Tuple[bool, float]` to `Tuple[bool, dict]`.
3. switch the default logger to the rich logger; to disable it, set the environment variable, e.g. `export ENABLE_RICH_LOGGING=False`.
4. add `train_iter` and `env_step` arguments to the ding CLI.
- you can use them like `ding -m serial -c pendulum_sac_config.py -s 0 --train-iter 1e3`
5. remove the default `n_sample`/`n_episode` values from the policy default config.
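
The shape of the new env interface from item 1 (no `info` method, a `random_action` method, and three space properties) can be sketched as follows. This is an illustrative stand-in only, not DI-engine's actual `BaseEnv`: the names `SketchEnv`, `CoinGuessEnv`, `Discrete` and `Interval` are hypothetical, and the `step` return value is simplified to a plain tuple. The linked env docs describe the real interface.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Discrete:
    """Hypothetical stand-in for a discrete space with n choices."""
    n: int

    def sample(self, rng: random.Random) -> int:
        return rng.randrange(self.n)

    def contains(self, x: Any) -> bool:
        return isinstance(x, int) and 0 <= x < self.n


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a bounded scalar space."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high


class SketchEnv(ABC):
    """Illustrative interface only (assumption, not the real BaseEnv)."""

    @property
    @abstractmethod
    def observation_space(self) -> Any: ...

    @property
    @abstractmethod
    def action_space(self) -> Any: ...

    @property
    @abstractmethod
    def reward_space(self) -> Any: ...

    @abstractmethod
    def random_action(self) -> Any: ...


class CoinGuessEnv(SketchEnv):
    """One-step toy env: guess a coin flip, reward 1.0 when correct."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    @property
    def observation_space(self) -> Discrete:
        return Discrete(1)  # a single dummy observation

    @property
    def action_space(self) -> Discrete:
        return Discrete(2)  # guess heads (0) or tails (1)

    @property
    def reward_space(self) -> Interval:
        return Interval(0.0, 1.0)

    def random_action(self) -> int:
        # Sample from the declared action space rather than hard-coding a range.
        return self.action_space.sample(self._rng)

    def step(self, action: int) -> Tuple[int, float, bool]:
        assert self.action_space.contains(action), "action outside action_space"
        coin = self._rng.randrange(2)
        reward = 1.0 if action == coin else 0.0
        return 0, reward, True  # (observation, reward, done)
```

Code that previously read shapes or value ranges from `info()` would, under this sketch, query the three space properties instead, and random exploration would call `random_action()`.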
Env (dizoo)
1. add bitflip HER DQN benchmark (192) (193) (197)
2. add slime volley league training demo (229)
Algorithm
1. Gated Transformer-XL (GTrXL) algorithm (136)
2. TD3 + VAE(HyAR) latent action algorithm (152)
3. stochastic dueling network (234)
4. use log prob instead of prob in ACER (186)
Feature
1. support envpool env manager (228)
2. add league main and other improvements in new framework (177) (214)
3. add pace controller middleware in new framework (198)
4. add auto recover option in new framework (242)
5. add k8s parser in new framework (243)
6. support async event handler and logger (213)
7. add grad norm calculator (205)
8. add gym vector env manager (147)
9. add train_iter and env_step in serial pipeline (212)
10. add rich logger handler (219) (223) (232)
11. add naive lr_scheduler demo
Refactor
1. new BaseEnv and DingEnvWrapper (171) (231) (240) [Env English doc](https://di-engine-docs.readthedocs.io/en/latest/best_practice/ding_env.html) | [环境中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/best_practice/ding_env_zh.html)
Polish
Improve configurations in dizoo and add more algorithm benchmarks [doc example](https://di-engine-docs.readthedocs.io/en/latest/hands_on/dqn.html) | [文档示例](https://di-engine-docs.readthedocs.io/zh_CN/latest/hands_on/dqn_zh.html)
1. MAPPO and MASAC smac config (209) (239)
2. QMIX smac config (175)
3. R2D2 atari config (181)
4. A2C atari config (189)
5. GAIL box2d and mujoco config (188)
6. ACER atari config (180)
7. SQIL atari config (230)
8. TREX atari/mujoco config
9. IMPALA atari config
10. MBPO/D4PG mujoco config
Fix
1. make random_collect compatible with episode collector (190)
2. remove default n_sample/n_episode value in policy config (185)
3. PDQN model bug on gpu device (220)
4. TREX algorithm CLI bug (182)
5. DQfD JE computation bug and move to AdamW optimizer (191)
6. pytest problem for parallel middleware (211)
7. mujoco numpy compatibility bug
8. markupsafe 2.1.0 bug
9. framework parallel module network emit bug
10. mpire bug and disable algotest in py3.8
11. lunarlander env import and env_id bug
12. icm unittest repeat name bug
13. buffer thruput close bug
Test
1. resnet unittest (199)
2. SAC/SQN unittest (207)
3. CQL/R2D3/GAIL unittest (201)
4. NGU td unittest (210)
5. model wrapper unittest (215)
6. MAQAC model unittest (226)
Style
1. add doc docker (221) (latex support)
**Contributors: PaParaZz1 sailxjx puyuan1996 Will-Nie Weiyuhong-1998 davide97l zjowowen LuciusMos kxzxvbk Hcnaeg jayyoung0802 simonat2011 jiaruonan**