- env: add pistonball MARL env and its unittest/example (833)
- env: update trading env (831)
- env: update ppo config for better discrete action space performance (809)
- env: remove unused config fields in MuJoCo PPO
- algo: add AWR algorithm (828)
- algo: add encoder in MAVAC (823)
- algo: add HPT model architecture (841)
- algo: fix multiple model wrappers reset bug (846)
- algo: add hybrid action space support to ActionNoiseWrapper (829)
- algo: fix mappo adv compute bug (812)
- feature: add resume_training option to allow envstep and train_iter to resume seamlessly (835)
- feature: polish old/new pipeline DistributedDataParallel (DDP) implementation (842)
- feature: adapt DingEnvWrapper to gymnasium (817)
- fix: priority buffer delete bug (844)
- fix: middleware collector env reset bug (845)
- fix: many unittest bugs
- style: downgrade pyecharts log level to warning and polish installation doc (838)
- style: polish necessary requirements
- style: polish api doc details
- style: polish DI-engine citation authors
- style: upgrade CI macos version from 12 to 13
2024.06.27(v0.5.2)
- env: add taxi env (799) (807)
- env: add ising model env (782)
- env: add new Frozen Lake env (781)
- env: optimize ppo continuous config in MuJoCo (801)
- env: fix masac smac config multi_agent=True bug (791)
- env: update/speed up pendulum ppo
- algo: fix gtrxl compatibility bug (796)
- algo: fix complex obs demo for ppo pipeline (786)
- algo: add naive PWIL demo
- algo: fix marl nstep td compatibility bug
- feature: add GPU utils (788)
- feature: add deprecated function decorator (778)
- style: relax flask requirement (811)
- style: add new badge (hellogithub) in readme (805)
- style: update discord link and badge in readme (795)
- style: fix typo in config.py (776)
- style: polish rl_utils api docs
- style: add constraint about numpy<2
- style: polish macos platform test version to 12
- style: polish ci python version
2024.02.04(v0.5.1)
- env: add MADDPG pettingzoo example (774)
- env: polish NGU Atari configs (767)
- env: fix bug in cliffwalking env (759)
- env: add PettingZoo replay video demo
- env: change default max retry in env manager from 5 to 1
- algo: add QGPO diffusion-model related algorithm (757)
- algo: add HAPPO multi-agent algorithm (717)
- algo: add DreamerV3 + MiniGrid adaption (725)
- algo: fix hppo entropy_weight to avoid nan error in log_prob (761)
- algo: fix structured action bug (760)
- algo: polish Decision Transformer entry (754)
- algo: fix EDAC policy/model bug
- fix: env typos
- fix: pynng requirements bug
- fix: communication module unittest bug
- style: polish policy API doc (762) (764) (768)
- style: add agent API doc (758)
- style: polish torch_utils/utils API doc (745) (747) (752) (755) (763)
2023.11.06(v0.5.0)
- env: add tabmwp env (667)
- env: polish anytrading env issues (731)
- algo: add PromptPG algorithm (667)
- algo: add Plan Diffuser algorithm (700)
- algo: add new pipeline implementation of IMPALA algorithm (713)
- algo: add dropout layers to DQN-style algorithms (712)
- feature: add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (637) (730) (737)
- feature: add more unittest cases for model (728)
- feature: add collector logging in new pipeline (735)
- fix: logger middleware problems (715)
- fix: ppo parallel bug (709)
- fix: typo in optimizer_helper.py (726)
- fix: mlp dropout if condition bug
- fix: drex collecting data unittest bugs
- style: polish env manager/wrapper comments and API doc (742)
- style: polish model comments and API doc (722) (729) (734) (736) (741)
- style: polish policy comments and API doc (732)
- style: polish rl_utils comments and API doc (724)
- style: polish torch_utils comments and API doc (738)
- style: update README.md and Colab demo (733)
- style: update metaworld docker image
2023.08.23(v0.4.9)
- env: add cliffwalking env (677)
- env: add lunarlander ppo config and example
- algo: add BCQ offline RL algorithm (640)
- algo: add DreamerV3 model-based RL algorithm (652)
- algo: add tensor stream merge network tools (673)
- algo: add scatter connection model (680)
- algo: refactor Decision Transformer in new pipeline and support img input and discrete output (693)
- algo: add three variants of Bilinear classes and a FiLM class (703)
- feature: polish offpolicy RL multi-gpu DDP training (679)
- feature: add middleware for Ape-X distributed pipeline (696)
- feature: add example for evaluating trained DQN (706)
- fix: to_ndarray fails to assign dtype for scalars (708)
- fix: evaluator return episode_info compatibility bug
- fix: wrong config bug in cql example entry
- fix: enable_save_figure env interface
- fix: redundant env info bug in evaluator
- fix: to_item unittest bug
- style: polish and simplify requirements (672)
- style: add Hugging Face Model Zoo badge (674)
- style: add openxlab Model Zoo badge (675)
- style: fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (678)
- style: fix mujoco-py compatibility issue for cython<3 (711)
- style: fix type spelling error (704)
- style: fix pypi release actions ubuntu 18.04 bug
- style: update contact information (e.g. wechat)
- style: polish algorithm doc tables
2023.05.25(v0.4.8)
- env: fix gym hybrid reward dtype bug (664)
- env: fix atari env id noframeskip bug (655)
- env: fix typo in gym any_trading env (654)
- env: update td3bc d4rl config (659)
- env: polish bipedalwalker config
- algo: add EDAC offline RL algorithm (639)
- algo: add LN and GN norm_type support in ResBlock (660)
- algo: add normal value norm baseline for PPOF (658)
- algo: polish last layer init/norm in MLP (650)
- algo: polish TD3 monitor variable
- feature: add MAPPO/MASAC task example (661)
- feature: add PPO example for complex env observation (644)
- feature: add barrier middleware (570)
- fix: abnormal collector log and add record_random_collect option (662)
- fix: to_item compatibility bug (646)
- fix: trainer dtype transform compatibility bug
- fix: pettingzoo 1.23.0 compatibility bug
- fix: ensemble head unittest bug
- style: fix incompatible gym version bug in Dockerfile.env (653)
- style: add more algorithm docs
2023.04.11(v0.4.7)
- env: add dmc2gym env support and baseline (451)
- env: update pettingzoo to the latest version (597)
- env: polish icm/rnd+onppo config bugs and add app_door_to_key env (564)
- env: add lunarlander continuous TD3/SAC config
- env: polish lunarlander discrete C51 config
- algo: add Procedure Cloning (PC) imitation learning algorithm (514)
- algo: add Munchausen Reinforcement Learning (MDQN) algorithm (590)
- algo: add reward/value norm methods: popart & value rescale & symlog (605)
- algo: polish reward model config and training pipeline (624)
- algo: add PPOF reward space demo support (608)
- algo: add PPOF Atari demo support (589)
- algo: polish dqn default config and env examples (611)
- algo: polish comments and clean up code for SAC
- feature: add language model (e.g. GPT) training utils (625)
- feature: remove policy cfg sub fields requirements (620)
- feature: add full wandb support (579)
- fix: confusing shallow copy operation about next_obs (641)
- fix: unsqueeze action_args in PDQN when shape is 1 (599)
- fix: evaluator return_info tensor type bug (592)
- fix: deque buffer wrapper PER bug (586)
- fix: reward model save method compatibility bug
- fix: logger assertion and unittest bug
- fix: bfs test py3.9 compatibility bug
- fix: zergling collector unittest bug
- style: add DI-engine torch-rpc p2p communication docker (628)
- style: add D4RL docker (591)
- style: correct typo in task (617)
- style: correct typo in time_helper (602)
- style: polish readme and add treetensor example
- style: update contributing doc
2023.02.16(v0.4.6)
- env: add metadrive env and related ppo config (574)
- env: add acrobot env and related dqn config (577)
- env: add carracing in box2d (575)
- env: add new gym hybrid viz (563)
- env: update cartpole IL config (578)
- algo: add BDQ algorithm (558)
- algo: add procedure cloning model (573)
- feature: add simplified PPOF (PPO × Family) interface (567) (568) (581) (582)
- fix: to_device and prev_state bug when using ttorch (571)
- fix: py38 and numpy unittest bugs (565)
- fix: typo in contrastive_loss.py (572)
- fix: dizoo envs pkg installation bugs
- fix: multi_trainer middleware unittest bug
- style: add evogym docker (580)
- style: fix metaworld docker bug
- style: fix setuptools high version incompatibility bug
- style: extend treetensor lowest version
2022.12.13(v0.4.5)
- env: add beergame supply chain optimization env (512)
- env: add env gym_pybullet_drones (526)
- env: rename eval reward to episode return (536)
- algo: add policy gradient algo implementation (544)
- algo: add MADDPG algo implementation (550)
- algo: add IMPALA continuous algo implementation (551)
- algo: add MADQN algo implementation (540)
- feature: add new task IMPALA-type distributed training scheme (321)
- feature: add load and save method for replaybuffer (542)
- feature: add more DingEnvWrapper example (525)
- feature: add evaluator more info viz support (538)
- feature: add trackback log for subprocess env manager (534)
- fix: halfcheetah td3 config file (537)
- fix: mujoco action_clip args compatibility bug (535)
- fix: atari a2c config entry bug
- fix: drex unittest compatibility bug
- style: add Roadmap issue of DI-engine (548)
- style: update related project link and new env doc
2022.10.31(v0.4.4)
- env: add modified gym-hybrid including moving, sliding and hardmove (505) (519)
- env: add evogym support (495) (527)
- env: add save_replay_gif option (506)
- env: adapt minigrid_env and related config to latest MiniGrid v2.0.0 (500)
- algo: add pcgrad optimizer (489)
- algo: add some features in MLP and ResBlock (511)
- algo: delete mcts related modules (518)
- feature: add wandb middleware and demo (488) (523) (528)
- feature: add new properties in Context (499)
- feature: add single env policy wrapper for policy deployment
- feature: add custom model demo and doc
- fix: build logger args and unittests (522)
- fix: total_loss calculation in PDQN (504)
- fix: save gif function bug
- fix: level sample unittest bug
- style: update contact email address (503)
- style: polish env log and resblock name
- style: add details button in readme
2022.09.23(v0.4.3)
- env: add rule-based gomoku expert (465)
- algo: fix a2c policy batch size bug (481)
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (477)
- feature: add IGM support (486)
- feature: add tb logger middleware and demo
- fix: the type conversion in ding_env_wrapper (483)
- fix: di-orchestrator version bug in unittest (479)
- fix: data collection errors caused by shallow copies (475)
- fix: gym==0.26.0 seed args bug
- style: add readme tutorial link (environment & algorithm) (490) (493)
- style: adjust location of the default_model method in policy (453)
2022.09.08(v0.4.2)
- env: add rocket env (449)
- env: update pettingzoo env and improve related performance (457)
- env: add mario env demo (443)
- env: add MAPPO multi-agent config (464)
- env: add mountain car (discrete action) environment (452)
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (401)
- algo: add BCO (Behaviour Cloning from Observation) algorithm (270)
- algo: add continuous PPOPG algorithm (414)
- algo: add PER in CollaQ (472)
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (467)
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (455)
- style: polish notes for q-learning model (427)
- style: revision to mujoco dockerfile and validation (474)
- style: add dockerfile for cityflow env
- style: polish default output log format
2022.08.12(v0.4.1)
- env: add gym trading env (424)
- env: add board games env (tictactoe, gomoku, chess) (356)
- env: add sokoban env (397) (429)
- env: add BC and DQN demo for gfootball (418) (423)
- env: add discrete pendulum env (395)
- algo: add STEVE model-based algorithm (363)
- algo: add PLR algorithm (408)
- algo: plugin ST-DIM in PPO (379)
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug
- fix: discard message sent by self in redis mq (354)
- fix: remove pace controller (400)
- fix: import error in serial_pipeline_trex (410)
- fix: unittest hang and fail bug (413)
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (371)
- style: polish VAE comments (404)
- style: unittest for FQF (412)
- style: add metaworld dockerfile (432)
- style: remove opencv requirement in default setting
- style: update long description in setup.py
2022.06.21(v0.4.0)
- env: add MAPPO/MASAC all configs in SMAC (310) **(SOTA results in SMAC!!!)**
- env: add dmc2gym env (344) (360)
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (302)
- env: add latest GAIL mujoco config (298)
- env: polish procgen env (311)
- env: add ant and humanoid config for MBPO (314)
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (362)
- algo: add Decision Transformer algorithm (327) (364)
- algo: add on-policy PPG algorithm (312)
- algo: add DDPPO and model-based SAC with lambda-return algorithm (332)
- algo: add infoNCE loss and ST-DIM algorithm (326)
- algo: add FQF distributional RL algorithm (274)
- algo: add continuous BC algorithm (318)
- algo: add pure policy gradient PPO algorithm (382)
- algo: add SQIL + SAC algorithm (348)
- algo: polish NGU and related modules (283) (343) (353)
- algo: add marl distributional td loss (331)
- feature: add new worker middleware (236)
- feature: refactor model-based RL pipeline (ding/world_model) (332)
- feature: refactor logging system in the whole DI-engine (316)
- feature: add env supervisor design (330)
- feature: support async reset for envpool env manager (250)
- feature: add log videos to tensorboard (320)
- feature: refactor impala cnn encoder interface (378)
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transition_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (337)
- style: fix mpire to 2.3.5, which handles default processes more elegantly (306)
- style: use FORMAT_DIR instead of ./ding (309)
- style: update quickstart colab link (347)
- style: polish comments in ding/model/common (315)
- style: update mujoco docker download path (386)
- style: fix protobuf new version compatibility bug
- style: fix torch1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir
2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (252) (255) (249) (246) (262) (261) (266) (273) (263) (280) (259) (286) (277) (290) (289) (299)
- env: add GRF academic env and config (281)
- env: update env interface of GRF (258)
- env: update D4RL offline RL env and config (285)
- env: polish PomdpAtariEnv (254)
- algo: DREX algorithm (218)
- feature: separate mq and parallel modules, add redis (247)
- feature: rename env variables; fix attach_to parameter (244)
- feature: env implementation check (275)
- feature: adjust and set the max column number of tabulate in log (296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (253) (292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (245)
- fix: auto_reset=False and env_ref bug in env manager (248)
- fix: data type and deepcopy bug in RND (288)
- fix: share_memory bug and multi_mujoco env (279)
- fix: some bugs in GTrXL (276)
- fix: update gym_vector_env_manager and add more unittest (241)
- fix: mdpolicy random collect bug (293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (284)
- style: upgrade mpire (251)
- style: add GRF (google research football) docker (256)
- style: update policy and gail comment
2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (192) (193) (197)
- env: slime volley league training demo (229)
- algo: Gated TransformXL (GTrXL) algorithm (136)
- algo: TD3 + VAE(HyAR) latent action algorithm (152)
- algo: stochastic dueling network (234)
- algo: use log prob instead of prob in ACER (186)
- feature: support envpool env manager (228)
- feature: add league main and other improvements in new framework (177) (214)
- feature: add pace controller middleware in new framework (198)
- feature: add auto recover option in new framework (242)
- feature: add k8s parser in new framework (243)
- feature: support async event handler and logger (213)
- feature: add grad norm calculator (205)
- feature: add gym vector env manager (147)
- feature: add train_iter and env_step in serial pipeline (212)
- feature: add rich logger handler (219) (223) (232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (171) (231) (240)
- polish: MAPPO and MASAC smac config (209) (239)
- polish: QMIX smac config (175)
- polish: R2D2 atari config (181)
- polish: A2C atari config (189)
- polish: GAIL box2d and mujoco config (188)
- polish: ACER atari config (180)
- polish: SQIL atari config (230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: make random_collect compatible with episode collector (190)
- fix: remove default n_sample/n_episode value in policy config (185)
- fix: PDQN model bug on gpu device (220)
- fix: TREX algorithm CLI bug (182)
- fix: DQfD JE computation bug and move to AdamW optimizer (191)
- fix: pytest problem for parallel middleware (211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (199)
- test: SAC/SQN unittest (207)
- test: CQL/R2D3/GAIL unittest (201)
- test: NGU td unittest (210)
- test: model wrapper unittest (215)
- test: MAQAC model unittest (226)
- style: add doc docker (221)
2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (146)
- env: add delay reward mujoco env (145)
- env: fix port conflict in gym_soccer (139)
- algo: MASAC algorithm (112)
- algo: TREX algorithm (119) (144)
- algo: H-PPO hybrid action space algorithm (140)
- algo: residual link in R2D2 (150)
- algo: gumbel softmax (169)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (142) (166) (168)
- feature: refactor buffer, separate algorithm and storage (129)
- feature: cli in new pipeline (ditask) (160)
- feature: add multiprocess tblogger, fix circular reference problem (156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (154)
- fix: R2D3 abs priority problem (158) (161)
- fix: multi-discrete action space policies random action bug (167)
- fix: doc generate bug with enum_tools (155)
- style: more comments about R2D2 (149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (148)
- style: update zh doc link
- style: update kaggle tutorial link
2021.12.03(v0.2.2)
- env: apple key to door treasure env (128)
- env: add bsuite memory benchmark (138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (57)
- algo: ICM exploration algorithm (41)
- algo: MP-DQN hybrid action space algorithm (131)
- algo: add loss statistics and polish r2d3 pong config (126)
- feature: add renew env mechanism in env manager and update timeout mechanism (127) (134)
- fix: async subprocess env manager reset bug (137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list
2021.11.22(v0.2.1)
- env: gym-hybrid env (86)
- env: gym-soccer (HFO) env (94)
- env: Go-Bigger env baseline (95)
- env: add the bipedalwalker config of sac and ppo (121)
- algo: DQfD Imitation Learning algorithm (48) (98)
- algo: TD3BC offline RL algorithm (88)
- algo: MBPO model-based RL algorithm (113)
- algo: PADDPG hybrid action space algorithm (109)
- algo: PDQN hybrid action space algorithm (118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (40)
- algo: self-play training demo in slime_volley env (23)
- algo: add example of GAIL entry + config for mujoco (114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (103)
- test: add unittest for dataset and evaluator (107)
- test: add unittest for on-policy algorithm (92)
- test: add unittest for ppo and td (MARL case) (89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (108)
- fix: pyyaml version bug (99)
- fix: small fix on bsuite environment (117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
2021.9.30(v0.2.0)
- env: overcooked env (20)
- env: procgen env (26)
- env: modified predator env (30)
- env: d4rl env (37)
- env: imagenet dataset (27)
- env: bsuite env (58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (25) (44)
- algo: CQL algorithm (discrete/continuous) (37) (68)
- algo: MAPPO algorithm (62)
- algo: WQMIX algorithm (24)
- algo: D4PG algorithm (76)
- algo: update multi discrete policy (dqn, ppo, rainbow) (51) (72)
- feature: image classification training pipeline (27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (22)
- feature: add tb in naive buffer and modify tb in advanced buffer (39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (45) (49)
- feature: add hyper-parameter scheduler module (38)
- feature: add plot function (59)
- fix: acer bug and update atari result (21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (54)
- fix: r2d2 hidden state and obs arange bug (36) (52)
- fix: ppo bug when using dual_clip and adv > 0
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (69)
- fix: formatted config no eval bug (53)
- fix: catch statements that will never succeed and system proxy bug (71) (79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (70) (78) (80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced
2021.8.3(v0.1.1)
- env: selfplay/league demo (12)
- env: pybullet env (16)
- env: minigrid env (13)
- env: atari enduro config (11)
- algo: on-policy PPO (9)
- algo: ACER algorithm (14)
- feature: polish experiment directory structure (10)
- refactor: split doc to new repo (4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (7)
- style: code/comment statistics badge
- style: github CI workflow
2021.7.8(v0.1.0)