This release provides many new features and bug fixes.
TorchRL now publishes Apple Silicon compatible wheels.
We drop support for Python 3.7 and add support for Python 3.11.
New and updated algorithms
Most algorithms have been cleaned up and designed to reach (at least) SOTA results.
![image](https://github.com/pytorch/rl/assets/25529882/c6a97c8a-5efa-4508-ac34-79b860bac95b)
Compatibility with MARL settings has been drastically improved, and the library now ships a number of MARL examples:
![image](https://github.com/pytorch/rl/assets/25529882/b7799087-cd0d-4476-8550-cc9514ca7271)
A prototype RLHF training script is also provided (#1597).
A whole new category of offline RL algorithms has been integrated: Decision Transformers.
* [Algorithm] Update offpolicy examples by BY571 in https://github.com/pytorch/rl/pull/1206
* [Algorithm] Online Decision transformer by BY571 in https://github.com/pytorch/rl/pull/1149
* [Algorithm] QMixer loss and multiagent models by matteobettini in https://github.com/pytorch/rl/pull/1378
* [Algorithm] RLHF end-to-end, clean by vmoens in https://github.com/pytorch/rl/pull/1597
* [Algorithm] Update A2C examples by albertbou92 in https://github.com/pytorch/rl/pull/1521
* [Algorithm] Update DDPG Example by BY571 in https://github.com/pytorch/rl/pull/1525
* [Algorithm] Update DT by BY571 in https://github.com/pytorch/rl/pull/1560
* [Algorithm] Update PPO examples by albertbou92 in https://github.com/pytorch/rl/pull/1495
* [Algorithm] Update SAC Example by BY571 in https://github.com/pytorch/rl/pull/1524
* [Algorithm] Update TD3 Example by BY571 in https://github.com/pytorch/rl/pull/1523
New features
One of the major new features is the introduction of the terminated / truncated / done distinction at __no cost__ within the library. All third-party and primary environments are now compatible with it, as are the losses and data collection primitives (collectors, etc.). This feature is also compatible with complex data structures, such as those found in MARL training pipelines.
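To illustrate why the distinction matters, here is a minimal plain-Python sketch. It is not TorchRL code: `td_target` is a hypothetical helper written for this note. The assumption is the usual convention that `done` marks the end of a trajectory for either reason, while only `terminated` stops value bootstrapping.

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target plus the combined `done` flag.

    Hypothetical helper for illustration only, not a TorchRL function.
    """
    # The trajectory ends if the episode terminated OR was truncated.
    done = terminated or truncated
    # Bootstrapping is cut only on true termination: a truncated episode
    # (e.g. a time limit was reached) still has a meaningful next value.
    bootstrap = 0.0 if terminated else 1.0
    target = reward + gamma * next_value * bootstrap
    return target, done
```

With this convention, a truncated step still bootstraps from the next-state value, whereas a terminated step uses the reward alone.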
All losses are now compatible with tensordict-free inputs, for a more generic deployment.
New transforms
Atari games can now benefit from an EndOfLifeTransform, which allows the end of a life to be used as a done state in the loss (#1605).
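The underlying idea can be sketched in plain Python. This is not the TorchRL implementation: `end_of_life_flags` is a hypothetical helper, and it assumes the environment reports a per-step counter of remaining lives.

```python
def end_of_life_flags(lives):
    """Flag each step at which the number of remaining lives dropped.

    Hypothetical helper for illustration; `lives` is a per-step list
    of remaining-lives counters.
    """
    if not lives:
        return []
    flags = [False]  # the first step has no previous step to compare with
    for prev, curr in zip(lives, lives[1:]):
        flags.append(curr < prev)
    return flags
```

A loss could then treat the flagged steps as episode boundaries even though the game itself keeps running.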
We provide a KL transform to add a KL factor to the reward in RLHF settings.
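As a rough sketch of what a KL-shaped reward looks like, the snippet below subtracts a scaled per-sample KL estimate from the reward. It is an assumption-laden illustration in plain Python, not the transform's actual code: `kl_penalized_reward`, its arguments and the default `coef` are all hypothetical.

```python
def kl_penalized_reward(reward, log_prob, ref_log_prob, coef=0.1):
    """Subtract a scaled KL estimate from the reward.

    Hypothetical helper for illustration only. `log_prob` is the
    log-probability of the sampled action under the trained policy,
    `ref_log_prob` the same quantity under a frozen reference policy.
    """
    # Single-sample estimate of KL(policy || reference).
    kl_estimate = log_prob - ref_log_prob
    return reward - coef * kl_estimate
```

The penalty grows as the trained policy assigns more probability than the reference to the sampled action, which discourages drifting away from the reference model.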
Action masking is made possible through the ActionMask transform (#1421).
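For readers unfamiliar with action masking, here is a minimal plain-Python sketch of the concept: a greedy choice restricted to the allowed actions. It does not use the TorchRL API, and `masked_greedy_action` is a hypothetical helper.

```python
def masked_greedy_action(q_values, mask):
    """Return the index of the highest-valued action among allowed ones.

    Hypothetical helper for illustration; `mask[i]` is True when
    action `i` is allowed in the current state.
    """
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    return max(allowed, key=lambda i: q_values[i])
```

Here the forbidden actions are never selected, even when their value estimate is the largest.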
VC1 is also integrated for better image embedding.
* [Feature] Allow sequential transforms to work offline by vmoens in https://github.com/pytorch/rl/pull/1136
* [Feature] ClipTransform + rename `min/maximum` -> `low/high` by vmoens in https://github.com/pytorch/rl/pull/1500
* [Feature] End-of-life transform by vmoens in https://github.com/pytorch/rl/pull/1605
* [Feature] KL Transform for RLHF by vmoens in https://github.com/pytorch/rl/pull/1196
* [Features] Conv3dNet and PermuteTransform by xmaples in https://github.com/pytorch/rl/pull/1398
* [Feature, Refactor] Scale in ToTensorImage based on the dtype and new from_int parameter by hyerra in https://github.com/pytorch/rl/pull/1208
* [Feature] CatFrames used as inverse by BY571 in https://github.com/pytorch/rl/pull/1321
* [Feature] Masking actions by vmoens in https://github.com/pytorch/rl/pull/1421
* [Feature] VC1 integration by vmoens in https://github.com/pytorch/rl/pull/1211
New models
We provide GRU alongside LSTM for POMDP training.
MARL model coverage is now richer, with a MultiAgentMLP and a MultiAgentCNN! Other improvements for MARL include coverage for nested keys in most places of the library (losses, data collection, environments...).
* [Feature] Support for GRU by vmoens in https://github.com/pytorch/rl/pull/1586
* [Feature] TanhModule by vmoens in https://github.com/pytorch/rl/pull/1213
* [Features] Conv3dNet and PermuteTransform by xmaples in https://github.com/pytorch/rl/pull/1398
* [Feature] CNN version of MultiAgentMLP by MarkHaoxiang in https://github.com/pytorch/rl/pull/1479
Other features (misc)
* [Feature] RLHF Rollouts (reopened) by vmoens in https://github.com/pytorch/rl/pull/1329
* [Feature] Add CQL by BY571 in https://github.com/pytorch/rl/pull/1239
* [Feature] Allow multiple (nested) action, reward, done keys in `env`,`vec_env` and `collectors` by matteobettini in https://github.com/pytorch/rl/pull/1462
* [Feature] Auto-DoubleToFloat by vmoens in https://github.com/pytorch/rl/pull/1442
* [Feature] CompositeSpec.lock by vmoens in https://github.com/pytorch/rl/pull/1143
* [Feature] Device transform by vmoens in https://github.com/pytorch/rl/pull/1472
* [Feature] Dispatch DiscreteSAC loss module by Blonck in https://github.com/pytorch/rl/pull/1248
* [Feature] Dispatch PPO loss module by Blonck in https://github.com/pytorch/rl/pull/1249
* [Feature] Dispatch REDQ loss module by Blonck in https://github.com/pytorch/rl/pull/1251
* [Feature] Dispatch SAC loss module by Blonck in https://github.com/pytorch/rl/pull/1244
* [Feature] Dispatch TD3 loss module by Blonck in https://github.com/pytorch/rl/pull/1254
* [Feature] Dispatch for DDPG loss module by Blonck in https://github.com/pytorch/rl/pull/1215
* [Feature] Dispatch for SAC loss module by Blonck in https://github.com/pytorch/rl/pull/1223
* [Feature] Dispatch reinforce loss module by Blonck in https://github.com/pytorch/rl/pull/1252
* [Feature] Distpatch IQL loss module by Blonck in https://github.com/pytorch/rl/pull/1230
* [Feature] Fix DType casting lazy init by vmoens in https://github.com/pytorch/rl/pull/1589
* [Feature] Heterogeneous Environments compatibility by matteobettini in https://github.com/pytorch/rl/pull/1411
* [Feature] Log hparams from python dict by matteobettini in https://github.com/pytorch/rl/pull/1517
* [Feature] MARL exploration e-greedy compatibility by matteobettini in https://github.com/pytorch/rl/pull/1277
* [Feature] Make advantages compatible with Terminated, Truncated, Done by vmoens in https://github.com/pytorch/rl/pull/1581
* [Feature] Make losses inherit from TDMBase by vmoens in https://github.com/pytorch/rl/pull/1246
* [Feature] Making action masks compatible with q value modules and e-greedy by matteobettini in https://github.com/pytorch/rl/pull/1499
* [Feature] Nested keys in `OrnsteinUhlenbeckProcess` by matteobettini in https://github.com/pytorch/rl/pull/1305
* [Feature] Optional mapping of "state" in gym specs by matteobettini in https://github.com/pytorch/rl/pull/1431
* [Feature] Parallel environments lazy heterogenous data compatibility by matteobettini in https://github.com/pytorch/rl/pull/1436
* [Feature] Pettingzoo: add multiagent dimension to single agent groups by matteobettini in https://github.com/pytorch/rl/pull/1550
* [Feature] RLHF Reward Model (reopened) by vmoens in https://github.com/pytorch/rl/pull/1328
* [Feature] RLHF dataloading by vmoens in https://github.com/pytorch/rl/pull/1309
* [Feature] RLHF networks by apbard in https://github.com/pytorch/rl/pull/1319
* [Feature] Refactor categorical dists: Masked one-hot and pass-through gradients by vmoens in https://github.com/pytorch/rl/pull/1488
* [Feature] ReplayBuffer.empty by vmoens in https://github.com/pytorch/rl/pull/1238
* [Feature] Separate losses by MateuszGuzek in https://github.com/pytorch/rl/pull/1240
* [Feature] Single call to value network in advantages [bis] by vmoens in https://github.com/pytorch/rl/pull/1263
* [Feature] Single call to value network in advantages by vmoens in https://github.com/pytorch/rl/pull/1256
* [Feature] TensorStorage by vmoens in https://github.com/pytorch/rl/pull/1310
* [Feature] Threaded collection and parallel envs by vmoens in https://github.com/pytorch/rl/pull/1559
* [Feature] Unbind specs by vmoens in https://github.com/pytorch/rl/pull/1555
* [Feature] VMAS obs dict by matteobettini in https://github.com/pytorch/rl/pull/1419
* [Feature] VMAS: choose between categorical or one-hot actions by matteobettini in https://github.com/pytorch/rl/pull/1484
* [Feature] dispatch for DQNLoss by vmoens in https://github.com/pytorch/rl/pull/1194
* [Feature] log histograms by vmoens in https://github.com/pytorch/rl/pull/1306
* [Feature] make csv logger `exist_ok` on logging folder by matteobettini in https://github.com/pytorch/rl/pull/1561
* [Feature] shifted for all adv by vmoens in https://github.com/pytorch/rl/pull/1276
New environments and third-party improvements
We now cover SMAC-v2, PettingZoo, IsaacGymEnvs (prototype) and RoboHive. The D4RL dataset can now be used without the eponymous library, which permits training with more recent or older versions of gym.
* [Environment, Docs] SMACv2 and docs on action masking by matteobettini in https://github.com/pytorch/rl/pull/1466
* [Environment] Petting zoo by matteobettini in https://github.com/pytorch/rl/pull/1471
* [Feature] D4rl direct download by MateuszGuzek in https://github.com/pytorch/rl/pull/1430
* [Feature] Gym 'vectorized' envs compatibility by vmoens in https://github.com/pytorch/rl/pull/1519
* [Feature] Gym compatibility: Terminal and truncated by vmoens in https://github.com/pytorch/rl/pull/1539
* [Feature] IsaacGymEnvs integration by vmoens in https://github.com/pytorch/rl/pull/1443
* [Feature] RoboHive integration by vmoens in https://github.com/pytorch/rl/pull/1119
Performance improvements
We provide several speed improvements, in particular for data collection.
![image](https://github.com/pytorch/rl/assets/25529882/b2894440-2ba2-4935-a3d8-05279577b5db)
* [Performance] Accelerate GAE by Blonck in https://github.com/pytorch/rl/pull/1142
* [Performance] Accelerate TD lambda return estimate by Blonck in https://github.com/pytorch/rl/pull/1158
* [Performance] Accelerate `_split_and_pad_sequence` by Blonck in https://github.com/pytorch/rl/pull/1147
* [Performance] Faster GAE by vmoens in https://github.com/pytorch/rl/pull/1153
* [Performance] Faster losses by vmoens in https://github.com/pytorch/rl/pull/1272
* [Performance] Improve performance and streamline the generating of the gammalambda tensor by Blonck in https://github.com/pytorch/rl/pull/1171
* [Performance] Miscellaneous efficiency improvements by vmoens in https://github.com/pytorch/rl/pull/1513
* [Performance] Reduce key accessing in transforms by matteobettini in https://github.com/pytorch/rl/pull/1590
* [Performance] Some efficiency improvements by vmoens in https://github.com/pytorch/rl/pull/1250
* [Performance] Vmas vectorized reset by matteobettini in https://github.com/pytorch/rl/pull/1146
Bug fixes
* [BugFIx] Fix entropy signature in truncated normal by vmoens in https://github.com/pytorch/rl/pull/1536
* [BugFix,CI] Fix virtualenv not found by vmoens in https://github.com/pytorch/rl/pull/1280
* [BugFix] Add `torch.no_grad()` for rendering in multiagent PPO tutorial by matteobettini in https://github.com/pytorch/rl/pull/1511
* [BugFix] Batched envs compatibility with custom keys by matteobettini in https://github.com/pytorch/rl/pull/1348
* [BugFix] C++17 by vmoens in https://github.com/pytorch/rl/pull/1169
* [BugFix] Check env specs for nested envs by matteobettini in https://github.com/pytorch/rl/pull/1332
* [BugFix] CompositeSpec.unsqueeze by btx0424 in https://github.com/pytorch/rl/pull/1464
* [BugFix] DDPG select also critic input for actor loss by matteobettini in https://github.com/pytorch/rl/pull/1563
* [BugFix] DQN loss dispatch respect configured tensordict keys by Blonck in https://github.com/pytorch/rl/pull/1285
* [BugFix] Discrete SAC rewrite by matteobettini in https://github.com/pytorch/rl/pull/1461
* [BugFix] Empty-spec tolerance by vmoens in https://github.com/pytorch/rl/pull/1501
* [BugFix] Fix Brax reset by vmoens in https://github.com/pytorch/rl/pull/1195
* [BugFix] Fix CatFrames by vmoens in https://github.com/pytorch/rl/pull/1336
* [BugFix] Fix ClipTransform device by vmoens in https://github.com/pytorch/rl/pull/1508
* [BugFix] Fix Cython for D4RL by vmoens in https://github.com/pytorch/rl/pull/1429
* [BugFix] Fix DDPG by vmoens in https://github.com/pytorch/rl/pull/1183
* [BugFix] Fix DDPG squeezing by matteobettini in https://github.com/pytorch/rl/pull/1487
* [BugFix] Fix Dreamer test error by vmoens in https://github.com/pytorch/rl/pull/1558
* [BugFix] Fix Gym Categorical/One-hot issues by vmoens in https://github.com/pytorch/rl/pull/1482
* [BugFix] Fix KL import errors by vmoens in https://github.com/pytorch/rl/pull/1207
* [BugFix] Fix KLTransform execution with LSTM by vmoens in https://github.com/pytorch/rl/pull/1426
* [BugFix] Fix KeyError in inverse transform replay buffer by BY571 in https://github.com/pytorch/rl/pull/1165
* [BugFix] Fix LSTM - VecEnv compatibility by vmoens in https://github.com/pytorch/rl/pull/1427
* [BugFix] Fix LSTM use with padded/masked segments by smorad in https://github.com/pytorch/rl/pull/1399
* [BugFix] Fix NoopResetEnv behavior when trials exceeded. by skandermoalla in https://github.com/pytorch/rl/pull/1477
* [BugFix] Fix QValueModule multi_one_hot by smorad in https://github.com/pytorch/rl/pull/1439
* [BugFix] Fix RLHF tests - transformers v4.34 by vmoens in https://github.com/pytorch/rl/pull/1601
* [BugFix] Fix RewardSum spec transform to mimic reward spec by matteobettini in https://github.com/pytorch/rl/pull/1478
* [BugFix] Fix SAC alpha optim by vmoens in https://github.com/pytorch/rl/pull/1192
* [BugFix] Fix SAC by vmoens in https://github.com/pytorch/rl/pull/1189
* [BugFix] Fix SAC by vmoens in https://github.com/pytorch/rl/pull/1190
* [BugFix] Fix SACv2 by vmoens in https://github.com/pytorch/rl/pull/1191
* [BugFix] Fix SMAC-v2 by vmoens in https://github.com/pytorch/rl/pull/1538
* [BugFix] Fix TD3 and compat with https://github.com/pytorch-labs/tensordict/pull/482 by vmoens in https://github.com/pytorch/rl/pull/1375
* [BugFix] Fix TD3 inplace updates by vmoens in https://github.com/pytorch/rl/pull/1219
* [BugFix] Fix TD3 target net by vmoens in https://github.com/pytorch/rl/pull/1186
* [BugFix] Fix `LazyStackedCompositeSpec` and introducing `consolidate_spec` by matteobettini in https://github.com/pytorch/rl/pull/1392
* [BugFix] Fix `step_mdp()` by matteobettini in https://github.com/pytorch/rl/pull/1334
* [BugFix] Fix action mask test by vmoens in https://github.com/pytorch/rl/pull/1492
* [BugFix] Fix brax by vmoens in https://github.com/pytorch/rl/pull/1346
* [BugFix] Fix bug in ppo example config by degensean in https://github.com/pytorch/rl/pull/1396
* [BugFix] Fix envpool by vmoens in https://github.com/pytorch/rl/pull/1530
* [BugFix] Fix error message of .set_keys() in advantage modules by Blonck in https://github.com/pytorch/rl/pull/1218
* [BugFix] Fix examples by vmoens in https://github.com/pytorch/rl/pull/1173
* [BugFix] Fix locked params modif by vmoens in https://github.com/pytorch/rl/pull/1307
* [BugFix] Fix max length by vmoens in https://github.com/pytorch/rl/pull/1233
* [BugFix] Fix missing ("next", "observation") key in dispatch of losses by Blonck in https://github.com/pytorch/rl/pull/1235
* [BugFix] Fix nested CompositeSpec creation by vmoens in https://github.com/pytorch/rl/pull/1261
* [BugFix] Fix nightly tensordict dependency by skandermoalla in https://github.com/pytorch/rl/pull/1302
* [BugFix] Fix ppo example by vmoens in https://github.com/pytorch/rl/pull/1225
* [BugFix] Fix ppo training NaN occurences by vmoens in https://github.com/pytorch/rl/pull/1403
* [BugFix] Fix reward sum within parallel envs by vmoens in https://github.com/pytorch/rl/pull/1454
* [BugFix] Fix run_type_checks by vmoens in https://github.com/pytorch/rl/pull/1570
* [BugFix] Fix safe tanh for older torch versions by vmoens in https://github.com/pytorch/rl/pull/1220
* [BugFix] Fix serialization of parallel envs by vmoens in https://github.com/pytorch/rl/pull/1197
* [BugFix] Fix split_trajs by vmoens in https://github.com/pytorch/rl/pull/1444
* [BugFix] Fix tanh/atanh vmap compatibility by vmoens in https://github.com/pytorch/rl/pull/1217
* [BugFix] Fix the bug of `RoundRobinWriter.extend(data)` by xmaples in https://github.com/pytorch/rl/pull/1295
* [BugFix] Fix tutorials by vmoens in https://github.com/pytorch/rl/pull/1382
* [BugFix] Fix typo in CatFrames Transform error message. by skandermoalla in https://github.com/pytorch/rl/pull/1491
* [BugFix] Fix vmap in VmapModule (torch 1.13 compat) by vmoens in https://github.com/pytorch/rl/pull/1350
* [BugFix] Improve collector buffer initialisation when policy spec is unavailable by matteobettini in https://github.com/pytorch/rl/pull/1547
* [BugFix] Instantiate 2 losses with different keys by matteobettini in https://github.com/pytorch/rl/pull/1553
* [BugFix] KL module integration by vmoens in https://github.com/pytorch/rl/pull/1212
* [BugFix] Key selection in batched envs by vmoens in https://github.com/pytorch/rl/pull/1253
* [BugFix] Load collector frames and iter by matteobettini in https://github.com/pytorch/rl/pull/1557
* [BugFix] Make VecNorm Transform pickable by albertbou92 in https://github.com/pytorch/rl/pull/1596
* [BugFix] Minor fixes PPO / A2C examples by albertbou92 in https://github.com/pytorch/rl/pull/1591
* [BugFix] Multiagent "auto" entropy fix in SAC by matteobettini in https://github.com/pytorch/rl/pull/1494
* [BugFix] Nested envs compatibility by matteobettini in https://github.com/pytorch/rl/pull/1347
* [BugFix] Nested key in replay buffer by matteobettini in https://github.com/pytorch/rl/pull/1485
* [BugFix] Nested keys in transforms by matteobettini in https://github.com/pytorch/rl/pull/1355
* [BugFix] Nested keys to probabilistic modules by matteobettini in https://github.com/pytorch/rl/pull/1363
* [BugFix] Parametric `rand_action()` in `BaseEnv` by matteobettini in https://github.com/pytorch/rl/pull/1267
* [BugFix] Parametric collectors by matteobettini in https://github.com/pytorch/rl/pull/1303
* [BugFix] Patch SAC to allow state_dict manipulation before exec by vmoens in https://github.com/pytorch/rl/pull/1607
* [BugFix] PettingZoo seeding by matteobettini in https://github.com/pytorch/rl/pull/1554
* [BugFix] Pickable buffer by albertbou92 in https://github.com/pytorch/rl/pull/1410
* [BugFix] QValue modules and nested action by matteobettini in https://github.com/pytorch/rl/pull/1351
* [BugFix] Reward sum custom key by matteobettini in https://github.com/pytorch/rl/pull/1413
* [BugFix] SafeModule not safely handling specs by matteobettini in https://github.com/pytorch/rl/pull/1352
* [BugFix] Small patches to SMAC by matteobettini in https://github.com/pytorch/rl/pull/1533
* [BugFix] Sparse info in SMACv2 by matteobettini in https://github.com/pytorch/rl/pull/1546
* [BugFix] ToTensorImage unsqueeze would not update the observation spec by hyerra in https://github.com/pytorch/rl/pull/1161
* [BugFix] Torch 1.13 compat by vmoens in https://github.com/pytorch/rl/pull/1294
* [BugFix] Unbreak tensordict import by vmoens in https://github.com/pytorch/rl/pull/1231
* [BugFix] Vectorized priority update in replay buffers by matteobettini in https://github.com/pytorch/rl/pull/1598
* [BugFix] _transpose_time with single dim by vmoens in https://github.com/pytorch/rl/pull/1155
* [BugFix] `RewardSum` transform for multiple reward keys by matteobettini in https://github.com/pytorch/rl/pull/1544
* [BugFix] `step_mdp` nested keys by matteobettini in https://github.com/pytorch/rl/pull/1339
* [BugFix] include buffers in policy_weights by vmoens in https://github.com/pytorch/rl/pull/1185
* [BugFix] load_state_dict in param updates for collectors by vmoens in https://github.com/pytorch/rl/pull/1145
* [BugFix] make value estimator with value_key from the PPOLoss init arg by xmaples in https://github.com/pytorch/rl/pull/1144
* [BugFix] unlock in tensordictmodules tests by vmoens in https://github.com/pytorch/rl/pull/1417
* [BugFix] valid_size not saved as attribute by tcbegley in https://github.com/pytorch/rl/pull/1337
Miscellaneous
* Envpool Tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1283
* Fix CI by matteobettini in https://github.com/pytorch/rl/pull/1368
* Fix MacOS Mujoco Failure by osalpekar in https://github.com/pytorch/rl/pull/1450
* Linux GPU Brax Unittests by osalpekar in https://github.com/pytorch/rl/pull/1133
* Linux Gym Unittests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1139
* Linux Olddeps tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1289
* Move to More Efficient Windows Runner by osalpekar in https://github.com/pytorch/rl/pull/1476
* OptDeps Tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1290
* Remove Distributed CCI job by osalpekar in https://github.com/pytorch/rl/pull/1374
* Remove Envpool from CCI by osalpekar in https://github.com/pytorch/rl/pull/1390
* Remove old CircleCI Lint by osalpekar in https://github.com/pytorch/rl/pull/1134
* Removing Migrated and Unused CCI jobs by osalpekar in https://github.com/pytorch/rl/pull/1288
* Revert "[Feature] Single call to value network in advantages" by vmoens in https://github.com/pytorch/rl/pull/1262
* Revert "[Refactor,Performance] Faster collectors" by vmoens in https://github.com/pytorch/rl/pull/1330
* Sklearn test to Nova by osalpekar in https://github.com/pytorch/rl/pull/1291
* Windows Unittests on GHA by osalpekar in https://github.com/pytorch/rl/pull/1086
* [Benchmark,CI] Benchmarks in PR (pre) by vmoens in https://github.com/pytorch/rl/pull/1342
* [Benchmark,CI] Benchmarks in PR by vmoens in https://github.com/pytorch/rl/pull/1341
* [Benchmark] Benchmark Gym vs TorchRL by vmoens in https://github.com/pytorch/rl/pull/1602
* [Benchmark] Benchmark losses by vmoens in https://github.com/pytorch/rl/pull/1287
* [Benchmark] Benchmark number GPU vectorised environments in VMAS (TorchRL vs RLlib) by matteobettini in https://github.com/pytorch/rl/pull/1446
* [Benchmark] Improve benchmark precision + step_mdp + fix GPU by vmoens in https://github.com/pytorch/rl/pull/1340
* [CI] Add macOS M1 binaries Wheels by DanilBaibak in https://github.com/pytorch/rl/pull/1504
* [CI] Add ninja for MacOS builts by vmoens in https://github.com/pytorch/rl/pull/1564
* [CI] Concurrency on gha by vmoens in https://github.com/pytorch/rl/pull/1152
* [CI] Deprecate Windows GPU CCI by osalpekar in https://github.com/pytorch/rl/pull/1387
* [CI] Doc CI fix by matteobettini in https://github.com/pytorch/rl/pull/1384
* [CI] Fix CI PettingZoo by matteobettini in https://github.com/pytorch/rl/pull/1528
* [CI] Fix CI by vmoens in https://github.com/pytorch/rl/pull/1529
* [CI] Fix GHA gpu tests by vmoens in https://github.com/pytorch/rl/pull/1356
* [CI] Fix Jax version in Jumanji by vmoens in https://github.com/pytorch/rl/pull/1242
* [CI] Fix Mujoco version by vmoens in https://github.com/pytorch/rl/pull/1475
* [CI] Fix RoboHive CI by vmoens in https://github.com/pytorch/rl/pull/1541
* [CI] Fix brax and habitat by vmoens in https://github.com/pytorch/rl/pull/1353
* [CI] Fix examples CI by matteobettini in https://github.com/pytorch/rl/pull/1489
* [CI] Fix failing jobs by vmoens in https://github.com/pytorch/rl/pull/1318
* [CI] Fix failing jobs by vmoens in https://github.com/pytorch/rl/pull/1335
* [CI] Fix habitat CI by vmoens in https://github.com/pytorch/rl/pull/1537
* [CI] Fix jumanji by vmoens in https://github.com/pytorch/rl/pull/1566
* [CI] Fix nightly build dependency on tensordict by vmoens in https://github.com/pytorch/rl/pull/1300
* [CI] Fix opt deps machine and docker by vmoens in https://github.com/pytorch/rl/pull/1362
* [CI] Fix tuto deps by matteobettini in https://github.com/pytorch/rl/pull/1416
* [CI] Fix wheels by vmoens in https://github.com/pytorch/rl/pull/1301
* [CI] Less old deps by vmoens in https://github.com/pytorch/rl/pull/1255
* [CI] Less warnings in CI (costs) by vmoens in https://github.com/pytorch/rl/pull/1349
* [CI] Merge Distributed and Linux GPU job by osalpekar in https://github.com/pytorch/rl/pull/1182
* [CI] Migrate examples by vmoens in https://github.com/pytorch/rl/pull/1364
* [CI] Move linux stable to GHA by vmoens in https://github.com/pytorch/rl/pull/1503
* [CI] Reduce CI time by vmoens in https://github.com/pytorch/rl/pull/1226
* [CI] Remove CCI Config by osalpekar in https://github.com/pytorch/rl/pull/1456
* [CI] Remove examples from CCI by vmoens in https://github.com/pytorch/rl/pull/1367
* [CI] Update cuda version by vmoens in https://github.com/pytorch/rl/pull/1380
* [CI] Windows GPU Tests by osalpekar in https://github.com/pytorch/rl/pull/1386
* [Doc] Add link to paper in readme by giadefa in https://github.com/pytorch/rl/pull/1298
* [Doc] Add paper refs in doc and KB by vmoens in https://github.com/pytorch/rl/pull/1241
* [Doc] CITATION.cff by vmoens in https://github.com/pytorch/rl/pull/1229
* [Doc] Do not clean gh-pages by vmoens in https://github.com/pytorch/rl/pull/1150
* [Doc] Fix GPU benchmark by vmoens in https://github.com/pytorch/rl/pull/1151
* [Doc] Fix advantage examples by vmoens in https://github.com/pytorch/rl/pull/1600
* [Doc] Fix default value of `tanh_loc` in the documentation of `TruncatedNormal`. by skandermoalla in https://github.com/pytorch/rl/pull/1205
* [Doc] Fix doctest examples by degensean in https://github.com/pytorch/rl/pull/1393
* [Doc] Fix exploration modules docstrings by vmoens in https://github.com/pytorch/rl/pull/1326
* [Doc] Fix tanh_loc in docstrings by vmoens in https://github.com/pytorch/rl/pull/1203
* [Doc] TorchRL Logo by vmoens in https://github.com/pytorch/rl/pull/1234
* [Doc] Update citation by vmoens in https://github.com/pytorch/rl/pull/1228
* [Doc] Update coding_ppo.py by kushaangupta in https://github.com/pytorch/rl/pull/1483
* [Doc] correct typos in pendulum tutorial by kushaangupta in https://github.com/pytorch/rl/pull/1502
* [Doc] fixed typos in ppo tutorial by MatteoGaetzner in https://github.com/pytorch/rl/pull/1314
* [Docs] Fix multi-agent tutorial by matteobettini in https://github.com/pytorch/rl/pull/1599
* [Docs] Multi-agent environments by matteobettini in https://github.com/pytorch/rl/pull/1383
* [Example] Multiagent examples: MAPPO-IPPO-MADDPG-IDDPG-IQL-QMIX-VDN by matteobettini in https://github.com/pytorch/rl/pull/1027
* [Fix] Remove loss device by matteobettini in https://github.com/pytorch/rl/pull/1395
* [Lint] Add TorchFix linter by kit1980 in https://github.com/pytorch/rl/pull/1580
* [Minor] Capture error in CatFrame edit by vmoens in https://github.com/pytorch/rl/pull/1498
* [Minor] Fix prints by vmoens in https://github.com/pytorch/rl/pull/1257
* [Minor] Fix typo by vmoens in https://github.com/pytorch/rl/pull/1193
* [Minor] Missing commit from 1488 by vmoens in https://github.com/pytorch/rl/pull/1490
* [Minor] Missing lint by vmoens in https://github.com/pytorch/rl/pull/1556
* [Minor] More efficient SAC v1 by vmoens in https://github.com/pytorch/rl/pull/1507
* [Minor] Remove ya gymnasium deprecation warning in vectorized envs by vmoens in https://github.com/pytorch/rl/pull/1573
* [Minor] small fixes by vmoens in https://github.com/pytorch/rl/pull/1237
* [Nova] Jumanji Tests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1282
* [Nova] Remove windows Unittests from CCI by osalpekar in https://github.com/pytorch/rl/pull/1159
* [Nova] Removing CircleCI Gym Unittests by osalpekar in https://github.com/pytorch/rl/pull/1179
* [Nova] Vmas Tests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1284
* [Quality] Filter out warnings in subprocs by vmoens in https://github.com/pytorch/rl/pull/1552
* [Refacto] Migration due to tensordict 473 and 474 by vmoens in https://github.com/pytorch/rl/pull/1354
* [Refactor,Performance] Faster collectors (bis) by vmoens in https://github.com/pytorch/rl/pull/1331
* [Refactor,Performance] Faster collectors by vmoens in https://github.com/pytorch/rl/pull/1327
* [Refactor] Better GymLikeEnv by vmoens in https://github.com/pytorch/rl/pull/1168
* [Refactor] Better batch-size handling by RBs by vmoens in https://github.com/pytorch/rl/pull/1311
* [Refactor] Better updaters by vmoens in https://github.com/pytorch/rl/pull/1184
* [Refactor] Change objectives parameter/buffer/target logic by vmoens in https://github.com/pytorch/rl/pull/1424
* [Refactor] Edit ppo params by vmoens in https://github.com/pytorch/rl/pull/1322
* [Refactor] Expose all wrappers in torchrl.envs by vmoens in https://github.com/pytorch/rl/pull/1532
* [Refactor] Faster envs (2) by vmoens in https://github.com/pytorch/rl/pull/1457
* [Refactor] Fix imports by vmoens in https://github.com/pytorch/rl/pull/1551
* [Refactor] Follow-up on tensordict PR 473 by vmoens in https://github.com/pytorch/rl/pull/1361
* [Refactor] More unravel fixes by vmoens in https://github.com/pytorch/rl/pull/1357
* [Refactor] Nested reward and done specs by vmoens in https://github.com/pytorch/rl/pull/1115
* [Refactor] Refactor DDPG loss in standalone methods by vmoens in https://github.com/pytorch/rl/pull/1603
* [Refactor] Refactor _reset in ParallelEnv by vmoens in https://github.com/pytorch/rl/pull/1172
* [Refactor] Refactor losses for generalization by vmoens in https://github.com/pytorch/rl/pull/1286
* [Refactor] Remove pkg_resources import by vmoens in https://github.com/pytorch/rl/pull/1379
* [Refactor] Remove private calls to _set by vmoens in https://github.com/pytorch/rl/pull/1370
* [Refactor] Shape ops in LSTM based on tensor shape, not tensordict by vmoens in https://github.com/pytorch/rl/pull/1170
* [Refactor] Use _set_tuple for faster set by vmoens in https://github.com/pytorch/rl/pull/1372
* [Refactor] Use `wait` instead of `is_set` to get results in ParallelEnv by vmoens in https://github.com/pytorch/rl/pull/1562
* [Refactor] Use masking in collectors by vmoens in https://github.com/pytorch/rl/pull/1412
* [Refactor] Vmas nested by matteobettini in https://github.com/pytorch/rl/pull/1366
* [Refactor] the usage of tensordict keys in loss modules by Blonck in https://github.com/pytorch/rl/pull/1175
* [Setup] Update setup.py python versions by vmoens in https://github.com/pytorch/rl/pull/1496
* [Test,BugFix] Fix Jax backend tests by vmoens in https://github.com/pytorch/rl/pull/1162
* [Test,CI,Feature] Total time per test by vmoens in https://github.com/pytorch/rl/pull/1232
* [Test] Remove import of test class by matteobettini in https://github.com/pytorch/rl/pull/1549
* [Test] Skip tests in python 3.11 by vmoens in https://github.com/pytorch/rl/pull/1535
* [Test] Skip threading tests in OSX by vmoens in https://github.com/pytorch/rl/pull/1571
* [Test] Test split trajs by vmoens in https://github.com/pytorch/rl/pull/1445
* [Test] Test state_dict and loss modules by vmoens in https://github.com/pytorch/rl/pull/1527
* [Tests] Collector compatibility for heterogeneous environments by matteobettini in https://github.com/pytorch/rl/pull/1414
* [Tests] DDPG extra critic input tests by matteobettini in https://github.com/pytorch/rl/pull/1568
* [Tutorial] Multiagent PPO tutorial by matteobettini in https://github.com/pytorch/rl/pull/1385
* [Versioning] Python 3.11 by vmoens in https://github.com/pytorch/rl/pull/1433
* [Versioning] Use python 3.8 for GPU tests by vmoens in https://github.com/pytorch/rl/pull/1577
* [Versioning] Write version all cases in setup.py by vmoens in https://github.com/pytorch/rl/pull/1579
* d4rl Test to Nova by osalpekar in https://github.com/pytorch/rl/pull/1293
* python 3.11 in README by vmoens in https://github.com/pytorch/rl/pull/1434
New Contributors
* Blonck made their first contribution in https://github.com/pytorch/rl/pull/1142
* hyerra made their first contribution in https://github.com/pytorch/rl/pull/1161
* skandermoalla made their first contribution in https://github.com/pytorch/rl/pull/1205
* giadefa made their first contribution in https://github.com/pytorch/rl/pull/1298
* MatteoGaetzner made their first contribution in https://github.com/pytorch/rl/pull/1314
* MateuszGuzek made their first contribution in https://github.com/pytorch/rl/pull/1240
* degensean made their first contribution in https://github.com/pytorch/rl/pull/1393
* smorad made their first contribution in https://github.com/pytorch/rl/pull/1399
* kushaangupta made their first contribution in https://github.com/pytorch/rl/pull/1483
* kit1980 made their first contribution in https://github.com/pytorch/rl/pull/1580
* MarkHaoxiang made their first contribution in https://github.com/pytorch/rl/pull/1479
* DanilBaibak made their first contribution in https://github.com/pytorch/rl/pull/1504
A great THANKS to our contributors, in particular (but not in any particular order) skandermoalla, matteobettini, BY571 and albertbou92 for their tremendous dedication.
**Full Changelog**: https://github.com/pytorch/rl/compare/v0.1.1...v0.2.0