As always, we want to warmly thank the RL community for supporting this project. Special thanks to our first-time
contributors:
* priba made their first contribution in https://github.com/pytorch/rl/pull/2543
* carschandler made their first contribution in https://github.com/pytorch/rl/pull/2545
* 4d616e61 made their first contribution in https://github.com/pytorch/rl/pull/2624
* valterschutz made their first contribution in https://github.com/pytorch/rl/pull/2626
* raresdan made their first contribution in https://github.com/pytorch/rl/pull/2616
* oslumbers made their first contribution in https://github.com/pytorch/rl/pull/2609
* codingWhale13 made their first contribution in https://github.com/pytorch/rl/pull/2682
as well as to all the users who opened issues, made suggestions, or started discussions here, on [discord](https://discord.gg/cZs26Qq3Dd),
on the [pytorch forum](https://discuss.pytorch.org/) or elsewhere! We value your feedback!
BC-Breaking changes and Deprecated behaviors
Removed classes
As announced, we removed the following classes:
- AdditiveGaussianWrapper
- InPlaceSampler
- NormalParamWrapper
- OrnsteinUhlenbeckProcessWrapper
Default MLP config
The default MLP depth has changed from 3 to 0 (i.e., `MLP(in_features=3, out_features=4)` is now equivalent to a regular
`nn.Linear` layer).
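To make the effect of the new default concrete, here is a minimal pure-Python sketch of how the depth determines the linear layers of a plain MLP. The helper `mlp_layer_shapes` and the hidden width `num_cells=32` are assumptions made for illustration; this is not the TorchRL implementation.

```python
def mlp_layer_shapes(in_features, out_features, depth, num_cells=32):
    """Return the (in, out) shape of each linear layer in a plain MLP.

    Hypothetical helper for illustration only: `depth` counts hidden layers,
    so depth=0 leaves just the output projection.
    """
    widths = [in_features] + [num_cells] * depth + [out_features]
    return list(zip(widths[:-1], widths[1:]))


# New default: a single linear layer mapping 3 features to 4.
print(mlp_layer_shapes(3, 4, depth=0))  # [(3, 4)]
# Previous default: three hidden layers plus the output layer.
print(mlp_layer_shapes(3, 4, depth=3))  # [(3, 32), (32, 32), (32, 32), (32, 4)]
```

Code that relied on the previous default should pass the depth explicitly to keep its hidden layers.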
Locking envs
Environment specs are now locked by default (#2729, #2730). This means that `env.observation_spec = spec` is allowed
(specs are unlocked and re-locked automatically), but `env.observation_spec["value"] = spec` will not work.
The core idea is that we want to cache as much information as we can, such as the action keys or whether
the env has dynamic specs. We can only do that if we can guarantee that the env has not been modified, and locking the specs
provides that guarantee.
Note that a version of this already existed but it was not as robust as the new one.
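The locking idea can be sketched with a toy example in plain Python. `ToySpec` and `ToyEnv` are hypothetical stand-ins written for this illustration; they only mimic the behavior described here and are not the TorchRL classes.

```python
class ToySpec(dict):
    """A dict-like stand-in for a composite spec that can be locked."""

    def __init__(self, *args, **kwargs):
        self.locked = False
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if self.locked:
            raise RuntimeError("spec is locked; assign the whole spec instead")
        super().__setitem__(key, value)


class ToyEnv:
    """Hypothetical env whose spec setter swaps the spec and re-locks it."""

    def __init__(self, spec):
        self._cache = {}
        self.observation_spec = spec

    @property
    def observation_spec(self):
        return self._observation_spec

    @observation_spec.setter
    def observation_spec(self, spec):
        self._cache.clear()  # a full swap invalidates cached info
        self._observation_spec = spec
        spec.locked = True   # in-place edits are refused from now on

    @property
    def observation_keys(self):
        # Safe to cache: the locked spec cannot change behind our back.
        if "keys" not in self._cache:
            self._cache["keys"] = sorted(self._observation_spec)
        return self._cache["keys"]


env = ToyEnv(ToySpec(value="old"))
env.observation_spec = ToySpec(value="new", extra="x")  # allowed
try:
    env.observation_spec["value"] = "other"             # refused
except RuntimeError as err:
    print("refused:", err)
```

Because every modification goes through the setter, the cache is rebuilt exactly when the spec changes, which is the guarantee that in-place edits would break.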
Changes to composite distributions
TL;DR: We're changing the way log-probs and entropies are collected and written in `ProbabilisticTensorDictModule` and
in `CompositeDistribution`. The `"sample_log_prob"` default key will soon be `"<value>_log_prob"` (or
`("path", "to", "<value>_log_prob")` for nested keys). For `CompositeDistribution`, a different log-prob will be
written for each leaf tensor in the distribution. This new behavior is controlled by the
`tensordict.nn.set_composite_lp_aggregate(mode: bool)` function or by the `COMPOSITE_LP_AGGREGATE` environment variable.
We strongly encourage users to adopt the new behavior by setting `tensordict.nn.set_composite_lp_aggregate(False).set()`
at the beginning of their training script.
The behavior of `CompositeDistribution` and its interaction with on-policy losses such as PPO has changed.
The PPO documentation now includes a section about multi-head policies and the examples also give such information.
See the [tensordict v0.7.0 release notes](https://github.com/pytorch/tensordict/releases/tag/v0.7.0) or #2707 to learn more.
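As a rough illustration of the new naming scheme, the sketch below maps a sample key to the key under which its log-prob would be written. `log_prob_key` is a hypothetical helper written for this note, not a tensordict function; it only follows the pattern described above.

```python
def log_prob_key(sample_key):
    """Map a sample key to its per-leaf log-prob key.

    Illustrative helper, not part of tensordict or TorchRL.
    """
    if isinstance(sample_key, str):
        return f"{sample_key}_log_prob"
    # Nested keys are tuples of strings: only the last element is renamed.
    *prefix, leaf = sample_key
    return (*prefix, f"{leaf}_log_prob")


print(log_prob_key("action"))              # action_log_prob
print(log_prob_key(("agents", "action")))  # ('agents', 'action_log_prob')
```

With a composite distribution over several leaves, each leaf gets its own entry of this form instead of a single aggregated `"sample_log_prob"`.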
[Deprecation] Change the default MLP depth (2746) (12e6bce60) by vmoens ghstack-source-id: bd34b8e9112c4fc3a30bd095e3ac073a7d0b5469
[Deprecation] Gracing old *Spec with v0.8 versioning (2751) (fa697fe59) by vmoens ghstack-source-id: e7c6e0a4b8520da887fe7e602a351c3c72a08c4c
[Deprecation] Remove AdditiveGaussianWrapper (2748) (6c7f4fbda) by vmoens ghstack-source-id: 78f248e1239a04fc5213aa4418a158f741679593
[Deprecation] Remove InPlaceSampler (2750) (0feef11f9) by vmoens ghstack-source-id: eeae1bf0611a5d293f533767eee7b9700e720cc8
[Deprecation] Remove NormalParamWrapper (2747) (a38604e47) by vmoens ghstack-source-id: 4a70178f54f9e25d602c86a0b61248d66f3e39bd
[Deprecation] Remove OrnsteinUhlenbeckProcessWrapper (2749) (0111a8795) by vmoens ghstack-source-id: 401fdfaca2e27122d5a67fc7177e1015047f0098
New features
Compile compatibility
We focused on improving compatibility with `torch.compile` across the SOTA training scripts, which now
all accept a `compile=1` argument. The overall speedups range from 1x to 4x:
<img width="566" alt="Screenshot 2025-02-05 at 21 20 54" src="https://github.com/user-attachments/assets/bd36ce4d-426e-4d7d-a4da-ff62eee78240" />
Loss module speedups are displayed in the [README.md](https://github.com/pytorch/rl) page.
Replay buffers are also mostly compatible with compile now (with the notable exception of distributed and memory-mapped ones).
Specs: `auto_specs_`, `<attr>_spec_unbatched`
You can now use `env.auto_specs_` to set the specs automatically based on a dummy rollout.
For batched environments, the unbatched spec can now be accessed via `env.<attr>_spec_unbatched`. This is useful for
creating random policies, for example.
New transforms
We added `TrajCounter` (#2532), `Hash` and `Tokenizer` (#2648, #2700) and `LineariseReward` (#2681).
LazyStackStorage
We provide a new `ListStorage`-based storage (`LazyStackStorage`) that automatically represents samples as a `LazyStackedTensorDict`,
which makes it easy to store ragged tensors, although not contiguously in memory (#2723).
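The idea behind a list-backed storage for ragged data can be sketched in plain Python. `ToyRaggedStorage` is a hypothetical class written for this illustration, and its drop-the-oldest policy when full is an assumption; it does not reproduce the `LazyStackStorage` API.

```python
import random


class ToyRaggedStorage:
    """List-backed storage: items keep their own length, nothing is padded."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = []

    def extend(self, items):
        for item in items:
            if len(self._items) == self.max_size:
                self._items.pop(0)  # assumption: drop the oldest entry when full
            self._items.append(list(item))

    def sample(self, batch_size, rng=random):
        # The batch is a list of references to stored items (a "lazy stack"),
        # not one contiguous array, so lengths may differ within a batch.
        return [rng.choice(self._items) for _ in range(batch_size)]


storage = ToyRaggedStorage(max_size=3)
storage.extend([[1], [2, 3], [4, 5, 6], [7, 8]])
batch = storage.sample(4, rng=random.Random(0))
print([len(item) for item in batch])
```

Keeping the items separate is what allows different lengths to coexist, at the cost of the contiguous memory layout that a pre-allocated tensor storage would give.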
ChessEnv
A new `torchrl.envs.ChessEnv` allows users to train agents to play chess!
Tutorials on exporting torchrl modules
We also open-sourced a tutorial on exporting TorchRL modules to run on hardware (#2557).
Full list of features
[Feature, Test] Adding tests for envs that have no specs (2621) (c72583f75) by vmoens ghstack-source-id: 4c75691baa1e70f417e518df15c4208cff189950
[Feature,Refactor] Chess improvements: fen, pgn, pixels, san, action mask (2702) (d425777b8) by vmoens ghstack-source-id: f294a2bc99a17911c9b62558d530b148d3c0350f
[Feature] A2C compatibility with compile (2464) (507766a88) by vmoens ghstack-source-id: 66a7f0d1dd82d6463d61c1671e8e0a14ac9a55e7
[Feature] ActionDiscretizer custom sampling (2609) (3da76f006) by oslumbers Co-authored-by: Oliver Slumbers <oliver.slumbershelsing.ai>
[Feature] Add Hash transform (2648) (50011dcf1) by kurtamohler ghstack-source-id: dccf63fe4f9d5f76947ddb7d5dedcff87ff8cdc5
[Feature] Add `Choice` spec (2713) (9368ca68e) by kurtamohler ghstack-source-id: afa315a311845ab39ade3e75046f32757f9d94f1
[Feature] Add `LossModule.reset_parameters_recursive` (2546) (218d5bf70) by kurtamohler
[Feature] Add `Stack` transform (2567) (594462d6b) by kurtamohler
[Feature] Add deterministic_sample to masked categorical (2708) (49d9897af) by vmoens ghstack-source-id: d34fcf9b44d7a7c60dbde80b0835189f990ef226
[Feature] Adds ordinal distributions (2520) (c851e1698) by louisfaury Co-authored-by: louisfaury
[Feature] Avoid some recompiles of `ReplayBuffer.extend/sample` (2504) (0f29c7e93) by kurtamohler
[Feature] CQL compatibility with compile (2553) (e2be42e82) by vmoens ghstack-source-id: d362d6c17faa0eb609009bce004bb4766e345d5e
[Feature] CROSSQ compatibility with compile (2554) (01a421e76) by vmoens ghstack-source-id: 98a2b30e8f6a1b0bc583a9f3c51adc2634eb8028
[Feature] CatFrames.make_rb_transform_and_sampler (2643) (9ee1ae7ee) by vmoens ghstack-source-id: 7ecf952ec9f102a831aefdba533027ff8c4c29cc
[Feature] ChessEnv (2641) (17983d43e) by vmoens ghstack-source-id: 087c3b12cd621ea11a252b34c4896133697bce1a
[Feature] Composite.batch_size (2597) (2e82cab19) by vmoens ghstack-source-id: 621884a559a71e80a4be36c7ba984fd08be47952
[Feature] Composite.pop (2598) (8d16c12bd) by vmoens ghstack-source-id: 64d5bd736657ef56e37d57726dfcfd25b16b699f
[Feature] Composite.separates (2599) (83e0b0568) by vmoens ghstack-source-id: fbfc4308a81cd96ecc61723df8c0eb870c442def
[Feature] Custom conversion tool for gym specs (2726) (dbc8e2ee0) by vmoens ghstack-source-id: d38bb02f15267a9b1637b3ed25fb44ef013e2456
[Feature] DDPG compatibility with compile (2555) (7d7cd9538) by vmoens ghstack-source-id: f18928a419f81794d6870fd4e9fe1205c1b137e1
[Feature] DQN compatibility with compile (2571) (f149811da) by vmoens ghstack-source-id: 113dc8c4a5562d217ed867ace1942b2f6b8a39f9
[Feature] DT compatibility with compile (2556) (fbfe10488) by vmoens ghstack-source-id: 362b6e88bad4397f35036391729e58f4f7e4a25d
[Feature] Discrete SAC compatibility with compile (2569) (9e2d214fa) by vmoens ghstack-source-id: ddc131acedbbe451b28758e757a8c240ebd72b80
[Feature] Ensure out-place policy compatibility in rollout and collectors (2717) (ec370c6b6) by vmoens ghstack-source-id: 41a6aa56e0a045a20224b96f9537a7ae3ae14494
[Feature] EnvBase.auto_specs_ (2601) (d537dcb63) by vmoens ghstack-source-id: 329679238c5172d7ff13097ceaa189479d4f4145
[Feature] EnvBase.check_env_specs (2600) (00d3199ec) by vmoens ghstack-source-id: 332dbf92db496c71c5ce6aba340ad123eac0f5d6
[Feature] GAIL compatibility with compile (2573) (6482766b8) by vmoens ghstack-source-id: 98c7602ec0343d7a83cb19bddeb579484c42e77e
[Feature] IQL compatibility with compile (2649) (2cfc2abd6) by vmoens ghstack-source-id: 77bca166701d28dd69ef3964f55ab4f3e4b17fed
[Feature] LLMHashingEnv (2635) (30d21e599) by vmoens ghstack-source-id: d1a20ecd023008683cf18cf9e694340cfdbdac8a
[Feature] LazyStackStorage (2723) (fe3f00c6c) by vmoens ghstack-source-id: e9c031470aa0bdafbb2b26c73c06b25685a128e5
[Feature] Linearise reward transform (2681) (ff1ff7e9c) by louisfaury Co-authored-by: louisfaury
[Feature] Log each entropy for composite distributions in PPO (2707) (319bb68f0) by louisfaury Co-authored-by: louisfaury
[Feature] Log pbar rate in SOTA implementations (2662) (1ce25f19a) by vmoens ghstack-source-id: 283cc1bb4ad2d60281296d2cfb78ec41c77f4129
[Feature] MCTSForest (2307) (e9d167711) by vmoens ghstack-source-id: 9ac5cd3de39a4dbe1c7c33cb71ff6f45a886ae65
[Feature] Make PPO compatible with composite actions and log-probs (2665) (256a7002c) by vmoens ghstack-source-id: c41718e697f9b6edda17d4ddb5bd6d41402b7c30
[Feature] PPO compatibility with compile (2652) (f5a187d7d) by vmoens ghstack-source-id: 0ed29f352fcd85f0dc0683d90e95bdbecf6c14f9
[Feature] Re-enable cache for specs (2730) (4262ab91e) by vmoens ghstack-source-id: 797132312bfd9749f8926a2dd0b03eff65b8f51c
[Feature] SAC compatibility with compile (2655) (87a59fb30) by vmoens ghstack-source-id: b57caeaf6e2d3690fb3311f4c9b8cca8575d3974
[Feature] Send info dict to the storage device in RBs (2527) (d524d0d6b) by vmoens ghstack-source-id: 4ed60d649b17f96b49f90d234e679937c60a3c32
[Feature] TD3 compatibility with compile (2658) (1b7eda199) by vmoens ghstack-source-id: fb94307557f2b8604403b48211e3da6fb2139e28
[Feature] TD3-bc compatibility with compile (2657) (91064bc27) by vmoens ghstack-source-id: 8a33e39829f620c1e1a579a0255162ba93eaca91
[Feature] TensorSpec.enumerate() (2354) (14b63e4f0) by vmoens ghstack-source-id: 9db2f5ee47a197eb0403cb4622266fb03b99360f
[Feature] TrajCounter transform (2532) (05aeb8975) by vmoens ghstack-source-id: 62a3091e5c9072f26266143319f30de1729c0d4e
[Feature] UnaryTransform for input entries (2700) (093a1599f) by vmoens ghstack-source-id: bb0ea97f47bdad6ba5e73692969fece4e2efbfb4
[Feature] `example_data` for NonTensor spec (2698) (80690d221) by vmoens ghstack-source-id: 6fe5d82763dfcc9044d6debe88f0f34bb739c987
[Feature] automatically determine return_contiguous (2724) (cac93eb0e) by vmoens ghstack-source-id: 6d1fc31d87cb021e6286cdb07db2d9b0e2302f7d
[Feature] env.step_mdp (2636) (4bc40a808) by vmoens ghstack-source-id: 145e37cd772fdd74e35e5ffe6accc5c81ad689f3
[Feature] flexible batch_locked for jumanji (2382) (35a78139b) by vmoens ghstack-source-id: e356b6511ff3da8a6c583747214cfa90f42c9083
[Feature] lock_ / unlock_ graphs (2729) (601483e71) by vmoens ghstack-source-id: 01e375e636b97b26a89f9bbab2e955db6c85978a
[Feature] multiagent data standardization: PPO advantages (2677) (b7a0d11e5) by matteobettini Co-authored-by: Vincent Moens <vmoensmeta.com>
[Feature] no_cuda_sync arg in collectors (2727) (280297aee) by vmoens ghstack-source-id: 9baba31b3ee844882fd4b6a6f69874946caf3b3e
[Feature] single_<attr>_spec (2549) (58c384713) by vmoens ghstack-source-id: 27e247ea1775e455999a114dd6d95fac748376c4
[Feature] spec.cardinality (2638) (dd26ae79f) by vmoens ghstack-source-id: 1160900f8a81dd51dc72436e1af69c8248bff162
[Feature] spec.is_empty(recurse) (2596) (097d8ad98) by vmoens ghstack-source-id: faa3b1df5133c77462d6dd013d3854d684cc7e94
[Feature] timeit.printevery (2653) (187de7c8b) by vmoens ghstack-source-id: 19165bbfbea5cdc0a6b159493fb02571bab872f3
[Minor,Feature] Add `prefix` arg to `timeit.todict` (2576) (7bc84d15d) by vmoens ghstack-source-id: f1ff685caf6e8950d02dfc44ad2c1eb496495ad1
[Minor,Feature] `group_optimizers` (2577) (7829bd3f3) by vmoens ghstack-source-id: 81a94ed641544a420bb1c455921ca6a17ecd6a22
Doc
[Doc] Add AOTInductor back (2564) (9f8f77cdb) by vmoens ghstack-source-id: 774eb5973045861f284fdc67f74945b1eecdeaf2
[Doc] Add Tokenizer and auto-reset doc link (2754) (ee4006a6b) by vmoens ghstack-source-id: 90f55b568e85ae151bea4370025144c19e74602b
[Doc] Add `Stack` transform link in docs (2689) (c5f1565de) by kurtamohler
[Doc] Adding recurrent policies to export tutorial (2559) (705123870) by vmoens ghstack-source-id: 1f1af399b120db8bbb1789748641f44fd3b1bd5e
[Doc] Better doc for SliceSampler (2607) (90572ac11) by vmoens ghstack-source-id: 7d79ef7d37c4dc2ffbdff5b422cf5da24d93c0da
[Doc] Fix broken links and formatting issues in doc (2574) (5a2d9e205) by vmoens ghstack-source-id: 4e3f84fe436de6a6e9696894cd06318a98e4a23b
[Doc] Fix modules doc (2531) (edbf3dee3) by vmoens
[Doc] Fix tutorials (2560) (2f3b4cd4d) by vmoens ghstack-source-id: 6c9114384015e76e96b3bbd0c8893cc42344537a
[Doc] Fix typo in torchrl/modules/distributions/continuous.py (2624) (b2e9f291a) by Mana
[Doc] Fix typos (2682) (f672c708f) by Nils Kiele Co-authored-by: Vincent Moens <vincentmoensgmail.com>
[Doc] MADDPG bug fix of buffer device and improve explaination (2519) (3e4b2928e) by matteobettini
[Doc] Minor fixes to the docs and type hints (2548) (50a35f69b) by thomasbbrunner
[Doc] Tutorial on exporting TorchRL models (2557) (c0187a93e) by vmoens ghstack-source-id: b93146e22d8376563e7ac302b5cff95f09ae50d4
[Doc] Typo in docs for actors.py (2545) (19dbeebf0) by carschandler
[Doc] Update docstring for TruncatedNormal with correct parameter names (2625) (d22266d05) by valterschutz Co-authored-by: Valter Schutz <valterschutzproton.me>
[Doc] actor docstrings (2626) (825779935) by valterschutz Co-authored-by: Valter Schutz <valterschutzproton.me>
[Doc] fix several typos (2603) (de153bf45) by carschandler
[Doc] torchrl_demo.py revamp (2561) (304e707ef) by vmoens ghstack-source-id: 2f0087850e4a7d4d4393f0662156af9bfca8e3e1
[Example] Efficient Trajectory Sampling with CompletedTrajRepertoire (2642) (b840a772c) by vmoens ghstack-source-id: 4d5c587c69230aa8f3a1b9b6fe19f52fa683d703
[Example] RNN-based policy example (2675) (d009835b4) by vmoens ghstack-source-id: ef0087e9b5cba40be428f57ef70ecd2f63483d03
[Example] Using Collector's device args (2705) (539c2158d) by vmoens ghstack-source-id: 9aec8daa53000bdfd6091be706c7bc46778d5983
Performance
[Performance] Accelerate slice sampler on GPU (2672) (84c3ec322) by vmoens ghstack-source-id: a4dc1515d8b51f5ec150b2fae4e1a84254f2af09
[Performance] Avoid cloning trajs in SliceSampler (2671) (4fd54fef4) by vmoens ghstack-source-id: 2e133fcea716b202694cfa84df3f6e4ba3507bbc
[Performance] Improve performance of compiled ReplayBuffer (2529) (2a07f4c0f) by kurtamohler
[Benchmark] Add benchmark for compiled `ReplayBuffer.extend/sample` (2514) (5e03a5518) by kurtamohler ghstack-source-id: d4562697e2c1a8392cf5bdcadb50f8b7b6939e41
Better engineering
[BE] Add trailing spaces when necessary (2581) (600760f5b) by vmoens ghstack-source-id: 198b5b5668cce8336d44206c10dacb8a9b1a9785
[BE] Add type annotation for tensor_keys to facilitate auto-complete (2696) (4b3279a3f) by vmoens ghstack-source-id: b4a8fe38e7c6b028759eef082f65f26036bc0250
[Refactor,CI] Refactor SOTA tests (2583) (c0ba3ff54) by vmoens ghstack-source-id: b14c59bb1ca7bf056bde05fa0abd01fa7e9b3710
[Refactor] Allow safe-tanh for torch >= 2.6.0 (2580) (1474f8517) by vmoens ghstack-source-id: 92df1954451453ee051bbde499f6db5ebaafed49
[Refactor] Deprecate recurrent_mode API to use decorators/CMs instead (2584) (14b277513) by vmoens ghstack-source-id: 80f705e022abc111df3960fc09576d5e266ed4dd
[Refactor] Refactor trees (2634) (57dc25a44) by vmoens ghstack-source-id: 368ba4c4402b6db0bc8b0688802ce161db9776b7
[Refactor] Rename Recorder and LogReward (2616) (607ebc52d) by Goia Rares Dan Tiago
[Refactor] Use <spec>_unbatched in VMAS (2593) (a126a6f94) by vmoens ghstack-source-id: 2190278de44ba59a3bc8d38398fddae9ecc42a84
[Refactor] Use default device instead of CPU in losses (2687) (c3b9d1dc7) by vmoens ghstack-source-id: 8b98062c3ae88d8780ef7428fdfa07e305c790b9
[Refactor] compile compatibility improvements (2578) (db7f08d76) by vmoens ghstack-source-id: 95f8241b56e42b80e828485cb5f377288bff6f5e
[Quality,BE] Better doc for step_mdp (2639) (ef5a37d8a) by vmoens ghstack-source-id: 1f5aed6fb2e97ead9d379f9545ae742f7728c585
[Quality] Better TD construction in codebase (2565) (a4c1ee3b3) by vmoens ghstack-source-id: 9e280d9d7d4a735e5055beb0450d933547530e55
[Quality] Better warning when c++ binaries failed to be imported (2541) (0a13cbd5e) by vmoens
[Quality] IMPALA auto-device (2654) (526b38d5c) by vmoens ghstack-source-id: abbb3048f33c9f7f6a623e32e139871093ea74fa
[Minor] Fix doc and MARL tests (2759) (ad7d2a10b) by vmoens ghstack-source-id: 9308be3ebc7fac30b5bde321792eb97069d55996
[Minor] Fix fbcode imports of mocking classes (2526) (da0bf1897) by vmoens ghstack-source-id: 74f9f3bedf8f48988a1956084548f6cd2f720934
[Minor] Make fbcode happy with imports (2517) (a70b258cd) by vmoens ghstack-source-id: d4bfce9d51269bc0ab6154ee4c2d1e1ff7af0895
Bug fixes
[BugFix, BE] Document and fix fps passing in recorder and loggers (2694) (61e05b3d9) by vmoens ghstack-source-id: b3996a9a27643eb5da8a78135f6b9fcef3685f17
[BugFix,Doc] Fix BATCHED_PIPE_TIMEOUT refs and doc (2695) (dc25a55a7) by vmoens ghstack-source-id: 6e43c4ff1c319545cf0952abf6f35f3e7ed473e0
[BugFix,Doc] Revert dynamic shape in export tutorial (2563) (9d292a007) by vmoens ghstack-source-id: fc856218e840469a5bb0143241d100e9cc612538
[BugFix,Test,Benchmark] Fix graph breaks induced by device context manager (2602) (152bc81b7) by vmoens ghstack-source-id: 0df2728928280a43de4abd30afed20826b0af091
[BugFix,Test] test chess rendering (2721) (ddbb6fdd5) by vmoens ghstack-source-id: 59b37e6fa2f8c11f600eea334da0bd8257ed382c
[BugFix] Account for composite actions in gym (2718) (1246db197) by vmoens ghstack-source-id: c09b59904a89d45fa24a61a5e8a24fe307320794
[BugFix] Account for terminating data in SAC losses (2606) (c8676f4a8) by vmoens ghstack-source-id: dc1870292786c262b4ab6a221b3afb551e0efb9b
[BugFix] ActionDiscretizer scalar integration (2619) (830f2f26c) by vmoens ghstack-source-id: b22102f3730914b125ef0f813f4d2f22dec0b26e
[BugFix] Allow expanding TensorDictPrimer transforms shape with parent batch size (2552) (83a7a57da) by Albert Bou Co-authored-by: Vincent Moens <vmoensmeta.com>
[BugFix] Avoid KeyError in slice sampler (for compile) (2670) (21eeca42c) by vmoens ghstack-source-id: 6e2a3036f0e50d365387cced50a761b97a47317d
[BugFix] Better account of composite distributions in PPO (2622) (90c8e40f6) by vmoens ghstack-source-id: 3d86f99bc5b20a53e4092d786e96a5f7e83405ac
[BugFix] Compatibility of tensordict primers with batched envs (specifically for LSTM and GRU) (2668) (f4709c143) by vmoens ghstack-source-id: e1da58ecfd36ca01b8a11fe90e5f3c5fe77f064c
[BugFix] Fix MARL PPO tutorial action_spec call (2628) (1ca134cc3) by vmoens ghstack-source-id: 1d9058c45b28c0f0279e4243a2a0f96c622a51d8
[BugFix] Fix batching envs with non tensor data (2674) (ab4250ec7) by vmoens ghstack-source-id: daba8a95459cfa978da09291757b6380fab4f308
[BugFix] Fix call to tree.plot in tests (2547) (09d6866e0) by vmoens ghstack-source-id: 4a5babbf46294ab6ed4a791e26cfacaf3a41a2e0
[BugFix] Fix collector length with non-empty batch size (2575) (b87597922) by vmoens ghstack-source-id: 0c6a7a49f0570fad083340a64dd89c0f4c220c06
[BugFix] Fix compile weakrefs errors (2742) (ffa99b2a2) by vmoens ghstack-source-id: 3cb4c62f465a3c0581064b3ff89290b9d225eb3f
[BugFix] Fix device transfer for collectors with init_random_frames mixed devices (2704) (1d45117ba) by vmoens ghstack-source-id: 1684399a7c84dd19b396db6c903fbf68c971c73d
[BugFix] Fix export aoti_compile_and_package API change (2629) (1cffffee9) by vmoens ghstack-source-id: 07a0f063f8955815157c2a3eac02c6460a82f672
[BugFix] Fix failing tests (2582) (863121a27) by vmoens ghstack-source-id: a43a2e3dbf76cd63c57ae00028df04b41a4e2f2b
[BugFix] Fix get_default_device calls in older PT versions (2586) (705ecc2bb) by vmoens ghstack-source-id: fd3a739d38feba075073801dda362be598822a94
[BugFix] Fix imports (2605) (d90b9e3d1) by vmoens ghstack-source-id: db85f2611c1c0b22e9179b4fdd6c2dcea78ac8dd
[BugFix] Fix init_random_frames=0 (2645) (19dfefc84) by vmoens ghstack-source-id: 38a544ea15631f9affb4c385c09e7c4df94af55d
[BugFix] Fix missing min/max alpha clamps in losses (2684) (ed656a15f) by vmoens
[BugFix] Fix output of `SipHash(as_tensor=False)` (2664) (1fc9577c4) by kurtamohler
[BugFix] Fix partial device transfers in collector (2703) (afb81de51) by vmoens ghstack-source-id: 2cd74c2d6fceaf079122ae801b67bdbfc29cddaf
[BugFix] Fix pendulum device (2516) (6799a7f5d) by vmoens ghstack-source-id: bcaf20de6e317d4bda0e1511e0b1e46653a6f352
[BugFix] Fix safe probabilistic backward by removing in-place modif (2755) (2f8c118e3) by vmoens ghstack-source-id: 574eb1f9b662c1eb5be25e97020e11b3fadf625e
[BugFix] Fix tests failing because of https://github.com/pytorch/pytorch/pull/137602 (165163abe) by vmoens
[BugFix] Fix typing for python 3.9 (2631) (e7062a1d6) by vmoens ghstack-source-id: 663da84096214611804a726e2d38d27a6f21c958
[BugFix] Fix typing in chess env (2646) (cb8e241b2) by vmoens ghstack-source-id: ad6086bbb7d1ee528ca24ec1d1232da47372e2b5
[BugFix] Fix typing in llm env (2647) (e3c304733) by vmoens ghstack-source-id: b5608f91756b5a81141941903158417a111e0710
[BugFix] Fix version parsing in extensions (2542) (997d90e1b) by vmoens ghstack-source-id: 903f2b01b508b81b1b4f92c4297d390da79fe8a2
[BugFix] PettingZoo dict action spaces (2692) (1a6c9e2d0) by matteobettini
[BugFix] Remove erroneous python 3.8 compatibility classifier (2540) (528875a9f) by vmoens
[BugFix] Remove raisers in specs (2651) (bb6f87adb) by vmoens ghstack-source-id: a005a62847aa2ff1d286f2c4ad13fd14f9e631d3
[BugFix] Rename RayCollector example file to avoid ImportError (2525) (8eac84ad2) by Albert Bou
[BugFix] Support for tensor collection in the `PPOLoss` (2543) (0eabb7897) by Pau Riba Co-authored-by: Pau Riba <pau.ribahelsing.ai>
[BugFix] Temporarily remove unsafe caching in envs (2728) (dc63e820d) by vmoens ghstack-source-id: a139cf6dc9fcfcfa525a6aa6375163d379593550
[BugFix] Wrong spec returned (2604) (a1e21f598) by matteobettini
[BugFix] action_spec_unbatched whenever necessary (2592) (d30599ec0) by vmoens ghstack-source-id: ec87794dabaf5023dac85cfc898a7c000e93331d
[BugFix] adapt log-prob TD batch-size to advantage shape in PPO (2756) (cb37521e1) by vmoens ghstack-source-id: 8ccd12f65f4a74a42356a630e0e5a1f015337d4a
[BugFix] make buffers zero-dim in exploration modules (2591) (a47b32c07) by vmoens ghstack-source-id: fd2705eb9132169da4871b27b354f7895c644061
[BugFix] patch rand_action in TransformedEnv to read the base_env method (2699) (2c19fcc70) by vmoens ghstack-source-id: 04e2e85e2675cf34c349ebadb8fa85a5aff2e532
[BugFix] requested_frames_per_batch in distributed collectors (2579) (408cf7d04) by vmoens ghstack-source-id: 49289de6956460d9aed13d982eb8003eafc35118
[BugFix] skip_done_states in SAC (2613) (de61e4d5e) by vmoens ghstack-source-id: 39d97360e3b0e45dd8c327487eac50ddafe2254d
CI and Tests
[Test] Add tests and a few fixes for ChessEnv (2661) (7bbd7e3b6) by kurtamohler ghstack-source-id: d0fbb520e35c74305041340722a7560ac2f958f2
[Test] Add tests for CatFrames with PermuteTransform (2715) (d4e401993) by kurtamohler ghstack-source-id: e554d1cda8d7e4458c9397f1f93345c855e68e5c
[Test] Add tests for Tree (2738) (bb9440b40) by kurtamohler ghstack-source-id: 8f7aa07a4d36aa3664eaa19cc35bd66fb9e61c24
[Test] Fix warnings in SOTA tests (2710) (a90106475) by vmoens ghstack-source-id: c79223b5d6548a6c5a6ef649f6eb8e1703258815
[Test] More comprehensive tests for auto_spec (2640) (6c7d233a4) by vmoens ghstack-source-id: 75352490436fd706af3d36f9b8016e80a8a3f46a
[Test] Skip tokenizer tests if transformers is not in workspace (2744) (20a19fe2a) by vmoens ghstack-source-id: b92facfd14cba62511e7888567c94d3986419ab5
[Test] Str2StrEnv test (2725) (5fd509232) by vmoens ghstack-source-id: 45a0e5f4b33c4624758171b9fe31f1e3932ff5e4
[CI, BugFix] Py3.8 for old deps (2568) (f3275dab0) by vmoens ghstack-source-id: 13c7923c0e5c8725c12c3bacc6c21b250d9f7457
[CI] Change doc image (2632) (2511c04a5) by vmoens ghstack-source-id: eceab242294ec55135d79f29e848345a5d5d455e
[CI] Cuda 12.4 (2733) (37a514d6c) by vmoens ghstack-source-id: 2f3842a17d03e530add9608ee4525347a7c6a0e5
[CI] Fix Cairo-2 Chess import error (2743) (10f015e0c) by vmoens ghstack-source-id: c2bcbfc4522bd1b4f1fea3dbb006dc9552b09cb4
[CI] Fix docs upload (2587) (0f592266f) by vmoens ghstack-source-id: 49d7df06340fc432c29cd9f2d0ed2ae3d5619a38
[CI] Fix dreamer run in SOTA tests (2627) (aed03fda4) by vmoens ghstack-source-id: dfe3ab6fe0d29fcdcaf57f31f84d04e07e36bad3
[CI] Fix nightly build (2666) (133d70936) by vmoens ghstack-source-id: 5502fa94b6abcc154e020dcb165093fdc30ca025
[CI] Fix olddeps dm_control (2734) (3ac61270f) by vmoens ghstack-source-id: 750edcb8cd6b17167f77fb7c9ebd538608cfbde6
[CI] Fix windows build (2760) (03f56ffb0) by vmoens
[CI] Install stable torch/tv in docs when on release branch (2761) (57bdc6aec) by vmoens ghstack-source-id: 7c39c049c7cff0ee112be2d07597f2e291d2fafd
[CI] Local import of PIL (2720) (d628a507f) by vmoens ghstack-source-id: 6eb4ace11022632e902a7277dd51344bb9fe1f65
[CI] Longer timeout for windows (2765) (4c06ce2b8) by vmoens ghstack-source-id: 381e7e39d650e0178178a78076321a2210237b39
[CI] Make MAX_IDLE_COUNT a feature of tests (2752) (963f3cdf6) by vmoens ghstack-source-id: 2bf31dfff3d7862a54abeea86c8c5cc47a0f302d
[CI] Remove gym import in test_libs.py (2719) (f2cf5e044) by vmoens ghstack-source-id: b0474588cfc81ed135d70efb58203c0b503f4ff0
[CI] Revert upgrade of upload image in docs (2585) (236d38f8a) by vmoens ghstack-source-id: f323dd2667a073b6c763ed17a793ecd0eec6b7be
[CI] Upgrade GHA versions (2740) (cd4f359ef) by vmoens ghstack-source-id: 1876f1f0c18cb11c74edc9d96c17fdc985bc7b1a
[CI] Upgrade cu121 to cu124 (2764) (5da1f6522) by vmoens ghstack-source-id: 4b3c9c0c31a60a5e151ff13b21e54853dc426416
[CI] Upgrade to v0.7 (2745) (0ecfbe36e) by vmoens ghstack-source-id: e548bbbb4578d44a8eee000ab0a40c89713afc27
[CI] linux_job_v2.yml (2570) (527a26a27) by vmoens ghstack-source-id: ae13b53bd2885263e80019c087171421f5f7d0d5
[CI] minari[hf] (2722) (dda0df165) by vmoens ghstack-source-id: 6eb84d906dfbc66839706f328e214014aef7b65f
[CI] workflow permissions (2706) (b000685f3) by vmoens ghstack-source-id: f520a1b1e7697b1147cb29e66e2ecb1d07cb4cbc