TorchRL Initial Alpha Release
TorchRL is the soon-to-be official RL domain library for PyTorch.
It contains primitives that are aimed at covering most of the modern RL research space.
Getting started with the library
Installation
The library can be installed with
```
$ pip install torchrl
```
Currently, torchrl wheels are provided for Linux and macOS (not M1) machines. For other architectures or for the latest features, refer to the [README.md](README.md) and [CONTRIBUTING.md](CONTRIBUTING.md) files for advanced installation instructions.
Environments
TorchRL currently supports gym and dm_control out-of-the-box. To create a gym-wrapped environment, simply use
```python
from torchrl.envs import GymEnv

env = GymEnv("Pendulum-v1")
```
or, similarly, wrap an existing gym environment:
```python
import gym

from torchrl.envs import GymWrapper

env = GymWrapper(gym.make("Pendulum-v1"))
```
Environments can be transformed using the `torchrl.envs.transforms` module. See the [environment tutorial](tutorials/envs.ipynb) for more information.
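For instance, an environment can be wrapped in a `TransformedEnv` together with one or more transforms. The snippet below is a minimal sketch; the transform name and constructor arguments (`RewardScaling` with `loc`/`scale`) are assumptions to check against the transforms module:
```python
from torchrl.envs import GymEnv
from torchrl.envs.transforms import TransformedEnv, RewardScaling

# Wrap the base environment and rescale its rewards (transform name and args assumed)
env = TransformedEnv(GymEnv("Pendulum-v1"), RewardScaling(loc=0.0, scale=0.1))
tensordict = env.reset()
```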
The [`ParallelEnv`](torchrl/envs/vec_env.py) class allows running multiple environments in parallel.
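For example, the sketch below assumes the constructor takes the number of workers followed by an environment-creating function:
```python
from torchrl.envs import GymEnv, ParallelEnv

# Run 4 Pendulum instances in parallel worker processes (constructor signature assumed)
env = ParallelEnv(4, lambda: GymEnv("Pendulum-v1"))
tensordict = env.reset()  # a batched tensordict with batch_size [4]
```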
Policy and modules
TorchRL modules interact using `TensorDict`, a new data carrier class. Although it is not strictly necessary and workarounds exist, we advise using the [`TensorDictModule`](torchrl/modules/tensordict_module) class to read and write tensordicts:
```python
>>> import torch.nn as nn
>>> from torchrl.modules import TensorDictModule
>>> # n_obs and n_act are the observation and action sizes of the environment
>>> policy_module = nn.Linear(n_obs, n_act)
>>> policy = TensorDictModule(
...     policy_module,
...     in_keys=["observation"],  # keys to be read for the module input
...     out_keys=["action"],  # keys to be written with the module output
... )
>>> tensordict = env.reset()
>>> tensordict = policy(tensordict)
>>> action = tensordict["action"]
```
By using `TensorDict` and `TensorDictModule`, you can make sure that your algorithm is robust to changes in configuration (e.g. using an RNN for the policy, exploration strategies, etc.). `TensorDict` instances can be reshaped in several ways, cast to a device, updated, shared among processes, stacked, concatenated, etc.
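As an illustration, the sketch below assumes `TensorDict` can be imported from `torchrl.data` in this release:
```python
import torch
from torchrl.data import TensorDict  # import location assumed for this release

td = TensorDict(
    {"observation": torch.randn(4, 3), "action": torch.randn(4, 1)},
    batch_size=[4],
)
td = td.to("cpu")                        # cast to a device
td_view = td.view(2, 2)                  # reshape along the batch dimensions
td_stack = torch.stack([td, td], dim=0)  # stack two tensordicts
```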
Some specialized `TensorDictModule` subclasses are provided for convenience: `Actor`, `ProbabilisticActor`, `ValueOperator`, `ActorCriticOperator`, `ActorCriticWrapper` and `QValueActor` can be found in [actors.py](torchrl/modules/tensordict_module/actors.py).
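For instance, an `Actor` is a thin wrapper around `TensorDictModule` with default observation/action keys (a minimal sketch; the default key names are assumed):
```python
import torch.nn as nn
from torchrl.modules import Actor

# Reads "observation" and writes "action" by default (default keys assumed)
actor = Actor(nn.Linear(n_obs, n_act))
tensordict = actor(env.reset())
```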
Collecting data
[DataCollectors](torchrl/collectors/collectors.py) are TorchRL's data-loading classes. We provide single-process, sync and async multiprocess loaders. We also provide [`ReplayBuffers`](torchrl/data/replay_buffers) that can be stored in memory or on disk using the various [storage](torchrl/data/replay_buffers/storages.py) options.
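A minimal sketch of a collection loop is given below; the import paths, constructor arguments (`frames_per_batch`, `total_frames`) and storage class are assumptions to verify against the collector and replay-buffer modules:
```python
from torchrl.collectors import SyncDataCollector
from torchrl.data import ReplayBuffer, LazyMemmapStorage  # names assumed
from torchrl.envs import GymEnv

# Collect rollouts with the policy defined above (argument names assumed)
collector = SyncDataCollector(
    lambda: GymEnv("Pendulum-v1"),
    policy,
    frames_per_batch=200,
    total_frames=10_000,
)
rb = ReplayBuffer(storage=LazyMemmapStorage(100_000))  # on-disk storage (assumed)
for batch in collector:
    rb.extend(batch.reshape(-1))  # flatten and store the collected transitions
    sample = rb.sample(64)        # sample a mini-batch for training
```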
Loss modules and advantage computation
[Loss modules](torchrl/objectives/costs) are provided for each algorithm class independently. They are accompanied by efficient implementations of [value and advantage computation](https://github.com/facebookresearch/rl/tree/main/torchrl/objectives/returns) functions.
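As a sketch, a DDPG-style loss might be assembled as follows; the class name, import path and output keys are assumptions to check against the objectives module:
```python
from torchrl.objectives import DDPGLoss  # import path assumed

# actor and value_network are TensorDictModules (e.g. an Actor and a ValueOperator)
loss_module = DDPGLoss(actor_network=actor, value_network=value_network)
loss_vals = loss_module(batch)  # returns a TensorDict of loss terms
loss = loss_vals["loss_actor"] + loss_vals["loss_value"]
```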
TorchRL aims to be fully compatible with [functorch](https://github.com/pytorch/functorch), the functional-programming PyTorch library.
Examples
A bunch of examples are provided as well. Check the [`examples`](examples) directory to learn more about exploration strategies, loss modules etc.