d3rlpy

Latest version: v2.4.0


The d3rlpy paper is available at https://arxiv.org/abs/2111.03788.

Benchmarks
The full benchmark results are finally available at [d3rlpy-benchmarks](https://github.com/takuseno/d3rlpy-benchmarks).

Algorithms
- Implicit Q-Learning (IQL)
  - https://arxiv.org/abs/2110.06169

Enhancements
- `deterministic` option is added to the `collect` method
- `rollout_return` metric is added to online training
- `random_steps` is added to the `fit_online` method (see the sketch after this list)
- `--save` option is added to `d3rlpy` CLI commands (thanks, pstansell)
- `multiplier` option is added to reward normalizers
- many reproduction scripts are added
- `policy_type` option is added to BC
- `get_atari_transition` function is added for the Atari 2600 offline benchmark procedure
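
As a rough illustration of how a few of these options fit together (a sketch against this release's v1.x-era API; the exact signatures are assumptions and may differ slightly):

```py
import gym

import d3rlpy

env = gym.make("Pendulum-v0")
eval_env = gym.make("Pendulum-v0")

sac = d3rlpy.algos.SAC(use_gpu=False)

# `random_steps` warms up the replay buffer with random actions before updates
# start; the new `rollout_return` metric is logged during online training
sac.fit_online(env, eval_env=eval_env, n_steps=10000, random_steps=1000)

# `deterministic` collects experiences with the greedy policy instead of sampling
buffer = sac.collect(env, deterministic=True, n_steps=1000)

# `policy_type` switches BC between deterministic and stochastic policies
bc = d3rlpy.algos.BC(policy_type="stochastic")
```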

Bugfix
- documentation fix (thanks, araffin)
- Fix TD3+BC's actor loss function (see the reference sketch below)
- Fix Gaussian noise for TD3 exploration
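
For reference, the actor objective from the official TD3+BC implementation has the following form; this is only a sketch of the published formulation, not d3rlpy's internal code:

```py
import torch.nn.functional as F

def td3_plus_bc_actor_loss(policy, q_func, obs, act, alpha=2.5):
    # TD3+BC (Fujimoto & Gu, 2021): maximize lambda * Q(s, pi(s)) - (pi(s) - a)^2
    pi = policy(obs)
    q = q_func(obs, pi)
    # lambda normalizes the Q term by the mean absolute Q-value
    lam = alpha / q.abs().mean().detach()
    return -lam * q.mean() + F.mse_loss(pi, act)
```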


batch online training
Training with computationally expensive environments such as robotics simulators or rich 3D games takes a long time because environment steps are slow.
To solve this, d3rlpy supports batch online training.
```py
import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this is necessary if you use AsyncBatchEnv
    # distribute 10 environments across different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])

    sac = SAC(use_gpu=True)

    # train with 10 environments concurrently
    sac.fit_batch_online(env)
```

docker image
A pre-built d3rlpy Docker image is available on [Docker Hub](https://hub.docker.com/repository/docker/takuseno/d3rlpy).

```
$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash
```


enhancements
- `BEAR` algorithm is updated based on the official implementation
  - new `mmd_kernel` option is available
- `to_mdp_dataset` method is added to `ReplayBuffer`
- `ConstantEpsilonGreedy` explorer is added
- `d3rlpy.envs.ChannelFirst` wrapper is added (thanks for reporting, feyza-droid)
- new dataset utility function `d3rlpy.datasets.get_d4rl` is added (see the sketch after this list)
  - this handles timeouts inside the function (see https://arxiv.org/abs/1712.00378)
- offline RL paper reproduction codes are added
- smoothed moving average plot in the `d3rlpy plot` CLI function (thanks, pstansell)
- user-friendly messages for assertion errors
- reduced memory consumption
- `save_interval` argument is added to `fit_online`
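
A hedged sketch of a few of the additions above, written against this release's v0.x-era API; the argument names (e.g. `maxlen`, `n_steps`) are assumptions and may differ in your version:

```py
import gym

import d3rlpy

# new dataset utility: handles timeouts inside the function
dataset, env = d3rlpy.datasets.get_d4rl("hopper-medium-v0")

# ConstantEpsilonGreedy explorer for a discrete-action algorithm
cartpole = gym.make("CartPole-v0")
dqn = d3rlpy.algos.DQN(use_gpu=False)
explorer = d3rlpy.online.explorers.ConstantEpsilonGreedy(epsilon=0.1)
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=cartpole)
dqn.fit_online(cartpole, buffer, explorer=explorer, n_steps=10000)

# new ReplayBuffer.to_mdp_dataset() conversion back to an offline dataset
offline_dataset = buffer.to_mdp_dataset()
```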

bugfix
- core dumps are fixed in Google Colaboratory tutorials
- typos in some documentation (thanks for reporting, pstansell)

2.4.0

Tuple observations
In v2.4.0, d3rlpy supports tuple observations.
```py
import numpy as np

import d3rlpy

observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))

dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)
```

You can find an example script [here](https://github.com/takuseno/d3rlpy/blob/master/examples/tuple_observation.py).
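
Once the dataset is built, training proceeds the same way as with single-array observations. A minimal sketch continuing from the snippet above, assuming the default encoder accepts tuple observations (the linked example script is the authoritative reference):

```py
# if the default encoder does not accept tuple observations,
# a custom encoder factory would be needed instead
cql = d3rlpy.algos.CQLConfig().create(device="cpu:0")
cql.fit(dataset, n_steps=10000, n_steps_per_epoch=1000)
```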

Enhancements
- `logging_steps` and `logging_strategy` options have been added to the `fit` and `fit_online` methods (thanks, claudius-kienle); see the sketch after this list
- Logging with Weights & Biases (WanDB) is now supported (thanks, claudius-kienle).
- Goal-conditioned environments in Minari are now supported.
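
A rough sketch of how the new logging options might be wired together; the identifiers `LoggingStrategy.STEPS` and `WanDBAdapterFactory` below are assumptions rather than confirmed names, so check the documentation for the exact API:

```py
import d3rlpy

dataset, _ = d3rlpy.datasets.get_cartpole()
dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")

dqn.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    # log every `logging_steps` gradient steps instead of once per epoch
    logging_strategy=d3rlpy.LoggingStrategy.STEPS,
    logging_steps=500,
    # send metrics to Weights & Biases
    logger_adapter=d3rlpy.logging.WanDBAdapterFactory(),
)
```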

Bugfix
- Fixed errors in distributed training.
- Fixed the OPE documentation.

2.3.0

Distributed data parallel training
Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available, and it's extremely easy to use.

Example:
```py
# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```

You need to use the `torchrun` command to start training; it should already be installed once you install PyTorch.

```
$ torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   train.py
```

In this case, 3 processes will be launched and start the training loop. `DecisionTransformer`-based algorithms also support this distributed training feature.

The example is also available [here](https://github.com/takuseno/d3rlpy/blob/master/examples/distributed_offline_training.py).

Minari support (thanks, grahamannett!)
[Minari](https://github.com/Farama-Foundation/Minari) is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.

You can install Minari via the d3rlpy CLI.

```
$ d3rlpy install minari
```


Example:
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```


Minimize redundant computation
From this version, the computation of some algorithms is optimized to remove redundant inference. As a result, algorithms with dual optimization such as `SAC` and `CQL` are significantly faster than in the previous version.
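
To illustrate the general idea (this is not d3rlpy's actual internal code): in dual-optimization algorithms such as `SAC`, the actor update and the temperature update both need an action and log-probability sampled from the current policy, so sampling once and reusing the result removes one redundant policy inference per update. A hedged sketch:

```py
import torch

def update_actor_and_temperature(
    policy,
    q_func,
    log_alpha: torch.Tensor,
    obs: torch.Tensor,
    actor_optim: torch.optim.Optimizer,
    alpha_optim: torch.optim.Optimizer,
    target_entropy: float,
) -> None:
    # single policy forward pass, reused by both losses
    action, log_prob = policy(obs)

    # actor loss uses the sampled action and its log-probability
    actor_loss = (log_alpha.exp().detach() * log_prob - q_func(obs, action)).mean()

    # temperature loss reuses the same log-probability instead of re-sampling
    alpha_loss = -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()

    actor_optim.zero_grad()
    actor_loss.backward()
    actor_optim.step()

    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
```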



Enhancements
- `GoalConcatWrapper` has been added to support goal-conditioned environments.
- `return_to_go` has been added to `Transition` and `TransitionMiniBatch`.
- `MixedReplayBuffer` has been added to sample experiences from two buffers with an arbitrary ratio.
- `initial_temperature` now supports 0 in `DiscreteSAC`.

Bugfix
- Getting started page has been fixed.

2.2.0

Algorithm
`DiscreteDecisionTransformer`, a Decision Transformer implementation for discrete action spaces, has finally been implemented in v2.2.0! The reproduction results with Atari 2600 are available [here](https://github.com/takuseno/d3rlpy-benchmarks/blob/main/atari_table.csv).

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dt = d3rlpy.algos.DiscreteDecisionTransformerConfig(
    batch_size=64,
    num_heads=1,
    learning_rate=1e-4,
    max_timestep=1000,
    num_layers=3,
    position_encoding_type=d3rlpy.PositionEncodingType.SIMPLE,
    encoder_factory=d3rlpy.models.VectorEncoderFactory([128], exclude_last_activation=True),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    context_size=20,
    warmup_tokens=100000,
).create()

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=500,
)
```


Enhancement
- Expose `action_size` and `action_space` options for manual dataset creation (#338)
- `FrameStackTrajectorySlicer` has been added.

Refactoring
- Type checking of `numpy` is enabled. Some parts of the code differentiate data types of numpy arrays, which is now checked by mypy.

Bugfix
- Device error in AWAC (#341)
- Invalid `batch.intervals` (#346)
- :warning: This fix is important to retain the performance of Q-learning algorithms since v1.1.1.

2.1.0

Upgrade PyTorch to v2
From this version, d3rlpy requires PyTorch v2 (v1 may still partially work). To support this, the minimum Python version has been bumped to 3.8. This change allows d3rlpy to utilize more advanced features such as `torch.compile` in upcoming releases.

Healthcheck
From this version, d3rlpy diagnoses dependency health automatically. In this version, the installed version of `Gym` is checked to make sure it is the correct one.

Gymnasium support
d3rlpy now supports `Gymnasium` as well as `Gym`. You can use it in the same way as `Gym`. Please check the [example](https://github.com/takuseno/d3rlpy/blob/master/examples/gymnasium_env.py) for further details.
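
For instance, a minimal sketch that uses a Gymnasium environment for evaluation during offline training (assuming the bundled CartPole dataset matches Gymnasium's `CartPole-v1` observation and action spaces):

```py
import gymnasium

import d3rlpy

# offline dataset collected on CartPole
dataset, _ = d3rlpy.datasets.get_cartpole()

# a Gymnasium environment is used exactly like a Gym one
eval_env = gymnasium.make("CartPole-v1")

dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
dqn.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(eval_env)},
)
```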

d3rlpy install command
To make your life easier, d3rlpy provides `d3rlpy install` commands to install additional dependencies. This is part of the `d3rlpy` CLI. Please check the [docs](https://d3rlpy.readthedocs.io/en/v2.1.0/cli.html#install) for further details.

```
$ d3rlpy install atari       # Atari 2600 dependencies
$ d3rlpy install d4rl_atari  # Atari 2600 + d4rl-atari dependencies
$ d3rlpy install d4rl        # D4RL dependencies
```


Refactoring
In this version, the internal design has been refactored, mainly the algorithm implementations and the way models are assigned. :warning: Because of this change, models saved with previous versions might be incompatible with this version.

Enhancement
- Added Jupyter Notebook for TPU on Google Colaboratory.
- Added `d3rlpy.notebook_utils` to provide utilities for Jupyter Notebook.
- Updated notebook link (#313) (thanks, asmith26!)

Bugfix
- Fixed typos in docstrings (#316) (thanks, asmith26!)
- Fixed Docker build (#311) (thanks, HassamSheikh!)
