d3rlpy

Latest version: v2.7.0


1.00

Currently, I'm benchmarking all algorithms with the d4rl datasets. Through these experiments, I realized that it is very difficult to reproduce the tables reported in the papers because the full hyper-parameters, which are tuned per dataset, are not published. So I gave up reproducing the tables and started producing numbers with the official codebases to check whether d3rlpy's results match.

1.0.0

I'm proud to announce that v1.0.0 has finally been released! The first version was released in August 2020 with the support of the IPA MITOU program. At the first release, d3rlpy supported only a few algorithms and did not even support online training. After months of constructive feedback and insights from the users and the community, d3rlpy has been established as the first offline deep RL library supporting many online and offline algorithms along with unique features. The next chapter towards the ambitious v2.0.0 also starts today. Please stay tuned for the next announcement!

NeurIPS 2021 Offline RL Workshop
The workshop paper about d3rlpy has been presented at the NeurIPS 2021 Offline RL Workshop.

0.91

Algorithm
- TD3+BC
- https://arxiv.org/abs/2106.06860
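
A minimal construction sketch for the new algorithm; the class name `TD3PlusBC` and the default hyperparameters shown here are assumptions, so check the API reference of your installed version:

```py
import d3rlpy

# TD3+BC: TD3 regularized towards the behavior policy with a BC term
td3_bc = d3rlpy.algos.TD3PlusBC()
```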

RewardScaler
From this version, preprocessors are also available for rewards, which allows you to normalize, standardize, and clip the reward values.

```py
import d3rlpy

# normalize
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip (you can't use a string alias here)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))
```

0.90

Algorithm
- Conservative Offline Model-Based Optimization (COMBO)
- https://arxiv.org/abs/2102.08363
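
A hedged construction sketch for COMBO, assuming the v0.x-style API in which the algorithm receives a learned dynamics model via a `dynamics` argument (training the dynamics model is shown in the v0.80 notes below):

```py
import d3rlpy

# a dynamics model trained on the target dataset is expected here;
# see the ProbabilisticEnsembleDynamics example under v0.80 below
dynamics = d3rlpy.dynamics.ProbabilisticEnsembleDynamics(learning_rate=1e-4)

# conservative model-based policy optimization on top of the learned dynamics
combo = d3rlpy.algos.COMBO(dynamics=dynamics)
```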

Drop data augmentation feature
From this version, the data augmentation feature has been dropped. The feature introduced a lot of code complexity, and dropping it keeps d3rlpy as simple as possible while supporting many algorithms. Instead, `TorchMiniBatch` was introduced internally, and all algorithms became simpler.

collect method
In offline RL experiments, data collection plays an important role, especially when you try new tasks.
From this version, the `collect` method is finally available.

```py
import d3rlpy
import gym

# prepare environment
env = gym.make('Pendulum-v0')

# prepare algorithm
sac = d3rlpy.algos.SAC()

# prepare replay buffer
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env)

# start data collection without parameter updates
sac.collect(env, buffer)

# export to MDPDataset
dataset = buffer.to_mdp_dataset()

# save as a file
dataset.dump('pendulum.h5')
```
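
The dumped file can be loaded back later for offline training. A short sketch, assuming `MDPDataset.load` is the counterpart of `dump`:

```py
import d3rlpy

# reload the collected dataset for offline training
dataset = d3rlpy.dataset.MDPDataset.load('pendulum.h5')

cql = d3rlpy.algos.CQL()
cql.fit(dataset, n_epochs=10)
```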


Along with this change, random policies are also introduced. These are useful for collecting datasets with a random policy.
```py
# continuous action-space
policy = d3rlpy.algos.RandomPolicy()

# discrete action-space
policy = d3rlpy.algos.DiscreteRandomPolicy()
```
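
A hedged sketch of collecting a dataset with a random policy, assuming `RandomPolicy` exposes the same `collect` interface as the other algorithms:

```py
import d3rlpy
import gym

env = gym.make('Pendulum-v0')
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env)

# collect transitions with random actions, no parameter updates
policy = d3rlpy.algos.RandomPolicy()
policy.collect(env, buffer)

dataset = buffer.to_mdp_dataset()
```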


Enhancements
- CQL and BEAR become closer to the official implementations
- `callback` argument has been added to algorithms
- random datasets have been added for the cartpole and pendulum tasks
  - you can specify them via `dataset_type='random'` at the `get_cartpole` and `get_pendulum` methods (see the sketch below)
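
Two of the items above are easiest to see in code. The sketch below uses the `dataset_type='random'` option and assumes the `callback` passed to `fit` is invoked as `callback(algo, epoch, total_step)` on every training step; verify both against the documentation of your installed version:

```py
import d3rlpy
from d3rlpy.datasets import get_cartpole

# random dataset collected by a random policy
dataset, env = get_cartpole(dataset_type='random')

dqn = d3rlpy.algos.DQN()

# assumed callback signature: (algo, epoch, total_step)
def on_step(algo, epoch, total_step):
    if total_step % 1000 == 0:
        print(f"epoch={epoch} step={total_step}")

dqn.fit(dataset, n_epochs=1, callback=on_step)
```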

Bugfix
- fix action normalization at the `predict_value` method (thanks, navidmdn)
- fix seed settings in the reproduction scripts

0.80

Algorithms
New algorithms are introduced in this version.

- Critic Regularized Regression (CRR)
- https://arxiv.org/abs/2006.15134
- Model-based Offline Policy Optimization (MOPO)
- https://arxiv.org/abs/2005.13239
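
A minimal construction sketch for the CRR entry above (MOPO is covered by the model-based example below); the defaults shown here are assumptions, so consult the documentation for the advantage weighting options:

```py
import d3rlpy

# Critic Regularized Regression with default settings
crr = d3rlpy.algos.CRR()
```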

Model-based RL
Previously, model-based RL was already supported: the model-based-specific logic was implemented on the `dynamics` side. This approach enabled us to combine model-based algorithms with arbitrary model-free algorithms. However, it requires complex designs to implement recent model-based RL methods. So the dynamics interface was refactored, and MOPO is the first algorithm that shows how d3rlpy supports model-based RL algorithms.

```py
# train dynamics model
from d3rlpy.datasets import get_pendulum
from d3rlpy.dynamics import ProbabilisticEnsembleDynamics
from d3rlpy.metrics.scorer import dynamics_observation_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_reward_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_prediction_variance_scorer
from sklearn.model_selection import train_test_split

dataset, _ = get_pendulum()

train_episodes, test_episodes = train_test_split(dataset)

dynamics = ProbabilisticEnsembleDynamics(learning_rate=1e-4, use_gpu=True)

dynamics.fit(train_episodes,
             eval_episodes=test_episodes,
             n_epochs=100,
             scorers={
                 'observation_error': dynamics_observation_prediction_error_scorer,
                 'reward_error': dynamics_reward_prediction_error_scorer,
                 'variance': dynamics_prediction_variance_scorer,
             })

# train model-based RL algorithm
from d3rlpy.algos import MOPO

# give the trained dynamics model to MOPO
mopo = MOPO(dynamics=dynamics)

mopo.fit(dataset, n_steps=100000)
```


Enhancements
- `fitter` method has been implemented (thanks, jamartinh); see the sketch below
- `tensorboard_dir` replaces the `tensorboard` flag at the `fit` method (thanks, navidmdn)
- show warning messages when unused arguments are passed
- show comprehensive error messages when the action-space is not compatible
- `fit` method accepts `MDPDataset` objects
- `dropout` option has been implemented in encoders
- add appropriate `__repr__` methods to show pretty outputs for `print(algo)`
- metrics collection has been refactored
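
A hedged sketch of the new `fitter` method, assuming it is a generator variant of `fit` that yields per-epoch metrics; check the API reference for the exact return type:

```py
import d3rlpy
from d3rlpy.datasets import get_cartpole

dataset, _ = get_cartpole()

dqn = d3rlpy.algos.DQN()

# assumed to yield (epoch, metrics) pairs so custom logic can run between epochs
for epoch, metrics in dqn.fitter(dataset.episodes, n_epochs=3):
    print(epoch, metrics)
```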

Bugfix
- fix `core dumped` errors by pinning the numpy version
- fix CQL backup

0.70

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()
```


New logger style
From this version, `structlog` is used internally to print information instead of the raw `print` function. This allows us to emit more structured information. Furthermore, you can control what to show and what to save to a file by overriding the logger configuration.

![image](https://user-images.githubusercontent.com/5235131/108342268-f7af2200-721d-11eb-98d4-4cf09277ce8e.png)



Enhancements
- `soft_q_backup` option is added to `CQL` (see the sketch below)
- `Paper Reproduction` page has been added to the documentation to show the performance with the paper configurations
- `commit` method at `D3RLPyLogger` returns metrics (thanks, jamartinh)
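
A one-line sketch of enabling the new option; `soft_q_backup` is assumed to be a constructor argument of `CQL`, as the note above implies:

```py
import d3rlpy

# use a soft (entropy-regularized) Q backup in the CQL critic update
cql = d3rlpy.algos.CQL(soft_q_backup=True)
```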

Bugfix
- fix `epoch` count in offline training
- fix `total_step` count in online training
- fix typos in the documentation (thanks, pstansell)
