d3rlpy

Latest version: v2.7.0


0.61

CLI
The `record` command is newly introduced in this version. You can record videos of evaluation episodes with a saved model.

```
$ d3rlpy record d3rlpy_logs/CQL_20210131144357/model_100.pt --env-id Hopper-v2
```


You can also use the wrapped environment.

```
$ d3rlpy record d3rlpy_logs/DQN_online_20210130170041/model_1000.pt \
    --env-header 'import gym; from d3rlpy.envs import Atari; env = Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'
```


bugfix
- fix saving models every step in `fit_online` method
- fix Atari wrapper to reproduce the paper result
- fix CQL and BEAR algorithms

0.60

logo
New logo images have been created for d3rlpy 🎉

| standard | inverted |
|:-:|:-:|
|![image](https://user-images.githubusercontent.com/5235131/106005833-f5164c80-60f7-11eb-93d0-fc9ec467da75.png)|![d3rlpy_cover_narrow](https://user-images.githubusercontent.com/5235131/106005918-0eb79400-60f8-11eb-8cf7-9ab3b109eba7.png)|

ActionScaler
`ActionScaler` provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to be within `[-1.0, 1.0]`. From now on, you don't need to worry about the range of actions.

```py
from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass the action_scaler argument
```
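
For intuition, here is a rough sketch of what min-max action scaling does, assuming the standard transform into `[-1.0, 1.0]`; the exact statistics d3rlpy computes from the dataset are not shown here.

```py
import numpy as np

# illustration only: the assumed standard min-max transform and its inverse
def min_max_scale(action, minimum, maximum):
    # map raw actions into [-1.0, 1.0]
    return 2.0 * (action - minimum) / (maximum - minimum) - 1.0

def min_max_unscale(scaled_action, minimum, maximum):
    # map scaled actions back to the original range
    return (scaled_action + 1.0) / 2.0 * (maximum - minimum) + minimum

raw = np.array([0.0, 2.5, 5.0])
scaled = min_max_scale(raw, minimum=0.0, maximum=5.0)  # -> [-1.0, 0.0, 1.0]
```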


handling timeout episodes
For episodes terminated by timeouts, the bootstrap target should not be cut off as if the state were truly terminal. From this version, you can specify episode boundaries separately from the environmental terminal flags.
```py
from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ...  # this indicates environmental termination
episode_terminals = ...  # this indicates episode boundaries

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals is omitted, terminals will be used to specify episode boundaries
dataset = MDPDataset(observations, actions, rewards, terminals)
```


In online training, you can specify this option via the `timelimit_aware` flag.
```py
import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default
```

0.51

minor fix
- add `typing-extensions` dependency
- update MANIFEST.in

0.50

typing
Now, d3rlpy is fully type-annotated, not only for better usability but also for a better contribution experience.
- `mypy` and `pylint` check type consistency and code quality.
- Due to the large number of changes required for the type annotations, there might be regressions that the linters cannot detect.

CLI
v0.50 introduces a new command-line interface, the `d3rlpy` command, which helps you do more with less effort. For now, `d3rlpy` provides the following commands.


```
# plot CSV data
$ d3rlpy plot d3rlpy_logs/XXX/YYY.csv

# plot all CSV data under a directory
$ d3rlpy plot-all d3rlpy_logs/XXX

# export the saved model to inference formats (e.g. ONNX, TorchScript)
$ d3rlpy export d3rlpy_logs/XXX/model_YYY.pt
```


enhancements
- faster CPU to GPU transfer
- this change makes online training roughly 2x faster (a general background sketch follows this list)
- make the IQN Q function implementation more precise, following the paper
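
The CPU-to-GPU transfer speedup refers to d3rlpy internals not shown on this page. As general background only, the usual PyTorch technique is pinned host memory combined with asynchronous copies, sketched below; this is not d3rlpy's actual code.

```py
import torch

# allocate the batch in pinned (page-locked) host memory so the copy can run asynchronously
batch = torch.rand(256, 4, 84, 84).pin_memory()

if torch.cuda.is_available():
    # non_blocking=True lets the host-to-device copy overlap with GPU computation
    batch_gpu = batch.to('cuda', non_blocking=True)
```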

documentation
- Add documentation about the SB3 integration (thanks, araffin)

0.41

Algorithm
- Policy in Latent Action Space (PLAS) (a minimal usage sketch follows this list)
- https://arxiv.org/abs/2011.07213
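
A minimal usage sketch, assuming the new algorithm is exposed as `PLAS` under `d3rlpy.algos` and is trained like the other offline algorithms on this page:

```py
from d3rlpy.algos import PLAS
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

plas = PLAS()
plas.fit(dataset.episodes)
```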

Off-Policy Evaluation
Off-policy evaluation (OPE) is a method to evaluate policy performance using only an offline dataset.

```py
# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')
cql = CQL()
cql.fit(dataset.episodes)

# off-policy evaluation
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer

fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer,
        })
```


- Fitted Q-Evaluation
- https://arxiv.org/abs/2007.09055

Q Function Factory
d3rlpy provides flexible control over Q functions through the Q function factory. Following this change, the previous `q_func_type` argument has been renamed to `q_func_factory`.

```py
from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory

# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)

# give it to the algorithm object
dqn = DQN(q_func_factory=q_func_factory)
```

You can also pass the Q function name as a string.
```py
dqn = DQN(q_func_factory='qr')
```


You can also make your own Q function factory. Currently, these are the supported Q function factories (a usage sketch follows the list).

- [MeanQFunctionFactory](https://github.com/takuseno/d3rlpy/blob/502d85f1b786b125d91b4b15b5901e33a3c5cc1a/d3rlpy/q_functions.py#L52)
- [QRQFunctionFactory](https://github.com/takuseno/d3rlpy/blob/502d85f1b786b125d91b4b15b5901e33a3c5cc1a/d3rlpy/q_functions.py#L81)
- [IQNQFunctionFactory](https://github.com/takuseno/d3rlpy/blob/502d85f1b786b125d91b4b15b5901e33a3c5cc1a/d3rlpy/q_functions.py#L113)
- [FQFQFunctionFactory](https://github.com/takuseno/d3rlpy/blob/502d85f1b786b125d91b4b15b5901e33a3c5cc1a/d3rlpy/q_functions.py#L149)
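
As a usage sketch, swapping in one of the listed factories looks like the QR example above; constructor arguments are omitted here on the assumption that the defaults are usable.

```py
from d3rlpy.algos import DQN
from d3rlpy.q_functions import IQNQFunctionFactory

# use the IQN head instead of the default mean Q function
dqn = DQN(q_func_factory=IQNQFunctionFactory())
```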

EncoderFactory
- DenseNet architecture (only for vector observation)
- https://arxiv.org/abs/2010.09163

```py
from d3rlpy.algos import DQN

dqn = DQN(encoder_factory='dense')
```


N-step TD calculation
d3rlpy supports N-step TD calculation for **ALL algorithms**. You can pass the `n_steps` argument to configure this parameter.
```py
from d3rlpy.algos import DQN

dqn = DQN(n_steps=5)  # n_steps=1 by default
```
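
For reference, with `n_steps=n` the bootstrapped target in the DQN case is the standard N-step return (a textbook definition, not quoted from the d3rlpy source):

$$
R_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k r_{t+k+1} + \gamma^n \max_{a} Q(s_{t+n}, a)
$$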


Paper reproduction scripts
d3rlpy supports many algorithms across both online and offline paradigms. Originally, d3rlpy was designed for industry practitioners, but academic research is still important to push deep reinforcement learning forward. Currently, reproduction scripts for the online DQN variants are available.

- [DQN](https://github.com/takuseno/d3rlpy/blob/master/reproductions/online/dqn.py)
- [Double DQN](https://github.com/takuseno/d3rlpy/blob/master/reproductions/online/double_dqn.py)
- [QR-DQN](https://github.com/takuseno/d3rlpy/blob/master/reproductions/online/qr_dqn.py)
- [IQN](https://github.com/takuseno/d3rlpy/blob/master/reproductions/online/iqn.py)
- [FQF](https://github.com/takuseno/d3rlpy/blob/master/reproductions/online/fqf.py)

The evaluation results will also be available soon.

enhancements
- `build_with_dataset` and `build_with_env` methods are added to algorithm objects (see the sketch after this list)
- `shuffle` flag is added to the `fit` method (thanks, jamartinh)
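
A minimal sketch of the new build methods; the intended use assumed here is constructing the underlying networks (e.g. before loading saved weights) without starting training, and the sketch assumes they take an `MDPDataset` and a gym-style environment respectively.

```py
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

cql = CQL()
cql.build_with_dataset(dataset)  # build networks from the dataset's shapes

cql2 = CQL()
cql2.build_with_env(env)  # or build them from the environment's spaces
```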

0.40

Algorithms
- Support the discrete version of Soft Actor-Critic (a usage sketch follows this list)
- https://arxiv.org/abs/1910.07207
- `fit_online` now takes an `n_steps` argument instead of `n_epochs` for the complete reproduction of the papers.
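
A minimal sketch of the two items above, assuming the discrete-action variant is exposed as `DiscreteSAC` under `d3rlpy.algos`:

```py
import gym

from d3rlpy.algos import DiscreteSAC

env = gym.make('CartPole-v0')

sac = DiscreteSAC()
sac.fit_online(env, n_steps=100000)  # n_steps replaces the previous n_epochs argument
```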

OptimizerFactory
d3rlpy provides more flexible controls for optimizer configuration via `OptimizerFactory`.

```py
from d3rlpy.optimizers import AdamFactory
from d3rlpy.algos import DQN

dqn = DQN(optim_factory=AdamFactory(weight_decay=1e-4))
```

See more at https://d3rlpy.readthedocs.io/en/v0.40/references/optimizers.html .

EncoderFactory
d3rlpy provides more flexible controls for the neural network architecture via `EncoderFactory`.

```py
from d3rlpy.algos import DQN
from d3rlpy.encoders import VectorEncoderFactory

# encoder factory
encoder_factory = VectorEncoderFactory(hidden_units=[300, 400], activation='tanh')

# set EncoderFactory
dqn = DQN(encoder_factory=encoder_factory)
```


You can also build your own encoders.

```py
import torch
import torch.nn as nn

from d3rlpy.algos import DQN
from d3rlpy.encoders import EncoderFactory

# your own neural network
class CustomEncoder(nn.Module):
    def __init__(self, observation_shape, feature_size):
        super().__init__()
        self.feature_size = feature_size
        self.fc1 = nn.Linear(observation_shape[0], 64)
        self.fc2 = nn.Linear(64, feature_size)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return h

    # THIS IS IMPORTANT!
    def get_feature_size(self):
        return self.feature_size

# your own encoder factory
class CustomEncoderFactory(EncoderFactory):
    TYPE = 'custom'  # this is necessary

    def __init__(self, feature_size):
        self.feature_size = feature_size

    def create(self, observation_shape, action_size=None, discrete_action=False):
        return CustomEncoder(observation_shape, self.feature_size)

    def get_params(self, deep=False):
        return {'feature_size': self.feature_size}

dqn = DQN(encoder_factory=CustomEncoderFactory(feature_size=64))
```


See more at https://d3rlpy.readthedocs.io/en/v0.40/references/network_architectures.html .

Stable Baselines 3 wrapper
- Now d3rlpy is partially compatible with [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) (a rough sketch follows this list).
- https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/wrappers/sb3.py
- More documentation will be available soon.
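
A purely hypothetical sketch of the intended interoperability: the class name `SB3Wrapper` and the SB3-style `predict` signature below are assumptions based on the wrapper module linked above, not confirmed here, so check that file for the actual interface.

```py
# hypothetical usage; see d3rlpy/wrappers/sb3.py for the real interface
import gym

from d3rlpy.algos import DQN
from d3rlpy.wrappers.sb3 import SB3Wrapper

env = gym.make('CartPole-v0')

dqn = DQN()
dqn.fit_online(env, n_steps=1000)

# wrap the trained d3rlpy algorithm so it can be queried like an SB3 model
model = SB3Wrapper(dqn)
obs = env.reset()
action, state = model.predict(obs, deterministic=True)
```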

bugfix
- fix the memory leak problem in `fit_online`.
- Now, you can train online algorithms with a large replay buffer for image observations.
- fix preprocessing in CQL.
- fix ColorJitter augmentation.

installation
PyPI
- From this version, d3rlpy officially supports Windows.
- The binary packages for each platform are built and uploaded via GitHub Actions, which means you don't have to install Cython to install this package from PyPI (see the command below).
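
For reference, the standard install from PyPI (the distribution name matches the project name):

```
$ pip install d3rlpy
```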

Anaconda
- As of the previous version, d3rlpy is also available on conda-forge (see the command below).
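
And the corresponding conda-forge install, assuming the package keeps the same name on that channel:

```
$ conda install -c conda-forge d3rlpy
```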
