d3rlpy

Latest version: v2.7.0


2.0.3

An emergency patch to fix a bug in the `predict_value` method (#297).

2.0.2

The major update has finally been released! Since the start of the project, d3rlpy has earned almost 1K GitHub stars :star:, which is a great milestone. This update includes many major changes.

Upgrade Gym version
From this version, d3rlpy only supports the latest Gym version, `0.26.0`. This change allows us to support `Gymnasium` in a future update.
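
For reference, Gym `0.26.0` reworked the core environment interface that d3rlpy now targets. Below is a minimal sketch of that API (plain Gym usage, not a d3rlpy call):

```py
import gym

env = gym.make("CartPole-v1")

# Gym 0.26: reset returns (observation, info) and accepts a seed
observation, info = env.reset(seed=0)

# Gym 0.26: step returns five values, splitting "done" into terminated/truncated
action = env.action_space.sample()
observation, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
```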

Algorithm

Clear separation between configuration and algorithm
From this version, each algorithm (e.g. `DQN`) has a corresponding config class (e.g. `DQNConfig`). This makes it possible to serialize and deserialize algorithms, as described later.

```py
import d3rlpy

dqn = d3rlpy.algos.DQNConfig(learning_rate=3e-4).create(device="cuda:0")
```


Decision Transformer
`Decision Transformer` is finally available! You can check the [reproduction](https://github.com/takuseno/d3rlpy/blob/master/reproductions/offline/decision_transformer.py) script to see how to use it.

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    batch_size=64,
    learning_rate=1e-4,
    optim_factory=d3rlpy.models.AdamWFactory(weight_decay=1e-4),
    encoder_factory=d3rlpy.models.VectorEncoderFactory(
        [128],
        exclude_last_activation=True,
    ),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.MultiplyRewardScaler(0.001),
    context_size=20,
    num_heads=1,
    num_layers=3,
    warmup_steps=10000,
    max_timestep=1000,
).create(device="cuda:0")

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    save_interval=10,
    eval_env=env,
    eval_target_return=0.0,
)
```


Serialization
In this version, d3rlpy introduces a compact serialization format, `d3`, that includes both hyperparameters and model parameters in a single file. This makes it easy to save checkpoints and reconstruct algorithms for evaluation and deployment.

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(dataset, n_steps=10000)

# save as a d3 file
dqn.save("model.d3")

# reconstruct exactly the same DQN
new_dqn = d3rlpy.load_learnable("model.d3")
```


ReplayBuffer
From this version, there is no longer a strict separation between `ReplayBuffer` and `MDPDataset`. Instead, `ReplayBuffer` is flexible enough to support any kind of algorithm and experiment. Please check the details in the [documentation](https://d3rlpy.readthedocs.io/en/v2.0.2/references/dataset.html).
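
As a rough illustration of that flexibility, here is a minimal sketch in which one algorithm consumes both an offline dataset and an online FIFO buffer through the same `ReplayBuffer` interface. The helper `create_fifo_replay_buffer` and the `fit_online` arguments follow the v2 dataset documentation linked above; treat this as a sketch rather than the definitive API:

```py
import d3rlpy

# an offline dataset is itself a ReplayBuffer in v2
dataset, env = d3rlpy.datasets.get_cartpole()

# a FIFO buffer capped at 100k transitions for online training
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

dqn = d3rlpy.algos.DQNConfig().create()

# offline training directly on the dataset
dqn.fit(dataset, n_steps=10000)

# online training filling the FIFO buffer on the fly
dqn.fit_online(env, buffer, n_steps=10000)
```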

2.0.0

- Sophisticated config system using `dataclasses`
- Dump configuration and model parameters in a single file
- Change MDPDataset format to align with D4RL datasets
- Support large datasets
- Support tuple observations
- Support large-scale data-parallel offline training
- Support large-scale distributed online training
- Support Transformer architecture (e.g. Decision Transformer)
- Speed up training with `torch.jit.script` and [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)
- Change library name to represent the unification of offline and online RL

1.1.1

Benchmark
The benchmark results of IQL and NFQ have been added to [d3rlpy-benchmarks](https://github.com/takuseno/d3rlpy-benchmarks). In addition, results with more random seeds (up to 10) have been added for all algorithms, making the benchmark results more reliable.

Documentation
- More descriptions have been added to the `Finetuning` tutorial page.
- An `Offline Policy Selection` tutorial page has been added.

Enhancements
- The `cloudpickle` and `GPUUtil` dependencies have been removed.
- The Gaussian likelihood computation for MOPO is now mathematically correct (thanks, tominku).

1.1.0

MDPDataset
The timestep alignment is now exactly the same as D4RL:

```py
import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))

# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))

# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)

# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = ...
```
where `r(o, a)` is the reward function and `t(o, a)` is the terminal function.
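
For completeness, here is a minimal sketch of constructing an `MDPDataset` with this alignment using the v1.x constructor; the toy array shapes and the single-episode terminal flag are illustrative assumptions, not taken from the release notes:

```py
import numpy as np
import d3rlpy

# one episode of 1000 transitions with 10-d observations and 10-d actions
observations = np.random.random((1000, 10))
actions = np.random.random((1000, 10))
rewards = np.random.random(1000)

# terminals[i] is aligned with (o_i, a_i); only the final step ends the episode
terminals = np.zeros(1000)
terminals[-1] = 1.0

dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals)
print(len(dataset.episodes))  # -> 1
```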

The reason for this change is that many users were confused by the difference between d3rlpy and D4RL; now the two are aligned in the same way. Note that this change might break your existing datasets.

Algorithms
- Neural Fitted Q-iteration (NFQ)
- https://link.springer.com/chapter/10.1007/11564096_32

Enhancements
- AWAC, CRR and IQL use a non-squashed Gaussian policy function.
- More tutorial pages have been added to the documentation.
- A software design page has been added to the documentation.
- A reproduction script for IQL has been added.
- The progress bar in online training is visually improved in Jupyter Notebook (#161) (thanks, aiueola).
- NaN checks have been added to `MDPDataset`.
- The `target_reduction_type` and `bootstrap` options have been removed.

Bugfix
- Unnecessary test conditions have been removed.
- A typo in `dataset.pyx` has been fixed (#167) (thanks, zbzhu99).
- Details of the IQL implementation have been fixed.

1.0

`copy_policy_from` and `copy_q_function_from` methods
In a finetuning scenario, you might want to initialize SAC's policy function with a pretrained CQL policy function to boost the initial performance. From this version, you can do that as follows:
```py
import d3rlpy

# pretrain with a static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# finetune with the online algorithm
sac.fit_online(...)
```


Enhancements
- show messages for skipping model builds
- add an `alpha` parameter option to `DiscreteCQL`
- keep counting the number of gradient steps
- allow expanding `MDPDataset` with larger discrete action spaces (thanks, jamartinh)
- the `callback` function is now called every gradient step, rather than every epoch (see the sketch after this list)
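
Here is a small sketch combining two of the items above: the new `alpha` option on `DiscreteCQL` and the per-gradient-step `callback`. The callback's parameter order and the `fit` argument names are assumptions based on the v1.x API:

```py
import d3rlpy

dataset, _ = d3rlpy.datasets.get_cartpole()

# alpha weights the conservative regularization term in DiscreteCQL
cql = d3rlpy.algos.DiscreteCQL(alpha=1.0)

# from this version, the callback fires once per gradient step
# (assumed arguments: algorithm object, epoch index, total gradient steps)
def log_progress(algo, epoch, total_step):
    if total_step % 1000 == 0:
        print(f"epoch={epoch} total_step={total_step}")

cql.fit(dataset, n_steps=10000, callback=log_progress)
```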

Bugfix
- FQE's loss function has been fixed (thanks for the report, guyk1971)
- fix the documentation build (thanks, astrojuanlu)
- fix the D4RL dataset conversion for `MDPDataset` (this has a significant impact on performance with D4RL datasets)
