D3rlpy

Latest version: v2.7.0



0.32

This version introduces a hotfix.

- ⚠️ Fix a significant bug in online training with image observations.

0.31

This version introduces minor changes.
- Move the `n_epochs` argument to the `fit` method.
- Fix scikit-learn compatibility issues.
- Fix a zero-division error during online training.

0.30

Algorithm
- Support Advantage-Weighted Actor-Critic (AWAC)
  - https://arxiv.org/abs/2006.09359
- The `fit_online` method is available as a convenient alias for the `d3rlpy.online.iterators.train` function.
- Fix the action unnormalization problem in AWR.

Metrics
- The following metrics are available:
  - `initial_state_value_estimation_scorer`
    - https://arxiv.org/abs/1906.01624
  - `soft_opc_scorer`
    - https://arxiv.org/abs/2007.09055
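
To make the two metrics above concrete, here is a minimal NumPy sketch of what they are generally understood to measure. It is not the library's implementation. The episode dictionaries, `toy_value_fn`, `toy_q_fn`, and the `return_threshold` argument are all hypothetical stand-ins. The sketch assumes that initial state value estimation averages the estimated value of each episode's first observation, and that soft OPC is the mean Q-value over transitions in successful episodes (return at or above a threshold) minus the mean Q-value over all transitions.

```python
import numpy as np


def initial_state_value_estimation(episodes, value_fn):
    """Average the estimated value of the first observation of each episode."""
    initial_obs = np.stack([ep["observations"][0] for ep in episodes])
    return float(np.mean(value_fn(initial_obs)))


def soft_opc(episodes, q_fn, return_threshold):
    """Mean Q over successful episodes minus mean Q over all episodes."""
    all_q, success_q = [], []
    for ep in episodes:
        q = q_fn(ep["observations"], ep["actions"])
        all_q.append(q)
        # Assumption: an episode is "successful" if its return reaches the threshold.
        if ep["rewards"].sum() >= return_threshold:
            success_q.append(q)
    if not success_q:
        raise ValueError("no episode reaches return_threshold")
    return float(np.mean(np.concatenate(success_q)) - np.mean(np.concatenate(all_q)))


# Hypothetical toy data: two episodes with 1-dimensional observations and actions.
toy_episodes = [
    {
        "observations": np.array([[1.0], [2.0]]),
        "actions": np.array([[0.0], [0.0]]),
        "rewards": np.array([1.0, 1.0]),
    },
    {
        "observations": np.array([[3.0], [4.0]]),
        "actions": np.array([[0.0], [0.0]]),
        "rewards": np.array([0.0, 0.0]),
    },
]


def toy_value_fn(observations):
    # Hypothetical state-value estimate standing in for a trained critic.
    return observations[:, 0] * 2.0


def toy_q_fn(observations, actions):
    # Hypothetical action-value estimate standing in for a trained Q-function.
    return observations[:, 0] + actions[:, 0]
```

Both scores depend only on the learned value function and the logged data, so they can be computed without interacting with the environment.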

⚠️ MDPDataset
- The `d3rlpy.dataset` module is now implemented in Cython to speed up memory copies.
- The following operations are significantly faster than in the previous version:
  - creating `TransitionMiniBatch` objects
  - frame stacking via the `n_frames` argument
  - lambda return calculation in AWR
- This change makes Atari training approximately 6% faster.
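
For readers unfamiliar with the lambda return mentioned above, the following is a rough pure-NumPy sketch of the standard TD(λ) backward recursion. It illustrates the quantity being computed, not the Cython code in `d3rlpy.dataset`. The function name and the `next_values` argument (value estimates for the state after each step) are assumptions made for this example.

```python
import numpy as np


def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """Compute TD(lambda) returns for one episode with a backward pass.

    rewards[t] is the reward at step t and next_values[t] is the value
    estimate of the state reached after step t (0 if that state is terminal).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_values = np.asarray(next_values, dtype=np.float64)
    returns = np.zeros_like(rewards)
    # Beyond the last step, bootstrap from the final value estimate.
    next_return = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda return.
        returns[t] = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        next_return = returns[t]
    return returns
```

With `lam=1.0` and a terminal value of zero this reduces to the discounted Monte-Carlo return, and with `lam=0.0` it reduces to the one-step TD target.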

0.23

Algorithm
- Support Advantage-Weighted Regression (AWR)
  - https://arxiv.org/abs/1910.00177
- The `n_frames` option is added to all algorithms.
  - `n_frames` controls frame stacking for image observations.
- The `eval_results_` property is added to all algorithms.
  - Evaluation results can be retrieved from `eval_results_` after training.
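
As an illustration of what frame stacking does to image observations, here is a small NumPy sketch. It is not the library's code. The `stack_frames` helper, the channel-first `(T, C, H, W)` layout, and the zero-padding at the start of an episode are assumptions made for this example.

```python
import numpy as np


def stack_frames(frames, n_frames):
    """Stack each frame with its n_frames - 1 predecessors along the channel axis.

    frames has shape (T, C, H, W); the result has shape (T, C * n_frames, H, W).
    Steps before the start of the episode are filled with zeros (an assumption).
    """
    frames = np.asarray(frames)
    T, C, H, W = frames.shape
    stacked = np.zeros((T, C * n_frames, H, W), dtype=frames.dtype)
    for t in range(T):
        for k in range(n_frames):
            # Slot k holds the frame from (n_frames - 1 - k) steps ago,
            # so the newest frame always occupies the last slot.
            src = t - (n_frames - 1 - k)
            if src >= 0:
                stacked[t, k * C:(k + 1) * C] = frames[src]
    return stacked
```

Stacking lets a feed-forward network see short-term motion, which a single frame does not carry.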

MDPDataset
- `prev_transition` and `next_transition` properties are added to `d3rlpy.dataset.Transition`.
  - These properties are used for frame stacking and Monte-Carlo return calculation in AWR.
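
The idea of linking transitions can be sketched with a toy doubly linked structure. `ToyTransition`, `link_episode`, and `monte_carlo_return` below are hypothetical names for illustration only. They are not the `d3rlpy.dataset.Transition` class, which carries more fields than a reward.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class ToyTransition:
    """Hypothetical stand-in for a transition that knows its neighbours."""

    reward: float
    prev_transition: Optional["ToyTransition"] = None
    next_transition: Optional["ToyTransition"] = None


def link_episode(rewards: List[float]) -> List[ToyTransition]:
    """Build one episode as a doubly linked chain of transitions."""
    transitions = [ToyTransition(reward=r) for r in rewards]
    for prev, nxt in zip(transitions, transitions[1:]):
        prev.next_transition = nxt
        nxt.prev_transition = prev
    return transitions


def monte_carlo_return(transition: ToyTransition, gamma: float = 0.99) -> float:
    """Sum discounted rewards by walking next_transition to the episode end."""
    total, discount = 0.0, 1.0
    node: Optional[ToyTransition] = transition
    while node is not None:
        total += discount * node.reward
        discount *= gamma
        node = node.next_transition
    return total
```

Walking `prev_transition` in the same way would collect the earlier observations needed for frame stacking.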

Documentation
- A new tutorial page is added.

0.22

Support ONNX export

The trained policy can now be exported as ONNX as well as TorchScript.

```py
cql.save_policy('policy.onnx', as_onnx=True)
```


Support more data augmentations
- data augmentations for vector observations
- ColorJitter augmentation for image observations

0.2

- Support a model-based algorithm
  - Model-based Offline Policy Optimization
- Support data augmentation (for image observations)
  - Data-regularized Q-learning
- Many improvements
  - more dataset statistics
  - more options to customize neural network architecture
  - optimized default learning rates
  - etc.

