- ⚠️ Fix a significant bug in online training with image observations.
0.31
This version introduces minor changes.

- Move the `n_epochs` argument to the `fit` method.
- Fix scikit-learn compatibility issues.
- Fix a zero-division error during online training.
0.30
Algorithm

- Support Advantage-Weighted Actor-Critic (AWAC)
  - https://arxiv.org/abs/2006.09359
- The `fit_online` method is available as a convenient alias to the `d3rlpy.online.iterators.train` function.
- Fix an action unnormalization problem in AWR.
Metrics

- The following metrics are available.
  - `initial_state_value_estimation_scorer`
    - https://arxiv.org/abs/1906.01624
  - `soft_opc_scorer`
    - https://arxiv.org/abs/2007.09055
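The initial state value estimation scorer boils down to averaging the learned value estimate over each episode's first observation. A minimal pure-Python sketch of that idea (the function name and signature here are illustrative, not d3rlpy's API):

```python
import numpy as np

def initial_state_value_estimation(value_fn, episodes):
    """Average the estimated value of each episode's first observation.

    value_fn: callable mapping an observation to a scalar value estimate.
    episodes: list of episodes, each a sequence of observations.
    """
    values = [value_fn(episode[0]) for episode in episodes]
    return float(np.mean(values))
```

A higher score suggests the policy starts from states the value function considers promising, though it inherits any bias in the value estimates themselves.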
⚠️ MDPDataset

- The `d3rlpy.dataset` module is now implemented with Cython in order to speed up memory copies.
- The following operations are significantly faster than in the previous version:
  - creating `TransitionMiniBatch` objects
  - frame stacking via the `n_frames` argument
  - lambda return calculation in the AWR algorithms
- This change makes Atari training approximately 6% faster.
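The lambda return mentioned above can be computed with a single backward pass over an episode. A minimal NumPy sketch of the standard TD(λ) recursion (illustrative only, not d3rlpy's Cython implementation):

```python
import numpy as np

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """Backward-pass TD(lambda) returns for one episode.

    rewards[t]: reward received at step t.
    next_values[t]: value estimate V(s_{t+1}); use 0 at a terminal step.
    """
    returns = np.zeros_like(rewards, dtype=float)
    next_return = next_values[-1]  # bootstrap at the end of the data
    for t in reversed(range(len(rewards))):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        returns[t] = rewards[t] + gamma * (
            (1 - lam) * next_values[t] + lam * next_return
        )
        next_return = returns[t]
    return returns
```

With `lam=1` this reduces to the Monte Carlo return, and with `lam=0` to the one-step TD target.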
0.23
Algorithm

- Support Advantage-Weighted Regression (AWR)
  - https://arxiv.org/abs/1910.00177
- The `n_frames` option is added to all algorithms.
  - The `n_frames` option controls frame stacking for image observations.
- The `eval_results_` property is added to all algorithms.
  - Evaluation results can be retrieved from `eval_results_` after training.
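Frame stacking concatenates the last `n_frames` observations along the channel axis so the agent can infer motion. A minimal NumPy sketch of the idea (illustrative, not d3rlpy's implementation; repeating the first frame at episode start is one common convention):

```python
import numpy as np
from collections import deque

def stack_frames(frames, n_frames=4):
    """Stack the last n_frames observations along the channel axis.

    frames: list of arrays shaped (channels, height, width).
    Returns a list of arrays shaped (channels * n_frames, height, width).
    """
    buffer = deque([frames[0]] * n_frames, maxlen=n_frames)
    stacked = []
    for frame in frames:
        buffer.append(frame)  # oldest frame drops out automatically
        stacked.append(np.concatenate(list(buffer), axis=0))
    return stacked
```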
MDPDataset

- `prev_transition` and `next_transition` properties are added to `d3rlpy.dataset.Transition`.
  - These properties are used for frame stacking and Monte Carlo return calculation in AWR.
Document

- A new tutorial page is added.
0.22
Support ONNX export

The trained policy can now be exported as ONNX as well as TorchScript:

```py
cql.save_policy('policy.onnx', as_onnx=True)
```
Support more data augmentations

- data augmentations for vector observations
- `ColorJitter` augmentation for image observations
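Color jitter typically perturbs image statistics such as brightness and contrast at random. A minimal NumPy sketch of those two perturbations (illustrative only, not d3rlpy's `ColorJitter`):

```python
import numpy as np

def color_jitter(image, rng, brightness=0.2, contrast=0.2):
    """Randomly perturb brightness and contrast of a float image in [0, 1].

    image: array shaped (channels, height, width) with values in [0, 1].
    rng: numpy.random.Generator for reproducible sampling.
    """
    img = image.astype(float)
    # brightness: multiply by a random factor around 1
    img = img * rng.uniform(1 - brightness, 1 + brightness)
    # contrast: scale deviation from the mean by a random factor
    mean = img.mean()
    img = (img - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean
    return np.clip(img, 0.0, 1.0)
```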
0.2
- Support a model-based algorithm
  - Model-based Offline Policy Optimization
- Support data augmentation (for image observations)
  - Data-regularized Q-learning
- A lot of improvements
  - more dataset statistics
  - more options to customize the neural network architecture
  - optimized default learning rates
  - etc.
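Data-regularized Q-learning is built around random image shifts: the observation is padded and then cropped back to its original size at a random offset. A minimal NumPy sketch of that augmentation (illustrative, not d3rlpy's implementation):

```python
import numpy as np

def random_shift(image, rng, pad=4):
    """Random-shift augmentation: replicate-pad, then randomly crop back.

    image: array shaped (channels, height, width).
    rng: numpy.random.Generator for reproducible sampling.
    """
    channels, height, width = image.shape
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[:, top:top + height, left:left + width]
```

The output keeps the input shape, so the augmentation can be applied inside the Q-learning update without changing the network.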