nnabla-rl

Latest version: v0.14.0

0.14.0

special notes

- This version does NOT support [openai gym](https://github.com/openai/gym) v0.26.0 or greater.
- Support for [openai gym](https://github.com/openai/gym) v0.26.0 and greater will be added in the next release of nnablaRL, and versions of [openai gym](https://github.com/openai/gym) earlier than v0.26.0 will no longer be officially supported from that release.

release-note-bugfix
- [Fixing testing code errors](https://github.com/sony/nnabla-rl/pull/103)
- [Fix Deprecation error when using PendulumEnv](https://github.com/sony/nnabla-rl/pull/104)
- [Pass id as positional arg to avoid unexpected error on old gym](https://github.com/sony/nnabla-rl/pull/106)
- [fix evaluation script](https://github.com/sony/nnabla-rl/pull/107)

release-note-algorithm
- [Add xql](https://github.com/sony/nnabla-rl/pull/101)
- [Support mixed (discrete and continuous) state env](https://github.com/sony/nnabla-rl/pull/109)
- [Support tuple action](https://github.com/sony/nnabla-rl/pull/110)
- [Add begin_of_episode to warmup_action](https://github.com/sony/nnabla-rl/pull/111)

release-note-utility
- [improve plot script](https://github.com/sony/nnabla-rl/pull/108)

release-note-docs
- [Apply docformatter](https://github.com/sony/nnabla-rl/pull/100)

release-note-samples
- [Add sample rl project template](https://github.com/sony/nnabla-rl/pull/105)

Install the latest nnablaRL by:

pip install nnabla-rl

0.13.0

special notes

- This version does NOT support [openai gym](https://github.com/openai/gym) v0.26.0 or greater.
- Support for [openai gym](https://github.com/openai/gym) v0.26.0 and greater will be added in the next release of nnablaRL, and versions of [openai gym](https://github.com/openai/gym) earlier than v0.26.0 will no longer be officially supported from that release.

release-note-bugfix
- [Fix colab notebook](https://github.com/sony/nnabla-rl/pull/93)
- [Add pygame installation for colab demos](https://github.com/sony/nnabla-rl/pull/94)
- [Fix float max/min value range of gaussian explorer](https://github.com/sony/nnabla-rl/pull/99)

release-note-algorithm
- [Implement SACD](https://github.com/sony/nnabla-rl/pull/92)
- [Add decision transformer](https://github.com/sony/nnabla-rl/pull/96)

release-note-utility
- [Add progress bar hook](https://github.com/sony/nnabla-rl/pull/95)
- [Add accepted shape decorator](https://github.com/sony/nnabla-rl/pull/98)

release-note-docs
- [Add docformatter](https://github.com/sony/nnabla-rl/pull/97)

Install the latest nnablaRL by:

pip install nnabla-rl

0.12.0

special notes

- This version does NOT support [openai gym](https://github.com/openai/gym) v0.26.0 or greater.
- Support for [openai gym](https://github.com/openai/gym) v0.26.0 and greater will be added in the next release of nnablaRL, and versions of [openai gym](https://github.com/openai/gym) earlier than v0.26.0 will no longer be officially supported from that release.
- [Only support python 3.7 or greater](https://github.com/sony/nnabla-rl/pull/90)
- Python 3.6 is no longer supported as of this release

release-note-bugfix
- [Fix algos. Properly apply grad clip and weight decay](https://github.com/sony/nnabla-rl/pull/73)
- [Correct variable to use during rnn training](https://github.com/sony/nnabla-rl/pull/75)
- [Check np_random instance and use correct randint alternative](https://github.com/sony/nnabla-rl/pull/79)
- [Fix pendulum-env render](https://github.com/sony/nnabla-rl/pull/84)
- [Fix ScreenRenderEnv to support gym 0.25.0](https://github.com/sony/nnabla-rl/pull/87)

release-note-algorithm
- [Run PPO on single process when actor num is 1](https://github.com/sony/nnabla-rl/pull/70)
- [Add qrsac algorithm](https://github.com/sony/nnabla-rl/pull/71)
- [Add REDQ algorithm](https://github.com/sony/nnabla-rl/pull/72)
- [Update to support discrete tuple](https://github.com/sony/nnabla-rl/pull/76)
- [Add icra2018 qtopt](https://github.com/sony/nnabla-rl/pull/77)
- [Add goal_env module](https://github.com/sony/nnabla-rl/pull/78)
- [Add PPO tuple state support](https://github.com/sony/nnabla-rl/pull/80)
- [Add iLQR and LQR](https://github.com/sony/nnabla-rl/pull/81)
- [Add mppi](https://github.com/sony/nnabla-rl/pull/82)
- [Add ddp](https://github.com/sony/nnabla-rl/pull/83)

release-note-distributions
- [Add gmm and Update gaussian](https://github.com/sony/nnabla-rl/pull/89)

release-note-utility
- [Support nnabla-browser](https://github.com/sony/nnabla-rl/pull/85)

release-note-docs
- [Fix module path of sac](https://github.com/sony/nnabla-rl/pull/74)
- [Improve README with graph visualization feature with nnabla-browser](https://github.com/sony/nnabla-rl/pull/88)

release-note-build
- [Extend github build timelimit to 5 minutes](https://github.com/sony/nnabla-rl/pull/86)

Install the latest nnablaRL by:

pip install nnabla-rl

0.11.0

release-note-bugfix
- [Fix readme of reproduction](https://github.com/sony/nnabla-rl/pull/37)
- [Fix cem test](https://github.com/sony/nnabla-rl/pull/38)
- [Fix README samples and add prerequisites for Atari reproduction codes](https://github.com/sony/nnabla-rl/pull/53)
- [Fix tutorial-model](https://github.com/sony/nnabla-rl/pull/62)
- [Fix add workaround to avoid gym error](https://github.com/sony/nnabla-rl/pull/64)

release-note-algorithm
- [Add ATRPO](https://github.com/sony/nnabla-rl/pull/39)
- [Add implementation for RNN support and DRQN algorithm](https://github.com/sony/nnabla-rl/pull/42),
[Support RNN models on DQN and DQN inherited algorithms](https://github.com/sony/nnabla-rl/pull/48),
[Follow DRQN author's implementation and update results](https://github.com/sony/nnabla-rl/pull/52)
- [Expand RNN support to dist rl algorithms](https://github.com/sony/nnabla-rl/pull/49)
- [Add rnn support to actor critic algorithms](https://github.com/sony/nnabla-rl/pull/50)
- [Support n-step q learning in ddpg, td3, her, sac](https://github.com/sony/nnabla-rl/pull/44) and [ICML2018SAC](https://github.com/sony/nnabla-rl/pull/51)
- [Stop back propagating to target v function](https://github.com/sony/nnabla-rl/pull/47)
- [Add MME-SAC algorithm and Sparse/Delayed mujoco environment](https://github.com/sony/nnabla-rl/pull/56) and
[Add Disentangled version of MME-SAC](https://github.com/sony/nnabla-rl/pull/57)

release-note-functions
- [Add stop gradient function](https://github.com/sony/nnabla-rl/pull/60)
- [Add random shooting](https://github.com/sony/nnabla-rl/pull/63)
- [Update cem function interface](https://github.com/sony/nnabla-rl/pull/67)

release-note-distributions
- [Add Bernoulli distribution](https://github.com/sony/nnabla-rl/pull/58)
- [Enable sampling from multidimensional logits](https://github.com/sony/nnabla-rl/pull/59)
- [Add one hot softmax](https://github.com/sony/nnabla-rl/pull/61)

release-note-utility
- [Support batched states for evaluation](https://github.com/sony/nnabla-rl/pull/54)
- [Add convenient episode result env](https://github.com/sony/nnabla-rl/pull/65)
- [Add profile function](https://github.com/sony/nnabla-rl/pull/68)

release-note-docs
- [Update version in algorithm catalog](https://github.com/sony/nnabla-rl/pull/36)
- [Add readthedocs yaml](https://github.com/sony/nnabla-rl/pull/40) and [Fixed yaml file](https://github.com/sony/nnabla-rl/pull/41)
- [Add HER and IQN to algorithm catalog](https://github.com/sony/nnabla-rl/pull/45)

Install the latest nnablaRL by:

pip install nnabla-rl

0.10.0

release-note-bugfix

- [Fix interactive-demos used in colab](https://github.com/sony/nnabla-rl/pull/2) and [Fix interactive-demos used in colab about gpu id](https://github.com/sony/nnabla-rl/pull/4)

release-note-algorithm

- [Add HER](https://github.com/sony/nnabla-rl/pull/32)
- [Add Rainbow](https://github.com/sony/nnabla-rl/pull/29)
- [Fix algorithm reproduction directory path](https://github.com/sony/nnabla-rl/pull/27)
- [Add rank-based prioritized replay](https://github.com/sony/nnabla-rl/pull/24)
- [Add Double Dqn](https://github.com/sony/nnabla-rl/pull/21)
- [Move algorithms reproduction dir to reproductions/algorithms](https://github.com/sony/nnabla-rl/pull/20)
- [Enable injecting explorer to algorithm](https://github.com/sony/nnabla-rl/pull/18)
- [Support multi-step Q learning](https://github.com/sony/nnabla-rl/pull/17)
- [Add Categorical Double Dqn](https://github.com/sony/nnabla-rl/pull/16)
- [Add c51 all atari game results](https://github.com/sony/nnabla-rl/pull/15)
- [Support Tuple State](https://github.com/sony/nnabla-rl/pull/14) and [Update compute_v_target_and_advantage to support tuple state](https://github.com/sony/nnabla-rl/pull/23)

release-note-parametric_functions

- [Add spatial_softmax function](https://github.com/sony/nnabla-rl/pull/25) and [Add spatial softmax docs](https://github.com/sony/nnabla-rl/pull/26)
- [Add noisy net](https://github.com/sony/nnabla-rl/pull/11)

release-note-functions

- [Add batch_flatten function](https://github.com/sony/nnabla-rl/pull/13)
- [Add triangular_matrix function](https://github.com/sony/nnabla-rl/pull/8)

release-note-utility

- [Fix load_snapshot](https://github.com/sony/nnabla-rl/pull/28)

release-note-docs

- [Fix docs typo](https://github.com/sony/nnabla-rl/pull/22)
- [Fix typo in readme](https://github.com/sony/nnabla-rl/pull/19)
- [Display correct version](https://github.com/sony/nnabla-rl/pull/12)
- [Fix numpy array typing to np.ndarray](https://github.com/sony/nnabla-rl/pull/10)
- [Add function docs](https://github.com/sony/nnabla-rl/pull/9)
- [Fix docstring of algorithms](https://github.com/sony/nnabla-rl/pull/6)
- [Update NNablaRL to nnablaRL](https://github.com/sony/nnabla-rl/pull/5)
- [Fix typo seemless -> seamless](https://github.com/sony/nnabla-rl/pull/3)
- [Fix build badge URL](https://github.com/sony/nnabla-rl/pull/1)


Install the latest nnablaRL by:


pip install nnabla-rl

0.9.0

We are happy to announce the release of **nnablaRL**, a deep reinforcement learning (RL) library built on top of [nnabla](https://nnabla.org/).
Reinforcement learning is a cutting-edge machine learning technology that [achieves superhuman performance](https://www.nature.com/articles/nature16961) in fields such as gaming and robotics.
We hope that this new library, **nnablaRL**, helps both RL experts and non-RL experts use reinforcement learning algorithms easily within the nnabla ecosystem.

The features of nnablaRL are as follows.

**Friendly API**

nnablaRL has friendly Python APIs that enable you to start training with only 3 lines of Python code.

```py
import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")  # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3
```


You can also easily customize the algorithm's hyperparameters. For example, you can change the batch size of the training data as follows.

```python
import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
dqn = A.DQN(env, config=config)
dqn.train(env)
```


In addition to algorithm hyperparameters, you can also flexibly change training components such as neural network models and model solvers. For details, see [sample codes](https://github.com/sony/nnabla-rl/tree/master/examples) and [API documents](https://nnabla-rl.readthedocs.io).
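
As a rough illustration of that flexibility, the sketch below swaps in a custom solver through a builder object. The `SolverBuilder` import path, the `build_solver` hook, and the `q_solver_builder` keyword are assumptions written from memory rather than verified against this release; the linked API documents are the authoritative reference.

```python
import nnabla.solvers as S  # nnabla's standard solver (optimizer) module
import nnabla_rl.algorithms as A
from nnabla_rl.builders import SolverBuilder  # assumed import path
from nnabla_rl.utils.reproductions import build_atari_env


class SlowAdamBuilder(SolverBuilder):
    # Assumed hook name: returns the solver used to update the Q-network parameters.
    def build_solver(self, env_info, algorithm_config, **kwargs):
        return S.Adam(alpha=1e-5)  # smaller learning rate than usual, just for illustration


env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
# `q_solver_builder` is an assumed keyword argument; check the DQN API docs for the exact name.
dqn = A.DQN(env, config=config, q_solver_builder=SlowAdamBuilder())
dqn.train(env)
```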

**Many builtin algorithms**

Most famous/SoTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are already implemented in nnablaRL. The implemented algorithms are carefully tested and evaluated, so you can easily start training your agent using these verified implementations.
Please check the sample codes and documentation for the detailed usage of each algorithm.
You can find the list of implemented algorithms [here](https://github.com/sony/nnabla-rl/tree/master/nnabla_rl/algorithms/README.md).
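
For instance, switching to another built-in algorithm only changes the class you instantiate. The sketch below uses SAC on a small continuous-control gym task; the environment id and whether extra wrappers are needed depend on your setup, so treat it as an illustration rather than a reproduction script.

```python
import gym
import nnabla_rl.algorithms as A

# Pendulum is a small continuous-control task. The id may be "Pendulum-v0" or
# "Pendulum-v1" depending on your (pre-v0.26.0) gym version.
env = gym.make("Pendulum-v1")
sac = A.SAC(env)  # SAC is one of the built-in algorithms in the catalog
sac.train(env)    # same training entry point as in the DQN example above
```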

**Seamless switching of online and offline training**

In reinforcement learning, there are two main training procedures for training an agent: online and offline. Online training alternates between data collection and network updates. Conversely, offline training updates the network using only existing data. With nnablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online using a simulated environment and then fine-tune it offline with a real robot dataset.

```py
import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator()  # This is just an example. Assuming that a simulator exists
dqn = A.DQN(simulator, config=config)  # config as defined in the earlier example
dqn.train_online(simulator)

real_data = get_real_data()  # This is also an example. Assuming that you have real robot data
dqn.train_offline(real_data)
```

Getting started

You can find both notebook-style interactive demos and raw Python scripts as sample code to get started. If you are unfamiliar with reinforcement learning, we recommend trying the notebooks as a starting point. You can immediately launch them and start training through Google Colaboratory! Check the list of notebooks [here](https://github.com/sony/nnabla-rl#getting-started).

Development of nnablaRL has just started. We will continue adding new reinforcement learning algorithms and SoTA techniques to nnablaRL. Feedback, feature requests, and contributions are welcome! Check the [contribution guide](https://github.com/sony/nnabla-rl/tree/master/CONTRIBUTING.md) for details.
