This version of BECCA sports a reworked reinforcement learning algorithm, with significant changes to both the hub and the arborkey. In its new form, only actions can be selected as goals. Previously, features could be selected as goals as well, but despite its theoretical appeal, this didn't help BECCA learn faster or perform better on its current task set. Curiosity has been implemented in the hub in a way that walks the exploration/exploitation line a little more elegantly. The arborkey now includes restlessness, a propensity to act when the agent hasn't acted for a while. This keeps the agent from getting stuck in long-term ruts of poor performance.
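To make the curiosity and restlessness ideas concrete, here is a minimal sketch of how an action selector might combine them. This is an illustrative assumption, not BECCA's actual hub or arborkey code: the class name, parameters, and constants are all hypothetical.

```python
import numpy as np

class ActionSelector:
    """Hypothetical sketch: curiosity- and restlessness-driven action choice."""

    def __init__(self, n_actions, curiosity_rate=0.01, restlessness_rate=0.005,
                 seed=0):
        self.curiosity_rate = curiosity_rate        # drives exploration
        self.restlessness_rate = restlessness_rate  # drives acting at all
        self.curiosity = np.zeros(n_actions)
        self.restlessness = 0.0
        self.rng = np.random.default_rng(seed)

    def select(self, expected_reward):
        # Curiosity accumulates for every action over time; taking an
        # action resets its curiosity, so untried actions grow attractive.
        self.curiosity += self.curiosity_rate
        # Restlessness accumulates whenever the agent sits idle, nudging
        # it out of long-term do-nothing ruts.
        self.restlessness += self.restlessness_rate

        total_value = expected_reward + self.curiosity
        # If neither expected reward, curiosity, nor restlessness clears
        # a random threshold, do nothing this time step.
        if total_value.max() + self.restlessness < self.rng.random():
            return None
        action = int(np.argmax(total_value))
        self.curiosity[action] = 0.0  # satisfying curiosity resets it
        self.restlessness = 0.0       # acting resets restlessness
        return action
```

Even with zero expected reward everywhere, accumulated restlessness eventually forces an action, which is the intended behavior: the agent can't stay idle forever.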
The benchmark has grown to include another task, a version of the 1D grid task in which the reward is delayed by a small random number of time steps. This tests an agent's ability to handle a challenging credit assignment problem.
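A rough sketch of what such a delayed-reward grid world could look like is below. It is an assumption-level reconstruction for illustration only, not BECCA's benchmark code; the class name, grid size, and delay range are invented.

```python
import numpy as np

class DelayedGrid1D:
    """Hypothetical 1D grid task where reward arrives a few steps late."""

    def __init__(self, size=9, max_delay=3, seed=0):
        self.size = size
        self.max_delay = max_delay
        self.rng = np.random.default_rng(seed)
        self.position = size // 2
        self.pending = []  # [steps_remaining, reward] pairs

    def step(self, action):
        # Deliver any previously queued rewards whose delay has elapsed.
        for item in self.pending:
            item[0] -= 1
        reward = sum(r for d, r in self.pending if d <= 0)
        self.pending = [p for p in self.pending if p[0] > 0]

        # Move one cell left (-1) or right (+1), clipping at the walls.
        self.position = int(np.clip(self.position + action, 0, self.size - 1))
        if self.position == self.size - 1:
            # Reaching the goal queues a reward with a small random delay
            # instead of paying it immediately, then resets the agent.
            delay = int(self.rng.integers(1, self.max_delay + 1))
            self.pending.append([delay, 1.0])
            self.position = self.size // 2
        return self.position, reward
```

Because the reward lands one to several steps after the goal state is visited, the agent can no longer credit the most recent action alone; it has to spread credit back across the preceding steps.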
[See BECCA in action.](http://youtu.be/uU1_13c6umo?list=PLF861CC4C40439EEB)