fixed GAE calculation to handle both autoreset=True and autoreset=False. The idea is to pass the current values and the next values separately, and to perform all Tensor adaptations outside the GAE function.
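A minimal sketch of the idea, not the library's actual code: the function name `gae_sketch`, its parameters, and the `dones` flag are hypothetical, and plain Python lists stand in for Tensors. The caller prepares `values` and `next_values` according to its autoreset mode, so the function itself does no shifting or slicing.

```python
def gae_sketch(rewards, values, next_values, dones, discount=0.99, gae_coef=0.95):
    """Generalized advantage estimation over one trajectory.

    values[t] is V(s_t) and next_values[t] is V(s_{t+1}); both are built
    by the caller (assumption: this is where autoreset handling happens).
    dones[t] is True when the episode ends at step t.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0
        # One-step TD error, with no bootstrap past the end of an episode.
        delta = rewards[t] + discount * next_values[t] * mask - values[t]
        running = delta + discount * gae_coef * mask * running
        advantages[t] = running
    return advantages
```

Because the two value sequences arrive separately, the same function can be reused whether or not consecutive time steps belong to the same episode.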
0.2.9
dict observations are now stored under the env_obs name
0.2.8.2
fixed plot_critic
0.2.8
fixed plot_critic
0.2.7
plot_critic now plots the max Q-value when no action is specified, which makes more sense than plotting the value of a random action
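The selection rule can be illustrated as follows. This is only a sketch under assumptions: `q_value_to_plot` is a hypothetical helper, not part of plot_critic, and a plain list of per-action Q-values stands in for the critic output at one state.

```python
def q_value_to_plot(q_values, action=None):
    """Pick the value to display for one state.

    q_values: Q-values for each discrete action in that state.
    action: index of the action to plot, or None to plot the best one.
    """
    if action is None:
        # No action requested: show the greedy (maximum) Q-value.
        return max(q_values)
    return q_values[action]
```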