My personal thanks to everyone who participated in the monthly dev cycle and, in particular, to dosssman, who implemented SAC with discrete action spaces.
Additional improvements include:
- Support for gym.wrappers.Monitor to automatically record the agent’s performance at certain episodes (default is 1, 2, 9, 28, 65, ... 1000, 2000, 3000), integrated with wandb (so cool, see the screenshot below). 4
- DQN and SAC now use the same replay buffer from minimalRL. 5
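The default recording episodes listed above look like gym's capped cubic schedule: perfect cubes for the first 1000 episodes, then every 1000th episode. Here is a minimal sketch of that rule, reimplemented without importing gym. The name `should_record` is hypothetical, and episode ids are assumed to be zero-based, so id 0 is the first episode and id 8 is the ninth.

```python
def should_record(episode_id: int) -> bool:
    """Return True when a zero-based episode id falls on the capped cubic schedule.

    Assumption: this mirrors the default video schedule of gym.wrappers.Monitor;
    it is an illustrative reimplementation, not code taken from this release.
    """
    if episode_id < 1000:
        # Record on perfect cubes: 0, 1, 8, 27, 64, ...
        return round(episode_id ** (1.0 / 3)) ** 3 == episode_id
    # Past the first 1000 episodes, record every 1000th one.
    return episode_id % 1000 == 0


# Zero-based ids that would be recorded during the first 3001 episodes.
recorded = [i for i in range(3001) if should_record(i)]
print(recorded)
```

Adding one to the early ids gives the one-based episode counts 1, 2, 9, 28, 65 mentioned in the list.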
https://app.wandb.ai/cleanrl/cleanrl.benchmark
![image](https://user-images.githubusercontent.com/5555347/72108416-8f46ff00-3301-11ea-91d7-04c611f28ee7.png)