New Features
- Added new policy distribution `ReparameterizedBetaPolicyDistribution`. It's a reparameterized version of the Beta distribution (who could have guessed) that predicts the mean and spread of the PDF instead of alpha and beta. The spread can thus be controlled independently of the input, which leads to more stability.
- Improved customizability of the `Gatherer` class. `Gatherer` now has a `.postprocess()` method that is called during data collection to postprocess the data collected in the data buffer. By default, this method only normalizes advantages, but custom gatherers can apply more postprocessing and even add data to or filter data from the buffer.
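
To illustrate the idea behind predicting mean and spread instead of alpha and beta, here is a minimal sketch of one common reparameterization of the Beta distribution. It assumes that "spread" is expressed as a concentration `alpha + beta`; the parameterization actually used by `ReparameterizedBetaPolicyDistribution` may differ, and the function names below are hypothetical.

```python
from typing import Tuple


def beta_params_from_mean_concentration(mean: float, concentration: float) -> Tuple[float, float]:
    """Map (mean, concentration) to the usual (alpha, beta) parameters.

    Assumption: concentration = alpha + beta. A larger concentration means
    a narrower PDF around the mean.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    return alpha, beta


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# The mean could come from the network, while the concentration could be
# a free parameter that does not depend on the input.
alpha, beta = beta_params_from_mean_concentration(mean=0.25, concentration=8.0)
variance = beta_variance(alpha, beta)
```

In this form the variance equals `mean * (1 - mean) / (concentration + 1)`, so the width of the distribution can be adjusted without touching the mean.
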
Monitoring
- New group view in the web monitor. Experiments can now be assigned an optional group name. The new view shows the mean reward progression of grouped experiments. More functionality on this will follow in future updates.
- Better filtering of experiments in the web monitor.
- The hyperparameter view now also shows `Gatherer` information.
- Improved robustness against corrupted JSON files.
Other Changes
- Upgraded from TensorFlow `2.4.2` to `2.9.1`. At this point, this should still be backward-compatible.
- Several assertions have been added throughout optimization to simplify debugging when NaN/Inf values occur.
- Added new unit tests.
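
As an illustration of the kind of NaN/Inf assertion mentioned above, here is a framework-agnostic sketch in plain Python. The helper name and the error message format are hypothetical and not taken from the codebase; in TensorFlow, `tf.debugging.assert_all_finite` serves a similar purpose for tensors.

```python
import math
from typing import Iterable


def assert_all_finite(values: Iterable[float], name: str) -> None:
    """Raise an AssertionError naming the first NaN/Inf entry in `values`."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise AssertionError(f"{name}[{index}] is not finite: {value!r}")


# Finite values pass silently.
assert_all_finite([0.1, -2.5, 3.0], "advantages")

# A NaN is reported together with the name and position of the value.
try:
    assert_all_finite([0.1, float("nan")], "policy_loss")
except AssertionError as error:
    message = str(error)
```

Failing early with the name of the offending quantity points to where the non-finite value first appeared, rather than where it eventually caused a crash.
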