Changes
- Add support for AlphaZero.
- Add [PGX](https://github.com/sotetsuk/pgx) environments to be used with AlphaZero.
- Remove use of JAX callbacks for logging, since the previous implementation wasn't compatible with TPUs. The training loop now exits XLA execution to run logging callbacks.
- Substantial refactoring and restructuring of the codebase.
- Increase JAX dependency version.
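
To illustrate the logging change, here is a minimal sketch of the general pattern: run a chunk of updates inside one jitted computation, return the metrics, and log them from ordinary Python between chunks. The names `update_step`, `run_chunk`, `train`, and `log_fn`, the toy quadratic loss, and the chunk length of 10 are all hypothetical and are not taken from this codebase.

```python
import jax
import jax.numpy as jnp


# Hypothetical update step: a stand-in for a real training update.
def update_step(params, _):
    loss = jnp.sum(params ** 2)
    grads = 2.0 * params  # analytic gradient of the toy loss
    new_params = params - 0.1 * grads
    return new_params, loss


@jax.jit
def run_chunk(params):
    # Runs 10 updates inside a single XLA computation, with no callbacks.
    return jax.lax.scan(update_step, params, None, length=10)


def train(params, num_chunks, log_fn):
    for chunk in range(num_chunks):
        params, losses = run_chunk(params)
        # Back in ordinary Python: copy metrics to the host and log them.
        log_fn(chunk, float(jax.device_get(losses).mean()))
    return params


logged = []
final_params = train(
    jnp.ones(3),
    num_chunks=3,
    log_fn=lambda i, loss: logged.append((i, loss)),
)
```

Because the logging call sits outside the jitted function, it needs no host callback support from the accelerator backend. The trade-off in this sketch is that metrics are only visible once per chunk rather than once per update.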