This update changes the algorithm interface:
<table>
<tr>
<th>Before (v0.0.x)</th>
<th>After (v0.1.0)</th>
</tr>
<tr>
<td>
<pre><code>from rejax import PPO, PPOConfig
config = PPOConfig.create(**kwargs)
PPO.train(config, rng)</code></pre>
</td>
<td>
<pre><code>from rejax import PPO
ppo = PPO.create(**kwargs)
ppo.train(rng)</code></pre>
</td>
</tr>
</table>
Rationale:
1. It's simpler and more intuitive.
2. Parameters and algorithm subroutines depend on each other (e.g. an algorithm that samples from a replay buffer also has the buffer's size as a hyperparameter). Collecting them in the same class makes the algorithm architecture modular.
3. We can eliminate a lot of boilerplate code by inheriting from mixins that have both parameters and subroutines.
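
To illustrate points 2 and 3, here is a minimal sketch of a mixin that carries both a hyperparameter and the subroutine that uses it. This is not rejax's actual code: `ReplayBufferMixin`, `EpsilonGreedyMixin`, `ToyDQN`, and all of their fields are hypothetical names, and plain Python dataclasses stand in for whatever base class the library uses.

```python
from dataclasses import dataclass
import random


@dataclass
class ReplayBufferMixin:
    # Hypothetical hyperparameter that lives next to the code using it.
    buffer_size: int = 8

    def store(self, buffer, transition):
        """Append a transition and keep only the newest `buffer_size` items."""
        return (buffer + [transition])[-self.buffer_size:]


@dataclass
class EpsilonGreedyMixin:
    # Hypothetical exploration hyperparameter.
    epsilon: float = 0.1

    def act(self, q_values, rng):
        """Pick a random action with probability epsilon, else the greedy one."""
        if rng.random() < self.epsilon:
            return rng.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)


@dataclass
class ToyDQN(EpsilonGreedyMixin, ReplayBufferMixin):
    # The algorithm inherits fields and subroutines from both mixins,
    # so it only declares what is specific to itself.
    learning_rate: float = 2**-10

    @classmethod
    def create(cls, **kwargs):
        # Mirrors the `PPO.create(**kwargs)` call style shown above.
        return cls(**kwargs)


algo = ToyDQN.create(buffer_size=2, epsilon=0.0)
buffer = []
for transition in (1, 2, 3):
    buffer = algo.store(buffer, transition)
greedy_action = algo.act([0.1, 0.9, 0.3], random.Random(0))
```

Because each mixin owns its parameters, a second algorithm that needs a replay buffer can inherit `ReplayBufferMixin` without redeclaring `buffer_size` or reimplementing `store`.
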
What's Changed
* **Merged config and algorithm**
* New algorithm: [Implicit Quantile Networks](https://arxiv.org/abs/1806.06923) by Dabney et al., 2018
* New algorithm: [Parallelised Q Networks](https://arxiv.org/abs/2407.04811) by Gallici, Fellows et al., 2024
* Removed DDPG, as it is now a special case of TD3
* Added support for more than two critics to SAC and TD3
* Changed default hyperparameters (mostly to powers of 2)
* Renamed hyperparameters: `gradient_steps` -> `num_epochs`, `tau` -> `polyak`
* Removed `rejax.evaluate.make_evaluate`; use `rejax.evaluate.evaluate` instead
* Moved `rejax.algos.networks` and `rejax.algos.buffers` to `rejax`
* New module: `rejax.compat` implements loading environments from different packages. Currently supports [gymnax](https://github.com/RobertTLange/gymnax), [brax](https://github.com/google/brax/), and [navix](https://github.com/epignatelli/navix/)
* Removed `rejax.brax2gymnax` (use the new `rejax.compat` instead)
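
When migrating existing configs, the hyperparameter renames listed above can be applied mechanically. The helper below is a hypothetical sketch and not part of rejax: `RENAMED` and `migrate_kwargs` are illustrative names, and the function only renames keys. It does not convert values, so check each algorithm's defaults in case a renamed parameter also changed its convention.

```python
# Old name -> new name, taken from the rename entry in this changelog.
RENAMED = {"gradient_steps": "num_epochs", "tau": "polyak"}


def migrate_kwargs(old_kwargs):
    """Return a copy of `old_kwargs` with v0.0.x names replaced by v0.1.0 names."""
    new_kwargs = {}
    for key, value in old_kwargs.items():
        new_key = RENAMED.get(key, key)
        if new_key in new_kwargs:
            # Both the old and the new spelling were given; refuse to guess.
            raise ValueError(f"duplicate hyperparameter after renaming: {new_key}")
        new_kwargs[new_key] = value
    return new_kwargs


# Example values are made up for illustration.
migrated = migrate_kwargs({"gradient_steps": 4, "tau": 0.5, "gamma": 0.99})
```

The migrated dictionary can then be passed to the new-style constructor, e.g. `PPO.create(**migrated)`, provided the remaining keys are still valid for that algorithm.
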
**Full Changelog**: https://github.com/keraJLi/rejax/compare/v0.0.1...v0.1.0