Excited to announce our first beta release of `openrlbenchmark`, a tool to help you grab metrics from popular RL libraries, such as SB3, CleanRL, baselines, Tianshou, etc.
Here is an example snippet. You can open it at [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openrlbenchmark/openrlbenchmark/blob/master/README.ipynb)
```bash
pip install openrlbenchmark
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \
        'a2c' \
        'ddpg' \
        'ppo_lstm' \
        'sac' \
        'td3' \
        'ppo' \
        'trpo' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'sac_continuous_action?tag=rlops-pilot' \
    --env-ids HalfCheetahBulletEnv-v0 \
    --ncols 1 \
    --ncols-legend 2 \
    --output-filename compare.png \
    --report
```
which generates the following figure:
![](https://github.com/openrlbenchmark/openrlbenchmark/blob/main/static/cleanrl_vs_sb3.png)
## What happened?
The idea is to use a filter-like syntax to grab the metrics of interest. Here, we created two filters. The first string in the first filter is `'?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean'`, which is a query string specifying the following fields (a short parsing sketch follows the list):
* `we`: the W&B entity name
* `wpn`: the W&B project name
* `ceik`: the custom key for the environment id
* `cen`: the custom key for the experiment name
* `metric`: the metric we are interested in
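To make the mapping concrete, here is a minimal sketch of how such a query string decomposes into those fields, using only Python's standard library (this is an illustration, not `openrlbenchmark`'s internal parser):

```python
from urllib.parse import parse_qs

filter_str = "?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean"
# parse_qs expects the part after "?", so strip the leading question mark.
fields = {k: v[0] for k, v in parse_qs(filter_str.lstrip("?")).items()}
print(fields)
# {'we': 'openrlbenchmark', 'wpn': 'sb3', 'ceik': 'env',
#  'cen': 'algo', 'metric': 'rollout/ep_rew_mean'}
```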
So we are fetching metrics from [https://wandb.ai/openrlbenchmark/sb3](https://wandb.ai/openrlbenchmark/sb3). The environment id is stored in the `env` key, and the experiment name is stored in the `algo` key. The metric we are interested in is `rollout/ep_rew_mean`.
Similarly, we are fetching metrics from [https://wandb.ai/openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl). The environment id is stored in the `env_id` key, and the experiment name is stored in the `exp_name` key. The metric we are interested in is `charts/episodic_return`.
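Under the hood, this boils down to querying the W&B public API. The snippet below is a rough sketch of that idea (an illustration, not necessarily how `openrlbenchmark` implements it), using the entity/project, config keys, and metric from the first filter:

```python
import wandb  # pip install wandb

api = wandb.Api()
# `we`/`wpn` give the entity/project; `ceik`/`cen` are config keys on each run.
runs = api.runs(
    "openrlbenchmark/sb3",
    filters={"config.env": "HalfCheetahBulletEnv-v0", "config.algo": "ppo"},
)
for run in runs:
    # `metric` names the logged scalar we want to plot.
    history = run.history(keys=["rollout/ep_rew_mean"])
    print(run.name, history.tail(1))
```

The CLI presumably repeats something like this for each filter, experiment name, and `--env-ids` entry before aggregating and plotting the curves.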
## More docs and examples
**See more examples and docs at https://github.com/openrlbenchmark/openrlbenchmark**
Notably, https://github.com/openrlbenchmark/openrlbenchmark#currently-supported-libraries lists the currently supported libraries, and below are some example plots.
![image](https://user-images.githubusercontent.com/5555347/210252230-ede0e277-06f0-4c28-8ea5-72c31b12d441.png)
![image](https://user-images.githubusercontent.com/5555347/210252272-de93b00f-1b18-44ab-b609-a782c3d0c809.png)
## More info
Please check out the following links for more info.
* 💾 [GitHub Repo](https://github.com/openrlbenchmark/openrlbenchmark): source code and more docs.
* 📜 [Design docs](https://docs.google.com/document/d/1cDI_AMr2QVmkC53dCHFMYwGJtLC8V4p6KdL2wnYPaiI/edit?usp=sharing): our motivation and vision.
* 🔗 [Open RL Benchmark reports](https://wandb.ai/openrlbenchmark/openrlbenchmark/reportlist): W&B reports with tracked Atari, MuJoCo experiments from SB3, CleanRL, and others.
## What's going on right now?
This is a project we are working on slowly; there is no specific timeline or roadmap. If you want to get involved, feel free to reach out to me or open an issue. We are looking for volunteers to help us with the following:
* Add experiments from other libraries
* Run more experiments from currently supported libraries
* Improve documentation and help design standards
* Download the TensorBoard metrics from the tracked experiments and cache them locally to save time