openrlbenchmark

Latest version: v0.2.0

0.2.0

This release brings several new features. I am excited to share that `openrlbenchmark` now integrates directly with https://github.com/google-research/rliable. The new release also supports an offline mode and plotting multiple metrics in the same figure.

rliable integration



```shell
python -m openrlbenchmark.rlops \
--filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
'ppo_continuous_action?tag=v1.0.0-27-gde3f410&cl=CleanRL PPO' \
--filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' \
'baselines-ppo2-mlp?cl=openai/baselines PPO2' \
--env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \
--env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \
--no-check-empty-runs \
--pc.ncols 3 \
--pc.ncols-legend 3 \
--rliable \
--rc.score_normalization_method maxmin \
--rc.normalized_score_threshold 1.0 \
--rc.sample_efficiency_plots \
--rc.sample_efficiency_and_walltime_efficiency_method Median \
--rc.performance_profile_plots \
--rc.aggregate_metrics_plots \
--rc.sample_efficiency_num_bootstrap_reps 10 \
--rc.performance_profile_num_bootstrap_reps 10 \
--rc.interval_estimates_num_bootstrap_reps 10 \
--output-filename compare \
--scan-history
```


now yields

![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl.png)
![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl-time.png)
![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl_sample_walltime_efficiency.png)
![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl_sample_efficiency.png)
![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl_performance_profile.png)
![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/baseline_vs_cleanrl_aggregate.png)

CC rliable contributors agarwl, qgallouedec, DennisSoemers, and lkevinzc.
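
For reference, the aggregate plots above build on rliable's stratified-bootstrap interval estimates. Below is a minimal sketch of the underlying rliable calls, with made-up scores and an illustrative `maxmin` normalization; openrlbenchmark's internal wiring may differ:

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Illustrative data: scores[algo] is a (num_runs, num_envs) matrix of final
# episodic returns; the run counts and values here are made up.
rng = np.random.default_rng(0)
scores = {
    "CleanRL PPO": rng.uniform(0, 5000, size=(10, 3)),
    "openai/baselines PPO2": rng.uniform(0, 5000, size=(10, 3)),
}

# "maxmin" normalization rescales each env's scores to [0, 1] using the
# minimum and maximum observed across all algorithms; the exact reference
# scores openrlbenchmark uses are an assumption here.
stacked = np.concatenate(list(scores.values()), axis=0)
low, high = stacked.min(axis=0), stacked.max(axis=0)
normalized = {algo: (s - low) / (high - low) for algo, s in scores.items()}

# Stratified-bootstrap point estimates and confidence intervals for the
# aggregate metrics shown in the plots above.
aggregate_func = lambda x: np.array([
    metrics.aggregate_median(x),
    metrics.aggregate_iqm(x),
    metrics.aggregate_mean(x),
    metrics.aggregate_optimality_gap(x),
])
point_estimates, interval_estimates = rly.get_interval_estimates(
    normalized, aggregate_func, reps=10  # cf. --rc.interval_estimates_num_bootstrap_reps 10
)
```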


Offline mode

We introduced an experimental **offline** mode. Sometimes, even with `--scan-history` caching, the script can take a long time when there are many environments or experiments, because we still make many `wandb.Api().runs(..., filters)` calls under the hood.

No worries though. When running with `--scan-history`, we also automatically build a local `sqlite` database that stores the metadata of the runs. You can then run `python -m openrlbenchmark.rlops ... --scan-history --offline` to generate the plots without internet access, which should also considerably speed up plotting. We are still working on improving the offline mode, so please let us know if you encounter any issues.
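
For intuition, the caching pattern looks roughly like the sketch below. The file name, table name, and columns are illustrative assumptions, not openrlbenchmark's actual schema:

```python
import json
import sqlite3

# Hypothetical cache: store run metadata locally on the first --scan-history
# pass, then serve later plotting calls from disk instead of the W&B API.
con = sqlite3.connect("runs_cache.db")  # illustrative file name
con.execute("CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, config TEXT)")

def cache_run(run_id: str, config: dict) -> None:
    con.execute("INSERT OR REPLACE INTO runs VALUES (?, ?)", (run_id, json.dumps(config)))
    con.commit()

def load_cached_run(run_id: str) -> dict | None:
    row = con.execute("SELECT config FROM runs WHERE run_id = ?", (run_id,)).fetchone()
    return json.loads(row[0]) if row else None
```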


Multi metrics

**Experimental! API may change.**

Thanks to ffelten, we can now plot multiple metrics in the same figure.

```shell
python -m openrlbenchmark.rlops_multi_metrics \
--filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metrics=charts/episodic_return&metrics=charts/episodic_length&metrics=charts/SPS&metrics=losses/actor_loss&metrics=losses/qf1_values&metrics=losses/qf1_loss' \
'ddpg_continuous_action?tag=pr-371' \
'ddpg_continuous_action?tag=pr-299' \
'ddpg_continuous_action?tag=rlops-pilot' \
'ddpg_continuous_action_jax?tag=pr-371-jax' \
'ddpg_continuous_action_jax?tag=pr-298' \
--env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \
--no-check-empty-runs \
--pc.ncols 3 \
--pc.ncols-legend 2 \
--output-filename static/multi-metrics \
--scan-history --offline
```

![](https://raw.githubusercontent.com/openrlbenchmark/openrlbenchmark/b9411061b5131a4209ac708c7ffbd20064f1c50a/static/multi-metrics.png)


What's Changed
* Fix MuJoCo plots by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/4
* New plotting API by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/5
* Rlops API by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/6
* Glow up README by qgallouedec in https://github.com/openrlbenchmark/openrlbenchmark/pull/8
* Add MORL Baselines to README by ffelten in https://github.com/openrlbenchmark/openrlbenchmark/pull/10
* Add rlops for plotting human normalized scores by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/11
* use tyro for argparse by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/12
* Various refactor and features by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/13
* Remove unused `tyro.conf.OmitSubcommandPrefixes` config by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/16
* Refactor by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/21
* Create citation by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/17
* General rliable support by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/22
* [WIP] Multi metric support by ffelten in https://github.com/openrlbenchmark/openrlbenchmark/pull/23

New Contributors
* qgallouedec made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/8
* ffelten made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/10

**Full Changelog**: https://github.com/openrlbenchmark/openrlbenchmark/compare/v0.0.1...v0.2.0

0.1.1b3

This release brings better table printing:


```shell
python -m openrlbenchmark.rlops \
--filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-mlp' \
--filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'ppo_continuous_action?tag=v1.0.0-27-gde3f410' \
--filters '?we=openrlbenchmark&wpn=jaxrl&ceik=env_name&cen=algo&metric=training/return' 'sac' \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--check-empty-runs False \
--ncols 3 \
--ncols-legend 3 \
--output-filename compare \
--scan-history
```



<img width="1120" alt="image" src="https://user-images.githubusercontent.com/5555347/228295452-0868ed51-5743-459b-ab84-d2a50c030000.png">

0.1.1b2

What's Changed

v0.1.1b2 supports customizing the legend via the `cl` query string and customizing the figure labels with `--xlabel` and `--ylabel`. Additionally, the default line name in the legend is now prefixed with the W&B entity and project. For example, `a2c` from SB3 now shows up as `openrlbenchmark/sb3/a2c`.


```shell
python -m openrlbenchmark.rlops \
--filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \
'a2c' \
'ddpg' \
'ppo_lstm?cl=PPO w/ LSTM' \
'sac' \
'td3' \
'ppo' \
'trpo' \
--filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
'sac_continuous_action?tag=rlops-pilot&cl=SAC' \
--env-ids HalfCheetahBulletEnv-v0 \
--ncols 1 \
--ncols-legend 2 \
--xlabel 'Training Steps' \
--ylabel 'Episodic Return' \
--output-filename compare
```


generates

| cleanrl vs. Stable Baselines 3 | cleanrl vs. Stable Baselines 3 (Time) |
|:----------------------------------:|:----------------------------------------:|
| ![](https://github.com/openrlbenchmark/openrlbenchmark/raw/3763ff723c62d3f4d82d2d5bcdf25901bb8d3606/static/cleanrl_vs_sb3.png) | ![](https://github.com/openrlbenchmark/openrlbenchmark/raw/3763ff723c62d3f4d82d2d5bcdf25901bb8d3606/static/cleanrl_vs_sb3-time.png) |

* Glow up README by qgallouedec in https://github.com/openrlbenchmark/openrlbenchmark/pull/8

New Contributors
* qgallouedec made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/8

**Full Changelog**: https://github.com/openrlbenchmark/openrlbenchmark/compare/v0.1.1b0...v0.1.1b2

0.1.1b0

Excited to announce our first beta release of `openrlbenchmark`, a tool to help you grab metrics from popular RL libraries, such as SB3, CleanRL, baselines, Tianshou, etc.

Here is an example snippet. You can open it at [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openrlbenchmark/openrlbenchmark/blob/master/README.ipynb)

```bash
pip install openrlbenchmark
python -m openrlbenchmark.rlops \
--filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \
'a2c' \
'ddpg' \
'ppo_lstm' \
'sac' \
'td3' \
'ppo' \
'trpo' \
--filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
'sac_continuous_action?tag=rlops-pilot' \
--env-ids HalfCheetahBulletEnv-v0 \
--ncols 1 \
--ncols-legend 2 \
--output-filename compare.png \
--report
```


which generates
![](https://github.com/openrlbenchmark/openrlbenchmark/blob/main/static/cleanrl_vs_sb3.png)

What happened?

The idea is to use a filter-like syntax to grab the metrics of interest. Here, we created multiple filters. The first string in the first filter is `'?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean'`, a query string that specifies the following:

* `we`: the W&B entity name
* `wpn`: the W&B project name
* `ceik`: the custom key for the environment id
* `cen`: the custom key for the experiment name
* `metric`: the metric we are interested in

So we are fetching metrics from [https://wandb.ai/openrlbenchmark/sb3](https://wandb.ai/openrlbenchmark/sb3). The environment id is stored in the `env` key, and the experiment name is stored in the `algo` key. The metric we are interested in is `rollout/ep_rew_mean`.

Similarly, we are fetching metrics from [https://wandb.ai/openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl). The environment id is stored in the `env_id` key, and the experiment name is stored in the `exp_name` key. The metric we are interested in is `charts/episodic_return`.
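
If you want to see how such a query string decomposes, Python's standard library parses it directly (shown purely for illustration; openrlbenchmark does its own parsing internally):

```python
from urllib.parse import parse_qs

filter_str = "?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean"
print(parse_qs(filter_str.lstrip("?")))
# {'we': ['openrlbenchmark'], 'wpn': ['sb3'], 'ceik': ['env'],
#  'cen': ['algo'], 'metric': ['rollout/ep_rew_mean']}
```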

More docs and examples

**See more examples and docs at https://github.com/openrlbenchmark/openrlbenchmark**

Notably, https://github.com/openrlbenchmark/openrlbenchmark#currently-supported-libraries has the list of supported libraries, and below are some examples of the plots.

![image](https://user-images.githubusercontent.com/5555347/210252230-ede0e277-06f0-4c28-8ea5-72c31b12d441.png)

![image](https://user-images.githubusercontent.com/5555347/210252272-de93b00f-1b18-44ab-b609-a782c3d0c809.png)


More info



Please check out the following links for more info.

* 💾 [GitHub Repo](https://github.com/openrlbenchmark/openrlbenchmark): source code and more docs.
* 📜 [Design docs](https://docs.google.com/document/d/1cDI_AMr2QVmkC53dCHFMYwGJtLC8V4p6KdL2wnYPaiI/edit?usp=sharing): our motivation and vision.
* 🔗 [Open RL Benchmark reports](https://wandb.ai/openrlbenchmark/openrlbenchmark/reportlist): W&B reports with tracked Atari, MuJoCo experiments from SB3, CleanRL, and others.


What's going on right now?

This is a project we are slowly working on; there is no specific timeline or roadmap. If you want to get involved, feel free to reach out to me or open an issue. We are looking for volunteers to help us with the following:

* Add experiments from other libraries
* Run more experiments from currently supported libraries
* Documentation and designing standards
* Download the tensorboard metrics from the tracked experiments and load them locally to save time

0.0.1

What's Changed
* [WIP] Prototype plotting utility by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/2
* Plot by vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/3

New Contributors
* vwxyzjn made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/2

**Full Changelog**: https://github.com/openrlbenchmark/openrlbenchmark/commits/v0.0.1
