Cleanrl

Latest version: v1.2.0




| Walker2DBulletEnv-v0 | 567.61 ± 15.01 | 2177.57 ± 65.49 | 1377.68 ± 51.96 |
| HalfCheetahBulletEnv-v0 | 2847.63 ± 212.31 | 2537.34 ± 347.20 | 2347.64 ± 51.56 |
| AntBulletEnv-v0 | 2094.62 ± 952.21 | 3253.93 ± 106.96 | 1775.50 ± 50.19 |
| HopperBulletEnv-v0 | 1262.70 ± 424.95 | 2271.89 ± 24.26 | 2311.20 ± 45.28 |
| HumanoidBulletEnv-v0 | -54.45 ± 13.99 | 937.37 ± 161.05 | 204.47 ± 1.00 |
| BipedalWalker-v3 | 66.01 ± 127.82 | 78.91 ± 232.51 | 272.08 ± 10.29 |
| LunarLanderContinuous-v2 | 162.96 ± 65.60 | 281.88 ± 0.91 | 215.27 ± 10.17 |
| Pendulum-v0 | -238.65 ± 14.13 | -345.29 ± 47.40 | -1255.62 ± 28.37 |
| MountainCarContinuous-v0 | -1.01 ± 0.01 | -1.12 ± 0.12 | 93.89 ± 0.06 |

Other Results

| gym_id | ppo | dqn |
|:---------------|:---------------|:----------------|
| CartPole-v1 | 500.00 ± 0.00 | 182.93 ± 47.82 |
| Acrobot-v1 | -80.10 ± 6.77 | -81.50 ± 4.72 |

arXiv:1910.07207

My personal thanks to everyone who participated in the monthly dev cycle and, in particular, to dosssman, who implemented SAC with discrete action spaces.

Additional improvements include:

- Support `gym.wrappers.Monitor` to automatically record the agent's performance at certain episodes (default is 1, 2, 9, 28, 65, ... 1000, 2000, 3000) and integrate with wandb (so cool, see screenshot below). (#4)
- Use the same replay buffer from minimalRL for DQN and SAC. (#5)
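
The default recording episodes quoted above look like a cubic schedule capped at 1000, which is what older gym releases use for video recording as far as I know. The sketch below reproduces that pattern in plain Python without importing gym. The names `should_record` and `recorded_episode_ids` are hypothetical, and it assumes zero-based episode ids, so ids 0, 1, 8, 27, 64 are the 1st, 2nd, 9th, 28th and 65th episodes.

```python
from typing import List


def should_record(episode_id: int) -> bool:
    """Return True if a zero-based episode id falls on the capped cubic schedule.

    Assumption: below 1000, perfect-cube ids are recorded; from 1000 on,
    every 1000th id is recorded. This is a sketch, not gym's own code.
    """
    if episode_id < 1000:
        # Round the float cube root so that 64 ** (1/3) == 3.999... maps to 4.
        root = round(episode_id ** (1.0 / 3.0))
        return root ** 3 == episode_id
    return episode_id % 1000 == 0


def recorded_episode_ids(total_episodes: int) -> List[int]:
    """List the zero-based episode ids that would be recorded."""
    return [i for i in range(total_episodes) if should_record(i)]


if __name__ == "__main__":
    print(recorded_episode_ids(3001))
```

A predicate like this is the kind of callable that could be passed as the `video_callable` argument of `gym.wrappers.Monitor` to change which episodes get recorded.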

https://app.wandb.ai/cleanrl/cleanrl.benchmark

![image](https://user-images.githubusercontent.com/5555347/72108416-8f46ff00-3301-11ea-91d7-04c611f28ee7.png)

arXiv:1812.05905

arXiv:1801.01290


| PongNoFrameskip-v4 | 19.06 ± 0.83 | 18.00 ± 0.00 | 19.78 ± 0.22 | 20.72 ± 0.28 |
| BreakoutNoFrameskip-v4 | 364.97 ± 58.36 | 386.10 ± 21.77 | 353.39 ± 30.61 | 380.67 ± 35.29 |

Mujoco Results

| gym_id | ddpg_continuous_action | td3_continuous_action | ppo_continuous_action |
|:--------------------|:-------------------------|:------------------------|:------------------------|
| Reacher-v2 | -6.25 ± 0.54 | -6.65 ± 0.04 | -7.86 ± 1.47 |
| Pusher-v2 | -44.84 ± 5.54 | -59.69 ± 3.84 | -44.10 ± 6.49 |
| Thrower-v2 | -137.18 ± 47.98 | -80.75 ± 12.92 | -58.76 ± 1.42 |
| Striker-v2 | -193.43 ± 27.22 | -269.63 ± 22.14 | -112.03 ± 9.43 |


| HalfCheetah-v2 | 10386.46 ± 265.09 | 9265.25 ± 1290.73 | 1717.42 ± 20.25 |
| Hopper-v2 | 1128.75 ± 9.61 | 3095.89 ± 590.92 | 2276.30 ± 418.94 |
| Swimmer-v2 | 114.93 ± 29.09 | 103.89 ± 30.72 | 111.74 ± 7.06 |
| Walker2d-v2 | 1946.23 ± 223.65 | 3059.69 ± 1014.05 | 3142.06 ± 1041.17 |
| Ant-v2 | 243.25 ± 129.70 | 5586.91 ± 476.27 | 2785.98 ± 1265.03 |
| Humanoid-v2 | 877.90 ± 3.46 | 6342.99 ± 247.26 | 786.83 ± 95.66 |

Pybullet Results

| gym_id | ddpg_continuous_action | td3_continuous_action | ppo_continuous_action |
|:-----------------------------------|:-------------------------|:------------------------|:------------------------|
| MinitaurBulletEnv-v0 | -0.17 ± 0.02 | 7.73 ± 5.13 | 23.20 ± 2.23 |
| MinitaurBulletDuckEnv-v0 | -0.31 ± 0.03 | 0.88 ± 0.34 | 11.09 ± 1.50 |
| InvertedPendulumBulletEnv-v0 | 742.22 ± 47.33 | 1000.00 ± 0.00 | 1000.00 ± 0.00 |
