We are excited to announce the release of nanoPPO, version 0.13! This release provides reinforcement learning practitioners with a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.
**Highlights:**
- **PPO Implementation:** In addition to the discrete action spaces supported since v0.1, v0.13 adds support for continuous action spaces, enabling a wider range of applications.
- **Ease of Use:** Simple API to get started with PPO training quickly.
- **Examples Included:** Contains examples to help users understand how to train agents on various environments.
- **Custom Environments:** Includes two custom environments, PointMass1D and PointMass2D, for easy testing of PPO agent training (see the sketch after this list).
- **Test Suite:** Initial test suite to ensure code quality and functionality.
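To give a sense of the intended workflow, here is a minimal training sketch. The module paths, class names, and method signatures below (`nanoPPO.envs`, `PointMass1DEnv`, `PPOAgent`, `train`) are illustrative assumptions and may not match the actual nanoPPO API; please refer to the examples in the repository for the authoritative usage.

```python
# Hypothetical usage sketch -- module/class/method names are assumptions,
# not the confirmed nanoPPO API. See the repository examples for real usage.
from nanoPPO.envs import PointMass1DEnv   # assumed import path
from nanoPPO.agent import PPOAgent        # assumed import path

# Create one of the bundled toy environments.
env = PointMass1DEnv()

# Instantiate a PPO agent for a continuous action space (assumed constructor).
agent = PPOAgent(
    state_dim=env.observation_space.shape[0],
    action_dim=env.action_space.shape[0],
    lr=3e-4,
)

# Run a short training loop (assumed method name and arguments).
agent.train(env, max_episodes=200)
```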
**Installation:**
You can install nanoPPO via PyPI:
```bash
pip install nanoPPO
```
Or clone the repository and install from source:
```bash
git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .
```
**Support & Contribution:**
We welcome feedback, issues, and contributions. Please refer to our [contribution guidelines](https://github.com/jamesliu/nanoPPO/blob/main/CONTRIBUTING.md) for more details.
Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!