We are thrilled to introduce the mature release of [MO-Gymnasium](https://mo-gymnasium.farama.org/), a standardized API and collection of environments designed for Multi-Objective Reinforcement Learning (MORL).
MORL expands the capabilities of RL to scenarios where agents need to optimize multiple objectives, which may conflict with each other. Each objective is represented by a distinct reward function, and the agent learns to make trade-offs between these objectives based on a reward vector received after each step. For instance, in the well-known MuJoCo HalfCheetah environment, the reward components are combined linearly using predefined weights, as shown in the following code snippet from [Gymnasium](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/half_cheetah_v4.py#LL201C9-L206C44):
```python
ctrl_cost = self.control_cost(action)
forward_reward = self._forward_reward_weight * x_velocity
reward = forward_reward - ctrl_cost
```
With MORL, users have the flexibility to determine the compromises they desire based on their preferences for each objective. Consequently, the environments in MO-Gymnasium do not have predefined weights: MO-Gymnasium extends the capabilities of [Gymnasium](https://gymnasium.farama.org/) to the multi-objective setting, where the agent receives a vectorial reward.
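To illustrate the idea, here is a minimal sketch of how a user's preference can turn a vector reward into a scalar utility through linear scalarization; the reward and weight values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical two-objective reward vector returned after one step: [speed, energy saved]
vector_reward = np.array([1.8, -0.3])

# User-chosen preference weights (illustrative values, not defined by the environment)
weights = np.array([0.7, 0.3])

# A linear scalarization is simply the dot product of preferences and reward components
scalar_utility = np.dot(weights, vector_reward)
print(scalar_utility)  # ≈ 1.17
```

Different weight vectors express different compromises, and each one can lead to a different optimal policy.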
For example, here is an illustration of the multiple policies learned by an MORL agent for the `mo-halfcheetah` domain, balancing between saving battery and speed:
<img src="https://github.com/Farama-Foundation/MO-Gymnasium/assets/11799929/10796cae-6f84-4690-8e17-d23f792c32c2" width=400 />
This release marks the first mature version of MO-Gymnasium within Farama, indicating that the API is stable and that the library has reached a high level of quality.
## API
```python
import gymnasium as gym
import mo_gymnasium as mo_gym
import numpy as np

# It follows the original Gymnasium API ...
env = mo_gym.make('minecart-v0')

obs, info = env.reset()
# ... but vector_reward is a numpy array!
next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs))

# Optionally, you can scalarize the reward function with the LinearReward wrapper.
# This allows falling back to single-objective RL.
```
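As a concrete sketch of that fallback, the snippet below wraps the environment with `LinearReward` and then runs it like a standard single-objective Gymnasium environment; the weight values and the random action selection are purely illustrative:

```python
import mo_gymnasium as mo_gym
import numpy as np

# Scalarize the three-objective minecart rewards with a fixed (illustrative) preference vector
env = mo_gym.LinearReward(mo_gym.make('minecart-v0'), weight=np.array([0.8, 0.2, 0.2]))

obs, info = env.reset(seed=42)
done = False
while not done:
    action = env.action_space.sample()  # stand-in for any single-objective agent
    obs, reward, terminated, truncated, info = env.step(action)  # reward is now a scalar
    done = terminated or truncated
```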