PyMAB (Python Multi-Armed Bandit) is an experimental framework for comparing Multi-Armed Bandit algorithms and configurations. This alpha release gives researchers and practitioners a flexible foundation for bandit-based experimentation and analysis; new algorithms and improvements are planned for future releases.
Algorithms:
- Greedy Policy
- Epsilon-Greedy Policy
- Upper Confidence Bound (UCB)
- Bayesian UCB with Gaussian/Bernoulli distributions
- Thompson Sampling (Gaussian/Bernoulli variants; a Bernoulli sketch follows this list)
- Contextual Bandits
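
As a concrete illustration of one of the listed algorithms, here is a minimal standalone sketch of Bernoulli Thompson Sampling. It does not use the PyMAB API; the arm probabilities, Beta(1, 1) priors, and horizon are illustrative assumptions, not values from the library.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup: true success probabilities of three Bernoulli arms
# (arbitrary values, chosen only for this demo).
true_probs = [0.3, 0.5, 0.7]
n_arms = len(true_probs)
n_steps = 1000

# Beta(1, 1) priors: alpha counts successes, beta counts failures per arm.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

total_reward = 0
for _ in range(n_steps):
    # Sample a success probability from each arm's posterior and pick the best.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))

    # Pull the chosen arm and observe a Bernoulli reward.
    reward = rng.binomial(1, true_probs[arm])
    total_reward += reward

    # Update the chosen arm's posterior with the observed outcome.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(f"Average reward over {n_steps} steps: {total_reward / n_steps:.3f}")
```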
Environment Types:
- Stationary
- Non-Stationary:
  - Gradual distribution changes
  - Abrupt distribution changes (see the sketch after this list)
  - Random arm swapping
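
To show what an abrupt distribution change looks like in practice, here is another minimal standalone sketch (again, not the PyMAB API): two Gaussian arms swap their means halfway through the run, and a constant-step-size epsilon-greedy agent keeps adapting after the change. The means, changepoint, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative non-stationary environment: two Gaussian arms whose means
# swap abruptly halfway through the run (values chosen only for this demo).
n_steps = 2000
change_point = n_steps // 2
means_before = np.array([1.0, 2.0])
means_after = means_before[::-1]  # abrupt change: arm qualities swap

def pull(arm, t):
    means = means_before if t < change_point else means_after
    return rng.normal(means[arm], 1.0)

# Epsilon-greedy with a constant step size, so recent rewards dominate the
# estimates (a plain sample-average update would react far more slowly).
epsilon, step_size = 0.1, 0.1
estimates = np.zeros(2)

rewards = []
for t in range(n_steps):
    arm = rng.integers(2) if rng.random() < epsilon else int(np.argmax(estimates))
    r = pull(arm, t)
    estimates[arm] += step_size * (r - estimates[arm])
    rewards.append(r)

print(f"Mean reward before change: {np.mean(rewards[:change_point]):.2f}")
print(f"Mean reward after change:  {np.mean(rewards[change_point:]):.2f}")
```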
Known limitations:
- Some policies (Softmax, Gradient) pending implementation
- API may undergo changes based on feedback
- Lack of parallelisation and other optimisations