New
- add reward calculation and plot methods in each bandit policy (38)
- add Travis CI using tox (44)
- Python 3 support (44)
- generate documentation using Sphinx and host it on Read the Docs (44)
- BaseBandit.get_action_with_id (63)
Change
- refactor bandit algorithms to allow multiple actions and rewards (37)
- use Action object instead of action_id during policy initialization (37)
- use "expert advice probability vectors" instead of "scikit-learn models" as input for Exp4.P (37)
- better simulation coding style (47, 51, 53, 56, 72, 73, 74, 75, 76, 78)
- better coding style (50, 62, 66, 79)
Fix
- fix the parameter update bugs (query_vector calculation) in Exp4.P (37)
- fix bugs in Exp4.P (66)
- remove generator in LinUCB (54)
- remove generator in Exp4.P (67)
- remove generator in LinThompSamp (80)