The `rewardbench` CLI can be run on any instruction dataset with fancy logging of scores.
This means `rewardbench` can be used to quickly throw together a rejection sampling pipeline once you have generations.
Specifically, I think this type of logging is **_really great_** for evaluation. It’s something wandb does for training, but when using the CLI, you pass one arg that will save:
* All the scores, input text, etc. to Hugging Face
* The command used to launch the eval
* The current python env for reproducibility
Examples are in the readme: https://github.com/allenai/reward-bench?tab=readme-ov-file#logging
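
The rejection sampling idea mentioned above can be sketched roughly as follows. This is a minimal Python sketch, not the `rewardbench` implementation: it assumes the logged scores have already been loaded as a list of records, and the field names `prompt`, `completion`, and `score`, along with the helper `rejection_sample`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rejection_sample(records, top_k=1):
    """Keep the top_k highest-scoring completions for each prompt.

    `records` is a list of dicts with hypothetical keys
    "prompt", "completion", and "score" (a reward model score).
    """
    # Group all candidate completions by the prompt they answer.
    by_prompt = defaultdict(list)
    for rec in records:
        by_prompt[rec["prompt"]].append(rec)

    selected = []
    for candidates in by_prompt.values():
        # Highest score first, then keep only the best top_k.
        ranked = sorted(candidates, key=lambda r: r["score"], reverse=True)
        selected.extend(ranked[:top_k])
    return selected


# Toy data standing in for scored generations (values are made up).
records = [
    {"prompt": "What is 2 + 2?", "completion": "5", "score": -0.8},
    {"prompt": "What is 2 + 2?", "completion": "4", "score": 1.7},
    {"prompt": "Capital of France?", "completion": "Paris", "score": 2.1},
    {"prompt": "Capital of France?", "completion": "Lyon", "score": -1.3},
]

best = rejection_sample(records)
for rec in best:
    print(rec["prompt"], "->", rec["completion"])
```

Raising `top_k` keeps more candidates per prompt, which can be useful if the filtered set is later deduplicated or thresholded on score.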
What's Changed
* Clean, minor fixes, and release 0.1.2 by natolambert in https://github.com/allenai/reward-bench/pull/139
* Fix DPO prompts by natolambert in https://github.com/allenai/reward-bench/pull/142
* New super secret models by natolambert in https://github.com/allenai/reward-bench/pull/141
* Minor fixes, new dockerfile, new models by natolambert in https://github.com/allenai/reward-bench/pull/144
* Fix llama3 quantization for DPO models by natolambert in https://github.com/allenai/reward-bench/pull/145
* Fix small bugs by natolambert in https://github.com/allenai/reward-bench/pull/148
* Add GRM classes by YangRui2015 in https://github.com/allenai/reward-bench/pull/151
* New models + dockerfile by natolambert in https://github.com/allenai/reward-bench/pull/152
* Add Claude 3.5 Sonnet by natolambert in https://github.com/allenai/reward-bench/pull/153
* fix padding for GRM class by YangRui2015 in https://github.com/allenai/reward-bench/pull/154
* Add bfloat16 support natively by natolambert in https://github.com/allenai/reward-bench/pull/155
* Add generative models by natolambert in https://github.com/allenai/reward-bench/pull/156
* Add InternLM2 RMs by natolambert in https://github.com/allenai/reward-bench/pull/157
* Bump generative models by natolambert in https://github.com/allenai/reward-bench/pull/160
* added offsetbias execute prompt and judgement process code by sanghyuk-choi in https://github.com/allenai/reward-bench/pull/159
* small gen pr by natolambert in https://github.com/allenai/reward-bench/pull/161
* Bos fix by natolambert in https://github.com/allenai/reward-bench/pull/166
* Add automatic Beaker Images by natolambert in https://github.com/allenai/reward-bench/pull/167
* Small bumps by natolambert in https://github.com/allenai/reward-bench/pull/168
* Add attn_implementation support by chrisliu298 in https://github.com/allenai/reward-bench/pull/170
* Fixes in run_generative, new models by natolambert in https://github.com/allenai/reward-bench/pull/171
* fix vllm version by natolambert in https://github.com/allenai/reward-bench/pull/172
* Delete training by natolambert in https://github.com/allenai/reward-bench/pull/174
* Mirror change from leaderboard by natolambert in https://github.com/allenai/reward-bench/pull/175
* Add models by natolambert in https://github.com/allenai/reward-bench/pull/179
* Add o1 and other model by natolambert in https://github.com/allenai/reward-bench/pull/181
* Support loading model from wandb by vwxyzjn in https://github.com/allenai/reward-bench/pull/184
* add_con-j_support_code by YeZiyi1998 in https://github.com/allenai/reward-bench/pull/183
* Bump requirements and generative improvements by natolambert in https://github.com/allenai/reward-bench/pull/190
* Support upload metadata to hf by vwxyzjn in https://github.com/allenai/reward-bench/pull/188
* Bump Cuda version by natolambert in https://github.com/allenai/reward-bench/pull/191
* Typo and VLLM generalization by natolambert in https://github.com/allenai/reward-bench/pull/192
* Add better logging and functionality with instructions to CLI by natolambert in https://github.com/allenai/reward-bench/pull/193
* Tweak ArmorRM implementation, add args to CLI by natolambert in https://github.com/allenai/reward-bench/pull/194
New Contributors
* YangRui2015 made their first contribution in https://github.com/allenai/reward-bench/pull/151
* sanghyuk-choi made their first contribution in https://github.com/allenai/reward-bench/pull/159
* chrisliu298 made their first contribution in https://github.com/allenai/reward-bench/pull/170
* vwxyzjn made their first contribution in https://github.com/allenai/reward-bench/pull/184
* YeZiyi1998 made their first contribution in https://github.com/allenai/reward-bench/pull/183
**Full Changelog**: https://github.com/allenai/reward-bench/compare/v0.1.2...v0.1.3