Added
- Support for Azure OpenAI models! These can now be benchmarked like any other
model, provided that either the environment variables `AZURE_OPENAI_API_KEY`,
`AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_VERSION` are set, or the
corresponding values are passed through the `--azure-openai-api-key`,
`--azure-openai-endpoint` and `--azure-openai-api-version` arguments. Thanks to
[BramVanroy](https://github.com/BramVanroy) for all the help regarding the
implementation of this :tada:
- We now use the new JSON mode for newer OpenAI models for the NER task, to ensure
better JSON generation.
- If an error is thrown during generation with an OpenAI model, which for instance
happens when the prompt is caught by the content filter, then we simply return a
blank string instead.
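
The blank-string fallback described in the last entry above can be illustrated with a minimal sketch. The names `generate_or_blank` and `filtered_model` are hypothetical and are not part of the package's API; the stand-in model only mimics a content filter rejecting a prompt.

```python
from typing import Callable


def generate_or_blank(generate: Callable[[str], str], prompt: str) -> str:
    """Return the model's completion, or "" if generation raises an error."""
    try:
        return generate(prompt)
    except Exception:
        # For instance, the prompt was caught by a content filter.
        # The sample is then scored against an empty completion.
        return ""


def filtered_model(prompt: str) -> str:
    # Hypothetical stand-in for an API call that rejects some prompts.
    if "forbidden" in prompt:
        raise RuntimeError("prompt blocked by content filter")
    return prompt.upper()


prompts = ["hello", "some forbidden text"]
completions = [generate_or_blank(filtered_model, p) for p in prompts]
```

Returning a blank string keeps the benchmark running at the cost of counting the affected sample as a wrong answer.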
Changed
- Updated `outlines` dependency to v0.0.37, which can now correctly deal with a larger
batch size when integrated with vLLM. This results in faster NER evaluation.
Fixed
- Models are now moved to the device before running any inference with them, as not
doing so caused issues when flash attention is enabled.
- When benchmarking instruction-tuned models, we now ensure that generation stops when
the end-of-chat token (such as `<|im_end|>` or `[/INST]`) is reached. Previously,
generation continued past this token, which had a negative performance impact on
question answering and summarization, but the remaining tasks were not affected.
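
To illustrate the effect of the end-of-chat fix above, here is a minimal sketch that trims a completion at the first end-of-chat token. It is a simplification: it truncates the text after generation rather than stopping the generation itself, and the name `truncate_at_stop_token` is hypothetical. The two tokens are the examples mentioned in the entry.

```python
from typing import List

# Example end-of-chat tokens taken from the changelog entry.
STOP_TOKENS = ["<|im_end|>", "[/INST]"]


def truncate_at_stop_token(text: str, stop_tokens: List[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop token."""
    cut = len(text)
    for token in stop_tokens:
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw_output = "Paris<|im_end|>\nuser: And what about Germany?"
answer = truncate_at_stop_token(raw_output, STOP_TOKENS)
```

Without the cut, the trailing text would become part of the answer, which is the kind of extra output that hurts question answering and summarization scores.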