Benchllm

Latest version: v0.3.0


0.3.0

Mock calls
You can now mock functions that your chain or agent might be calling:

yml
input: I live in London, can I expect rain today?
expected: ["no"]
calls:
  - name: forecast.get_n_day_weather_forecast
    returns: It's sunny in London.
    arguments:
      location: London
      num_days: 1

This will replace `get_n_day_weather_forecast` in `forecast` with a mocked function always returning `It's sunny in London.`
See `examples/weather_functions` for some examples.
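
The effect of such a `calls` entry can be pictured with the standard library's `unittest.mock`. This is only a sketch of the idea, not BenchLLM's implementation: the stand-in `forecast` module and the `answer` function are hypothetical and are built inline so that the snippet runs by itself.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for a real `forecast` module, registered so that the
# dotted name "forecast.get_n_day_weather_forecast" can be resolved.
forecast = types.ModuleType("forecast")


def get_n_day_weather_forecast(location: str, num_days: int) -> str:
    raise RuntimeError("would call a real weather API")


forecast.get_n_day_weather_forecast = get_n_day_weather_forecast
sys.modules["forecast"] = forecast


def answer(question: str) -> str:
    # Hypothetical agent: looks the function up on the module at call time,
    # so a patched attribute is picked up.
    report = forecast.get_n_day_weather_forecast(location="London", num_days=1)
    return "no" if "sunny" in report else "yes"


# Replace the function with a mock that always returns the configured value.
with mock.patch(
    "forecast.get_n_day_weather_forecast",
    return_value="It's sunny in London.",
) as mocked:
    result = answer("I live in London, can I expect rain today?")
    # The mock records its arguments, so they can be compared with the
    # `arguments` listed in the test file.
    mocked.assert_called_once_with(location="London", num_days=1)
```

Patching by dotted name only works when the code under test looks the function up on the module at call time, as `answer` does here.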

Embedding Distance
New evaluator `EmbeddingEvaluator`: it embeds both the model output and the expected values and compares their cosine distance.
The threshold is currently hardcoded to 0.9 but will be configurable in the future.

bash
$ bench run . --evaluator embedding
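
As a rough illustration of a threshold check on embeddings, the sketch below compares toy vectors in plain Python. It assumes that the 0.9 threshold is applied to cosine similarity, with higher meaning closer; the release note does not say which way the comparison runs. The names `cosine_similarity` and `matches` are hypothetical, and the three-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence

THRESHOLD = 0.9  # value from the release note; comparison direction is assumed


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def matches(
    output_vec: Sequence[float],
    expected_vecs: Sequence[Sequence[float]],
    threshold: float = THRESHOLD,
) -> bool:
    # Pass if the output is close enough to at least one expected value.
    return any(cosine_similarity(output_vec, e) >= threshold for e in expected_vecs)


# Toy vectors standing in for real embeddings.
output = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]
```

Here `matches(output, [close])` passes and `matches(output, [far])` fails, because the two orthogonal vectors have a similarity of 0.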

Scoring
Evaluators now return `List[Evaluator.Candidate]` instead of `Optional[Evaluator.Match]`. This lets us inspect the score (for example, the cosine distance) for failed evaluations.

The new return type is incompatible with the old caching format.
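
To show why a list of scored candidates is more informative than a single optional match, here is a minimal sketch. The `Candidate` fields, `score_exact`, and `best_match` are hypothetical names chosen for illustration; the real `Evaluator.Candidate` may be shaped differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical fields; not taken from the BenchLLM source.
    prediction: str
    expected: str
    score: float
    passed: bool


def score_exact(prediction: str, expected: List[str]) -> List[Candidate]:
    # Return one candidate per expected value, including the failing ones.
    candidates = []
    for value in expected:
        same = prediction.strip().lower() == value.strip().lower()
        candidates.append(Candidate(prediction, value, 1.0 if same else 0.0, same))
    return candidates


def best_match(candidates: List[Candidate]) -> Optional[Candidate]:
    # The old "optional match" view can still be derived from the list.
    passing = [c for c in candidates if c.passed]
    return max(passing, key=lambda c: c.score) if passing else None


candidates = score_exact("No", ["no", "it will not rain"])
```

With the list in hand, a failed evaluation still exposes every score, whereas an optional match would only yield `None`.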

Multiple test functions in the same file
You can now have multiple `benchllm.test` functions in the same Python file. The function name is now also shown in the benchllm output.

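python
import benchllm
from benchllm.input_types import ChatInput  # import path assumed; not shown in the original snippet

def my_model(input, model="gpt-3.5-turbo"):  # default model name is illustrative
    ...  # implementation

@benchllm.test(suite=".")
def gpt_3_5(input: ChatInput):
    return my_model(input)

@benchllm.test(suite=".")
def gpt_4(input: ChatInput):
    return my_model(input, model="gpt-4")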

0.2.0

Caching
Added two new Evaluators for handling caching, `MemoryCache` and `FileCache`.

Using the API:

python
evaluator = MemoryCache(SemanticEvaluator())

Using the CLI:

bash
$ bench run --cache memory  # or: file, none

On the command line, caching is on by default.
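
The wrapping pattern behind a cache of this kind can be sketched as follows. This is not BenchLLM's code: `MemoryCachedEvaluator` and `slow_semantic_check` are hypothetical names, and the evaluator is reduced to a plain function from `(output, expected)` to a boolean.

```python
from typing import Callable, Dict, Tuple

# Hypothetical evaluator signature: (output, expected) -> passed?
EvaluateFn = Callable[[str, str], bool]


class MemoryCachedEvaluator:
    """Wraps an evaluation function and remembers its results in a dict."""

    def __init__(self, evaluate: EvaluateFn) -> None:
        self._evaluate = evaluate
        self._cache: Dict[Tuple[str, str], bool] = {}
        self.misses = 0  # number of times the wrapped function actually ran

    def __call__(self, output: str, expected: str) -> bool:
        key = (output, expected)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._evaluate(output, expected)
        return self._cache[key]


def slow_semantic_check(output: str, expected: str) -> bool:
    # Stand-in for an expensive comparison such as an LLM call.
    return output.strip().lower() == expected.strip().lower()


cached = MemoryCachedEvaluator(slow_semantic_check)
first = cached("No", "no")
second = cached("No", "no")  # served from the cache
```

A file-backed variant would follow the same shape, persisting the dictionary between runs instead of keeping it only in memory.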

Match object
Changed the return type of `evaluate_prediction` from `bool` to `Optional[Evaluator.Match]`.
`Match` carries information about which of the `expected` values matched the `output` from the tested model.
This will likely be extended in the next release with even more information.

0.1.0

BenchLLM is now open source!
