Deepeval

Latest version: v2.0.1

Page 1 of 6

2.0

Here are the new features we're bringing to you in the latest release:
⚙️ Automated LLM red teaming, a.k.a. vulnerability and safety scanning. You can now scan for 40+ vulnerabilities using 10+ SOTA attack enhancement techniques in under 10 lines of Python code.
🪄 Synthetic dataset generation with a highly customizable synthetic data generation pipeline to cover literally any use case.
🖼️ Multi-modal LLM evaluation - perfect for image editing or text-to-image use cases.
💬 Conversational evaluation - perfect for evaluating LLM chatbots.
💥 More LLM system metrics: Prompt Alignment (to determine whether your LLM follows the instructions specified in your prompt template), Tool Correctness (for agents), and JSON Correctness (to validate that LLM outputs conform to your desired schema).
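
To make the JSON Correctness idea concrete, here is a minimal sketch of a schema-conformance check using only the standard library. It is an illustration, not deepeval's implementation: `EXPECTED_SCHEMA` and `conforms_to_schema` are hypothetical names, and the rule "same keys, matching types" is an assumption about what conforming means.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
EXPECTED_SCHEMA = {"name": str, "age": int}


def conforms_to_schema(raw_output: str, schema: dict) -> bool:
    """Return True if raw_output is a JSON object whose keys and types match schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # the output is not valid JSON at all
    if not isinstance(data, dict):
        return False  # valid JSON, but not an object
    if set(data) != set(schema):
        return False  # missing or unexpected keys
    # Every field must hold a value of the expected type.
    return all(isinstance(data[key], expected) for key, expected in schema.items())
```

Calling `conforms_to_schema('{"name": "Ada", "age": 36}', EXPECTED_SCHEMA)` passes, while an output with a missing key, a wrong type, or non-JSON text fails.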

1.4.7

In DeepEval 1.4.7, we're releasing:
- LLM red teaming. Safety test your LLM application for 40+ vulnerabilities with 10+ attack enhancements, docs here: https://docs.confident-ai.com/docs/red-teaming-introduction
- Improved synthetic data synthesizer, with much more functionality and customizability: https://docs.confident-ai.com/docs/evaluation-datasets-synthetic-data
- Conversational metrics: Dedicated metrics to evaluate LLM turns
- Multi-modal metrics: Image editing and text to image evaluation

0.21.74

In DeepEval v0.21.74, we have:
- Agentic evaluation metric to evaluate tool calling correctness for LLM agents: https://docs.confident-ai.com/docs/metrics-tool-correctness
- Pydantic Schemas to enforce JSON outputs for custom, smaller LLMs: https://docs.confident-ai.com/docs/guides-using-custom-llms
- Asynchronous support for synthetic data generation: https://docs.confident-ai.com/docs/evaluation-datasets-synthetic-data
- Tracing integration for LlamaIndex and LangChain: https://docs.confident-ai.com/docs/confident-ai-tracing
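
As a rough illustration of what a tool calling correctness score could look like, the sketch below computes the fraction of expected tools that the agent actually called. The function name and the scoring rule are assumptions made for this example; the linked docs describe how deepeval's metric is actually computed.

```python
def tool_correctness(expected_tools: list[str], tools_called: list[str]) -> float:
    """Fraction of distinct expected tools that appear among the tools called.

    Assumed scoring rule for illustration only; order and extra calls are ignored.
    """
    expected = set(expected_tools)
    if not expected:
        return 1.0  # nothing was expected, so nothing can be missing
    called = set(tools_called)
    hits = sum(1 for tool in expected if tool in called)
    return hits / len(expected)
```

For example, expecting `["search", "calculator"]` when the agent only called `["search"]` scores half of the expected tools as covered.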

0.21.62

In DeepEval v0.21.62, we:
- added an option to print intermediate steps during metric execution, configurable via the `verbose_mode` parameter: https://docs.confident-ai.com/docs/metrics-answer-relevancy#example
- added support for logging hyperparameters to Confident AI via the `evaluate()` function: https://docs.confident-ai.com/docs/getting-started#optimizing-hyperparameters
- made synthetic data generation more realistic and more customizable: https://docs.confident-ai.com/docs/evaluation-datasets-synthetic-data

0.21.15

In deepeval v0.21.15, we release:
- Synthetic Data generation. Generate synthetic data from documents easily: https://docs.confident-ai.com/docs/evaluation-datasets-synthetic-data
- Caching. If you're running 10k test cases and the run fails at the 9999th test case, you no longer have to rerun the earlier test cases, as you can read their results from cache using the `-c` flag: https://docs.confident-ai.com/docs/evaluation-introduction#cache
- Repeats. If you want to repeat each test case for statistical significance, use the `-r` flag: https://docs.confident-ai.com/docs/evaluation-introduction#repeats
- LLM Benchmarks. Supporting popular benchmarks such as MMLU, HellaSwag, and BIG-Bench Hard so anyone can evaluate any model on research-backed benchmarks in a few lines of code.
- G-Eval improvements. The G-Eval metric now supports using logprobs of tokens to find the weighted summed score.
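
The weighted summed score can be sketched as follows: each candidate score is weighted by the probability the model assigned to its token, and the weighted values are summed. This is a minimal sketch under stated assumptions, not deepeval's code: `weighted_score` is a hypothetical name, and renormalizing over the candidate scores is an assumption made so the weights sum to 1.

```python
import math


def weighted_score(score_logprobs: dict[int, float]) -> float:
    """Probability-weighted sum of candidate scores.

    score_logprobs maps each candidate score (e.g. 1..5) to the log
    probability the model assigned to that score's token.
    """
    # Convert log probabilities back to probabilities.
    probs = {score: math.exp(lp) for score, lp in score_logprobs.items()}
    total = sum(probs.values())
    if total == 0:
        raise ValueError("no probability mass on any candidate score")
    # Renormalize over the candidates, then sum score * probability.
    return sum(score * p / total for score, p in probs.items())
```

If the model splits its probability evenly between scores 4 and 5, the result lands halfway between them instead of snapping to a single integer, which is the point of using logprobs.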

0.20.85

- asynchronous support throughout deepeval, which no longer uses threads. Users can also call individual metrics asynchronously: https://docs.confident-ai.com/docs/metrics-introduction#measuring-metrics-in-async
- improved the way in which you create a custom LLM for evaluation. You'll now have to implement an asynchronous generate() method to use deepeval's async features: https://docs.confident-ai.com/docs/metrics-introduction#using-a-custom-llm
- strict mode for all metrics!
- improved the `evaluate()` function for more customizability: https://docs.confident-ai.com/docs/evaluation-introduction#evaluating-without-pytest
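
For readers new to the async model, here is a minimal sketch of the pattern these notes describe: a model exposing an asynchronous `generate()` method, with several calls run concurrently on one event loop instead of on threads. `EchoModel` and `generate_all` are hypothetical and do not subclass anything from deepeval; the exact base class and method names deepeval expects are in the linked docs.

```python
import asyncio


class EchoModel:
    """Hypothetical stand-in for a custom LLM; it does not call a real model."""

    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"echo: {prompt}"


async def generate_all(model: EchoModel, prompts: list[str]) -> list[str]:
    # gather runs all coroutines concurrently on a single event loop (no threads)
    # and returns their results in the same order as the prompts.
    return await asyncio.gather(*(model.generate(p) for p in prompts))


results = asyncio.run(generate_all(EchoModel(), ["a", "b"]))
```

Because the work is coordinated by `await` points and not by threads, a slow model call does not block the others from starting.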
