Deepeval

Latest version: v2.6.5


2.6.5

What's New 🔥

- Migrated the default model providers to support the `Synthesizer`
- Default model providers now live in a different directory; users on `deepeval` < 2.5.6 may need to update their imports

2.5.9

What's New 🔥

- Custom prompt template overriding for all RAG metrics. This was introduced for users evaluating with weaker models, or with models that don't fit well with OpenAI's prompt formatting, which most of `deepeval`'s metrics are built around. You can still use your favorite metrics and algorithms, now with a custom template where required. Example here: https://docs.confident-ai.com/docs/metrics-answer-relevancy#customize-your-template
- Fixes to our model providers, which are now more stable and usable
- `save_as()` for datasets now saves test cases as well: https://docs.confident-ai.com/docs/evaluation-datasets#save-your-dataset
- Bug fixes for `Synthesizer`
- Improvements to prompt templates of `DAGMetric`: https://docs.confident-ai.com/docs/metrics-dag
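The template-override pattern above can be sketched in plain Python. This is an illustrative sketch only: the class and method names below are hypothetical, not `deepeval`'s actual API (see the linked docs for the real template hooks).

```python
# Hypothetical sketch of prompt-template overriding for a RAG metric.
# Class/method names are illustrative, NOT deepeval's actual API.

class AnswerRelevancyTemplate:
    """Default prompt template, tuned for OpenAI-style models."""

    @staticmethod
    def generate_statements(actual_output: str) -> str:
        return (
            "Break the following answer into standalone statements, "
            f"returned as a JSON list.\n\nAnswer: {actual_output}"
        )


class TersePromptTemplate(AnswerRelevancyTemplate):
    """Override for weaker models that struggle with verbose instructions."""

    @staticmethod
    def generate_statements(actual_output: str) -> str:
        return f"List each claim in this answer, one per line:\n{actual_output}"


def build_prompt(template: type, output: str) -> str:
    # A metric would call the template's hooks when constructing its prompts,
    # so swapping the template class changes every prompt the metric sends.
    return template.generate_statements(output)
```

The point of the design is that the metric's algorithm stays fixed while only the prompt text changes per model.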

2.3.9

🥳 Latest feature: users can now inject a custom prompt template into the Faithfulness metric. Most suited for custom LLMs where text data is highly formatted by data engineers and stored in databases under different categories.

2.2.7

Here are the new features we're bringing to you in the latest release:
💥 Releasing a beta for *Deep Acyclic Graph* (DAG): a new way in deepeval to build decision trees that produce deterministic outputs for LLM evaluation: https://docs.confident-ai.com/docs/metrics-dag
⚙️ Open-sourcing all LLM red teaming vulnerabilities: https://docs.confident-ai.com/docs/red-teaming-introduction
🪄 Fixes to synthetic dataset generation pipeline
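The decision-tree idea behind a DAG-style metric can be illustrated with a minimal sketch. This is not the `DAGMetric` API; the node structure and scoring below are hypothetical, showing only how deterministic routing through checks yields a deterministic score.

```python
# Illustrative sketch of a DAG-style deterministic evaluation tree.
# Node structure and scores are hypothetical, NOT the DAGMetric API.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Each node applies a deterministic check and routes to a child;
    # leaf nodes carry the final score instead.
    check: Callable[[str], bool]
    on_true: Optional["Node"] = None
    on_false: Optional["Node"] = None
    score: Optional[float] = None


def evaluate(node: Node, output: str) -> float:
    # Walk the tree until a leaf (a node with a score) is reached.
    while node.score is None:
        node = node.on_true if node.check(output) else node.on_false
    return node.score


def leaf(score: float) -> Node:
    return Node(check=lambda _: True, score=score)


# Example: score an LLM answer on two deterministic criteria.
tree = Node(
    check=lambda out: out.strip().startswith("{"),   # looks like JSON?
    on_true=Node(
        check=lambda out: "summary" in out,          # has the required key?
        on_true=leaf(1.0),
        on_false=leaf(0.5),
    ),
    on_false=leaf(0.0),
)
```

Because every branch is a plain boolean check, the same output always produces the same score, which is the "deterministic" property the release notes emphasize.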

2.0

Here are the new features we're bringing to you in the latest release:
⚙️ Automated LLM red teaming, a.k.a. vulnerability and safety scanning. You can now scan for 40+ vulnerabilities using 10+ SOTA attack enhancement techniques in under 10 lines of Python code.
🪄 Synthetic dataset generation with a highly customizable synthetic data generation pipeline to cover literally any use case.
🖼️ Multi-modal LLM evaluation - perfect for image editing or text-to-image use cases.
💬 Conversational evaluation - perfect for evaluating LLM chatbots.
💥 More LLM system metrics: Prompt Alignment (checks whether your LLM follows the instructions specified in your prompt template), Tool Correctness (for agents), and JSON Correctness (checks whether LLM outputs conform to your desired schema)
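A JSON Correctness check of the kind described above boils down to two deterministic steps: the output must parse as JSON, and the parsed value must match an expected shape. Below is a stdlib-only sketch of that idea; the real metric validates against a desired schema, and this simplified `required_keys` mapping is my own illustrative stand-in.

```python
# Hedged sketch of a JSON-correctness check: does the raw LLM output parse
# as JSON and expose the expected keys with the expected types?
# The required_keys interface is hypothetical, NOT deepeval's metric API.
import json


def json_correct(llm_output: str, required_keys: dict) -> bool:
    """Return True if llm_output is valid JSON with the required typed keys."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], typ)
        for key, typ in required_keys.items()
    )
```

For example, `json_correct('{"name": "a", "age": 3}', {"name": str, "age": int})` passes, while non-JSON text or a wrongly typed field fails.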

1.4.7

In DeepEval 1.4.7, we're releasing:
- LLM red teaming. Safety test your LLM application for 40+ vulnerabilities with 10+ attack enhancements, docs here: https://docs.confident-ai.com/docs/red-teaming-introduction
- Improved synthetic data synthesizer, with much more functionality and customizability: https://docs.confident-ai.com/docs/evaluation-datasets-synthetic-data
- Conversational metrics: Dedicated metrics to evaluate LLM turns
- Multi-modal metrics: Image editing and text-to-image evaluation