deepeval Changelog

0.20.68

For the latest release, DeepEval:

- Supports Hugging Face users by providing real-time evaluations during fine-tuning: https://docs.confident-ai.com/docs/integrations-huggingface
- Supports LlamaIndex users by allowing unit testing of LlamaIndex apps in CI/CD, and offer metrics in LlamaIndex's evaluators: https://docs.confident-ai.com/docs/integrations-llamaindex
- Improvements to accuracy and reliability in Faithfulness and Answer Relevancy
- Summarization Metric now offers explanation
- You can now use ANY LLM for evaluation: https://docs.confident-ai.com/docs/metrics-introduction#using-a-custom-llm

0.20.57

- LLM-Evals (LLM evaluated metrics) now support all of langchain's chat models.
- `LLMTestCase` now has `execution_time` and `cost`, useful for those looking to evaluate on these parameters
- `minimum_score` is now `threshold` instead, meaning you can now create custom metrics that either have a "minimum" or "maximum" threshold
- `LLMEvalMetric` is now `GEval`
- Llamaindex Tracing integration: (https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html#deepeval)

0.20.43

In this release:
- Faithfulness, Answer Relevancy, Contextual Relevancy, Contextual Precision, and Contextual Recall, all offer a reasoning for its given score.
- Azure OpenAI now supported via a single command in the CLI: https://docs.confident-ai.com/docs/metrics-introduction#using-azure-openai
- New Summarization Metric that uses the QAG framework for its implementation: https://docs.confident-ai.com/docs/metrics-summarization
- Pulling datasets from Confident AI now offers an intermediate step for additional data processing before evaluation: https://docs.confident-ai.com/docs/confident-ai-evaluate-datasets#pull-your-dataset-from-confident-ai
- Decoupled imports from `transformers`, `sentence_transformers`, and `pandas` to reduce package size

0.20.35

Lots of new features this release:

1. `JudgementalGPT` now allows for different languages - useful for our APAC and European friends
2. `RAGAS` metrics now supports all OpenAI models - useful for those running into context length issues
3. `LLMEvalMetric` now returns a reasoning for its score
4. `deepeval test run` now has hooks that call on test run completion
5. `evaluate` now displays `retrieval_context` for RAG evaluation
6. `RAGAS` metric now displays metric breakdown for all its distinct metrics

0.20.27

[Automatically integrated with Confident AI](https://app.confident-ai.com/) for continous evaluation throughout the lifetime of your LLM (app):

-log evaluation results and analyze metrics pass / fails
-compare and pick the optimal hyperparameters (eg. prompt templates, chunk size, models used, etc.) based on evaluation results
-debug evaluation results via LLM traces
-manage evaluation test cases / datasets in one place
-track events to identify live LLM responses in production
-add production events to existing evaluation datasets to strength evals over time

0.20.23

[Automatically integrated with Confident AI](https://app.confident-ai.com/) for continous evaluation throughout the lifetime of your LLM (app):

-log evaluation results and analyze metrics pass / fails
-compare and pick the optimal hyperparameters (eg. prompt templates, chunk size, models used, etc.) based on evaluation results
-debug evaluation results via LLM traces
-manage evaluation test cases / datasets in one place
-track events to identify live LLM responses in production
-add production events to existing evaluation datasets to strength evals over time

Deepeval

Page 3 of 7