In this release, RAGElo as a library was completely revamped, with a much easier to use unified interface and simpler commands (`evaluate` and `batch_evaluate`). Using an Evaluator is now as simple as calling `evaluator.evaluate("query", "document")`.
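For example, here is a minimal sketch of the new flow (the `reasoner` evaluator name is an assumption for illustration; any registered evaluator name works the same way, and an OpenAI key is expected in your environment):

```python
from ragelo import get_retrieval_evaluator

# Build an evaluator by name. "reasoner" is an assumed example name;
# swap in whichever evaluator you actually use.
evaluator = get_retrieval_evaluator("reasoner", llm_provider="openai")

# One call evaluates a single (query, document) pair.
raw_answer, answer = evaluator.evaluate(
    "What is the capital of Brazil?",
    "Brasília is the capital of Brazil.",
)
```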
Custom Evaluators and metadata support
Not a fan of the existing evaluators? Both Retrieval and Answer evaluators now support fully custom prompts, through `RetrievalEvaluator.CustomPromptEvaluator` and `AnswerEvaluator.CustomPromptEvaluator`, respectively.
As part of the custom evaluators, RAGElo now also supports injecting custom metadata into your prompts! Want to include the current date in your evaluator? Add a `{today_date}` placeholder to the prompt and pass its value as metadata to the `evaluate` method:
```python
from ragelo import get_retrieval_evaluator
prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
You should pay extra attention to how **recent** a document is. A document older than 5 years is considered outdated.
The answer should be evaluated according to its recency, truthfulness, and relevance to the user query.
User query: {q}
Retrieved document: {d}
The document has a date of {document_date}.
Today is {today_date}.
WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT WITH THE FOLLOWING KEYS:
- "relevance": 0 if the document is irrelevant, 1 if it is relevant.
- "recency": 0 if the document is outdated, 1 if it is recent.
- "truthfulness": 0 if the document is false, 1 if it is true.
- "reasoning": A short explanation of why you think the document is relevant or irrelevant.
"""
evaluator = get_retrieval_evaluator(
    "custom_prompt",  # Name of the retrieval evaluator
    llm_provider="openai",  # Which LLM provider to use
    prompt=prompt,  # Your custom prompt
    query_placeholder="q",  # The placeholder for the query in the prompt
    document_placeholder="d",  # The placeholder for the document in the prompt
    answer_format="multi_field_json",  # The format of the answer. In this case, a JSON object with multiple fields
    scoring_keys=["relevance", "recency", "truthfulness", "reasoning"],  # Which keys to extract from the answer
)
raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",  # The user query
    document="Rio de Janeiro is the capital of Brazil.",  # The retrieved document
    query_metadata={"today_date": "08-04-2024"},  # Some metadata for the query
    doc_metadata={"document_date": "04-03-1950"},  # Some metadata for the document
)
```
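Here, `raw_answer` holds the LLM's full response, while `answer` contains the parsed result: with the `multi_field_json` format above, that is the values extracted for each of the `scoring_keys`.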
CLI Interface changes
On the CLI front, each evaluator now has its own subcommand. Instead of calling `ragelo` with a long list of parameters, you can call `ragelo retrieval-evaluator <evaluator>` or `ragelo answer-evaluator <evaluator>` with your preferred evaluator. (We are big fans of `ragelo retrieval-evaluator domain-expert` 😉.)
Other changes:
- Moved from using `dataclasses` to Pydantic's `BaseModel`. The code should support Pydantic >=0.9, but let us know if it doesn't work for you.
- Calling `batch_evaluate` now returns both the existing and the new annotations, instead of only writing the new annotations to a file.
- The interface of `batch_evaluate` is much simpler. Instead of a dictionary of dictionaries, it now takes a list of `Query` objects, where each query carries its own list of documents and answers (see the sketch after this list).
- `PairwiseAnswerEvaluator` is much simpler now: `k` is the number of games to generate per query, instead of the grand total.
- Many evaluator-specific methods were simplified and moved up the class hierarchy. More code sharing and easier maintenance!
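To illustrate the new `batch_evaluate` input, here is a rough sketch. The `Query` and `Document` constructors and field names (`qid`, `query`, `did`, `text`, `retrieved_docs`) are assumptions for illustration; check `ragelo.types` for the actual signatures:

```python
from ragelo.types import Document, Query  # assumed import path

# Hypothetical data: each Query carries its own list of documents
# (and, for answer evaluators, its own agent answers).
queries = [
    Query(
        qid="q_1",
        query="What is the capital of Brazil?",
        retrieved_docs=[
            Document(qid="q_1", did="d_1", text="Brasília is the capital of Brazil."),
            Document(qid="q_1", did="d_2", text="Rio de Janeiro is the capital of Brazil."),
        ],
    ),
]

# A single call evaluates everything and returns the annotations
# (existing and newly generated), rather than only writing them to a file.
results = evaluator.batch_evaluate(queries)
```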
**Full Changelog**: https://github.com/zetaalphavector/RAGElo/compare/0.0.5...0.1.0