langcheck Changelog

0.9.0

Breaking Changes
- Updated the internal prompt used for the following functions. If you want to continue using these functions with previous prompts, specify `eval_prompt_version="v1"`.
- `langcheck.augment.en.rephrase`
- `langcheck.metrics.en.prompt_leakage`
- `langcheck.metrics.ja.prompt_leakage`

New Features
- Added two augmentations to `langcheck.augment.en` and `langcheck.augment.ja`
- `rephrase_with_system_role_context`
- `rephrase_with_user_role_context`

**Full Changelog**: https://github.com/citadel-ai/langcheck/compare/v0.8.1...v0.9.0

0.8.1

Bug fixes
* Fix the issue in slow async call of `OpenAISimilarityScorer`
* Updateded outdated developer documentation
* Unpin `openai` version to resolve the incompatibility with the latest `httpx`

**Full Changelog**: https://github.com/citadel-ai/langcheck/compare/v0.8.0...v0.8.1

0.8.0

Breaking Changes
- Updated the minimal supported Python version to 3.9.
- All prompts for our built-in LLM based metrics are updated. Now all of them have `Output your thought process first, and then provide your final answer` as the last sentence to make sure LLM evaluators actually do the chain-of-thought reasoning. This may affect the output scores as well.
- Fixed a typo in a module name. Now `langcheck.utils.progess_bar` is renamed to `langcheck.utils.progress_bar`.
- Default prompts for `langcheck.en.toxicity` and `langcheck.ja.toxicity` are updated. Refer to 136 for comparison with the original prompt. You can fallback to the old prompts by specifying `eval_prompt_version="v1"` as an argument.
- Updated the arguments for `langcheck.augment.rephrase`. Now they take `EvalClient`s instead of directly taking OpenAI parameters.

New Features
- Added [langcheck.metrics.custom_text_quality](https://langcheck.readthedocs.io/en/latest/langcheck.metrics.custom_text_quality.html#module-langcheck.metrics.custom_text_quality). With functions in this module, you can build your own LLM-based metrics with custom prompts. See the documentation for details.
- Added support of some local LLMs as evaluators
- [LlamaEvalClient](https://langcheck.readthedocs.io/en/latest/langcheck.metrics.eval_clients.html#langcheck.metrics.eval_clients.LlamaEvalClient)
- [PrometheusEvalClient](https://langcheck.readthedocs.io/en/latest/langcheck.metrics.eval_clients.html#langcheck.metrics.eval_clients.PrometheusEvalClient)
- Added new text augmentations
- `jailbreak_templates` augmentation with the following templates
- `basic`, `chatgpt_dan`, `chatgpt_good_vs_evil`, `john` and `universal_adversarial_suffix` (EN)
- `basic`, `chatgpt_good_vs_evil` and `john` (JA)
- `payload_splitting` (EN, JA)
- `to_full_width` (EN)
- `conv_kana` (JA)
- Added new LLM-based built-in metrics for both EN & JA languages
- `answer_correctness`
- `answer_safety`
- `personal_data_leakage`
- `hate_speech`
- `adult_content`
- `harmful_activity`
- Added "Simulated Annotators", a confidence score estimating method proposed in paper [Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement](https://arxiv.org/abs/2407.18370). You can use that by adding `calculated_confidence=True` for `langcheck.metrics.en.pairwise_comparison`.
- Supported embedding-based metrics (e.g. `semantic_similarity`) with async OpenAI-based eval clients.

Bug Fixes
- Added error handling code in `OpenAIEvalClient` and `GeminiAIEvalClient` so that they just return `None` even if they fail in the function calling step.
- Updated `langcheck.metrics.pairwise_comparison` to accept lists with `None` as source texts.
- Fixed an error in `langcheck.augment.synonym` caused by a missing `nltk` package.
- Fixed the issue on decoding UTF-8 texts in some environments.
- Fixed typos in documentation.

**Full Changelog**: https://github.com/citadel-ai/langcheck/compare/v0.7.1...v0.8.0

0.7.1

Breaking Changes

The interface for OpenAI metrics has been changed in LangCheck v0.7.0. See [Computing Metrics with Remote LLMs](https://langcheck.readthedocs.io/en/latest/metrics.html#computing-metrics-with-remote-llms) for examples.

New Features
* Added support for computing metrics with Claude & Gemini in addition to OpenAI

Bug Fixes
* Fixed an issue in VSCode dev containers for Linux ARM64 devices

-----

0.7.0

Breaking Changes

The interface for OpenAI metrics has been changed in LangCheck v0.7.0. See [Computing Metrics with Remote LLMs](https://langcheck.readthedocs.io/en/latest/metrics.html#computing-metrics-with-remote-llms) for examples.

New Features
* Added support for computing metrics with Claude & Gemini in addition to OpenAI

Bug Fixes
* Fixed an issue in VSCode dev containers for Linux ARM64 devices

**Full Changelog**: https://github.com/citadel-ai/langcheck/compare/v0.6.0...v0.7.0

0.6.0

Breaking Changes

* To simplify installation, LangCheck v0.6.0 now splits the package into multiple languages.
* `pip install langcheck` will only install English metrics
* `pip install langcheck[all]` will install metrics for all languages
* `pip install langcheck[ja]` will only install Japanese and English metrics (use `[de]` for German and `[zh]` for Chinese)
* See [the "Installation" guide](https://langcheck.readthedocs.io/en/latest/installation.html) for more details.

New Features
* [Added the `pairwise_comparison()` metric](https://langcheck.readthedocs.io/en/latest/metrics.html#pairwise-text-quality-metrics) to automatically rank the output of two models!
* Enabled running OpenAI metrics asynchronously. To do this, set the `use_async` parameter for OpenAI metric functions, such as [`factual_consistency()`](https://langcheck.readthedocs.io/en/latest/langcheck.metrics.en.source_based_text_quality.html#langcheck.metrics.en.source_based_text_quality.factual_consistency)

Bug Fixes
* Fixed the local Japanese, German, Chinese `factual_consistency()` metric to not throw an exception for long inputs (inputs longer than the model accepts are automatically truncated now)
* Updated the "Pytest" status badge in README.md
* Refactored implementation of local metrics

**Full Changelog**: https://github.com/citadel-ai/langcheck/compare/v0.5.0...v0.6.0

Langcheck

Page 1 of 2