Repoqa

Latest version: v0.1.2


0.1.2

Notable updates

* Fixed wget dependency
* Propagated `trust_remote_code` to tokenizers

Resources

* PyPI: https://pypi.org/project/repoqa/0.1.2/
* Homepage: https://evalplus.github.io/repoqa.html
* Dataset release: https://github.com/evalplus/repoqa_release

0.1.1

Notable updates

* Trimming output before post-processing greatly improved results in certain cases (ganler)
* Fixed the HF backend (zyzzzz-123, ganler)
* The HF backend supports `attn-implementation` to enable flash-attn 2 (ganler)
* Optimized the computation of the trained context size (JialeTomTian, #38)
* End-of-string optimization greatly improved inference speed (ganler)
* Improved post-processing accuracy with a better regular expression (ganler)
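
To make the trimming and regex-based post-processing items above more concrete, here is a minimal sketch of the general idea. It is not RepoQA's actual implementation: the function name `extract_code`, the pattern, and the fallback behavior are all assumptions chosen for illustration.

```python
import re

# Build the fence marker programmatically so this example can itself be fenced.
FENCE = "`" * 3

# Hypothetical pattern: a fenced block with an optional language tag.
_CODE_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(model_output: str) -> str:
    """Return the first fenced code block, or the trimmed text if none exists."""
    text = model_output.strip()  # trim before any further post-processing
    match = _CODE_BLOCK.search(text)
    if match:
        return match.group(1).strip()
    return text
```

Trimming first keeps stray leading or trailing whitespace from affecting the match, and the fallback returns the trimmed text when the model answers without a fence.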

Only finished features and fixes are listed as notable updates.
Work-in-progress updates will be listed in later releases once they are complete.

Full changelog: https://github.com/evalplus/repoqa/compare/v0.1.0...v0.1.1

Quick examples

```shell
pip install repoqa
repoqa.search_needle_function --model "gpt4-turbo" --backend openai
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code --attn-implementation "flash_attention_2"
repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
```
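
When several backends are evaluated in one go, the commands above can be assembled from a small table. The sketch below only builds and prints the command lines; it uses just the flags shown in the quick examples, and the names `RUNS` and `build_command` are hypothetical helpers, not part of RepoQA.

```python
import shlex

# (model, backend, extra flags), copied from the quick examples above.
RUNS = [
    ("gpt4-turbo", "openai", []),
    ("claude-3-haiku-20240307", "anthropic", []),
    ("Qwen/CodeQwen1.5-7B-Chat", "vllm", []),
    (
        "Qwen/CodeQwen1.5-7B-Chat",
        "hf",
        ["--trust-remote-code", "--attn-implementation", "flash_attention_2"],
    ),
    ("gemini-1.5-pro-latest", "google", []),
]


def build_command(model, backend, extra):
    """Return the argv list for one search_needle_function run."""
    return [
        "repoqa.search_needle_function",
        "--model", model,
        "--backend", backend,
        *extra,
    ]


if __name__ == "__main__":
    for model, backend, extra in RUNS:
        # shlex.join quotes arguments only where the shell requires it.
        print(shlex.join(build_command(model, backend, extra)))
```

Each argv list could be passed to `subprocess.run` to launch the evaluation, assuming `repoqa` is installed and the relevant API keys or GPUs are available.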

Resources

* PyPI: https://pypi.org/project/repoqa/0.1.1/
* Homepage: https://evalplus.github.io/repoqa.html
* Dataset release: https://github.com/evalplus/repoqa_release

0.1.0

RepoQA for Long-Context Code Understanding

Introduction

RepoQA is a benchmark that aims to exercise LLMs' long-context code understanding ability.

* **Multi-Lingual**: RepoQA now supports repositories from 5 programming languages:
  * Python
  * C++
  * TypeScript
  * Rust
  * Java
* **Application-driven**: RepoQA aims to evaluate LLMs on long-context tasks that reflect real-life uses. Before RepoQA, long-context evaluations mainly relied on synthetic tasks to probe the weak spots of an LLM's long-context handling, such as *"Needle in the Code"* by [CodeQwen](https://qwenlm.github.io/blog/codeqwen1.5/) and *"Needle in a Haystack"*.
* The first RepoQA task we propose is [🔍 Searching Needle Function](https://evalplus.github.io/repoqa.html#task-snf):
  * 500 sub-tasks = 5 PLs x 10 repos x 10 needles
  * Asks the model to find the function (which we call the needle function) that matches a precise natural-language description
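
The sub-task count follows directly from the grid of languages, repositories, and needles. The sketch below enumerates that grid; the index-based identifiers are placeholders for illustration, not the dataset's actual schema.

```python
from itertools import product

LANGUAGES = ["Python", "C++", "TypeScript", "Rust", "Java"]
REPOS_PER_LANGUAGE = 10
NEEDLES_PER_REPO = 10


def enumerate_subtasks():
    """Return every (language, repo_index, needle_index) combination."""
    return list(
        product(LANGUAGES, range(REPOS_PER_LANGUAGE), range(NEEDLES_PER_REPO))
    )


subtasks = enumerate_subtasks()
print(len(subtasks))  # 5 * 10 * 10 = 500
```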

![](https://evalplus.github.io/assets/RepoQA-CTX.svg)

RepoQA is easy to use

* Supports the following backends:
  * OpenAI
  * Anthropic
  * vLLM
  * HuggingFace transformers
  * Google Generative AI API (Gemini)
* 🚀 Evaluation can be done in one command
* 🏆 A leaderboard: https://evalplus.github.io/repoqa.html

Quick examples

```shell
pip install repoqa
repoqa.search_needle_function --model "gpt4-turbo" --backend openai
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
```

Resources

* PyPI: https://pypi.org/project/repoqa/0.1.0/
* Homepage: https://evalplus.github.io/repoqa.html
* Dataset release: https://github.com/evalplus/repoqa_release

0.1.0rc1

dev-dataset

dev-results
See attachment; some results might be incomplete.

dependency
We use this release to upload dependency files of different languages produced by https://github.com/evalplus/repoqa/tree/main/scripts/curate/dep_analysis
