Ragatouille

Latest version: v0.0.8.post4

0.0.8post1

Minor fix: corrects the `from time import time` import introduced in the indexing overhaul, which caused crashes because `time` was then used improperly.
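The release note does not say exactly how `time` was misused, so the following is only a minimal sketch of one common way this class of bug appears. The function names `elapsed_since` and `broken_elapsed_since` are hypothetical and are not taken from RAGatouille.

```python
from time import time

# After `from time import time`, the name `time` refers to the function,
# not the module, so it must be called directly as `time()`.
def elapsed_since(start: float) -> float:
    """Return the seconds elapsed since `start` (hypothetical helper)."""
    return time() - start

def broken_elapsed_since(start: float) -> float:
    """Illustrative misuse: treats `time` as if it were the module."""
    # Raises AttributeError because the function object has no `.time` attribute.
    return time.time() - start
```

Calling `broken_elapsed_since` raises `AttributeError`, which is the kind of crash a mismatched import style can produce.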

0.0.8

Major changes:

- Indexing overhaul contributed by jlscheerer https://github.com/bclavie/RAGatouille/pull/158
- Relaxed dependencies to ensure lower install load https://github.com/bclavie/RAGatouille/pull/173
- Indexing for fewer than 100k documents no longer uses Faiss by default and instead performs K-Means in pure PyTorch. This change is somewhat experimental, but benchmark results are encouraging and it greatly increases compatibility. https://github.com/bclavie/RAGatouille/pull/173
- CRUD improvements by anirudhdharmarajan. Feature is still experimental/not fully supported, but rapidly improving!
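The default described above for collections under 100k documents can be pictured as a simple size check. This is a rough sketch only: `pick_kmeans_backend`, `force_faiss`, and the returned labels are hypothetical names, and the library's actual selection logic may differ.

```python
from typing import Optional

# Threshold taken from the release note: "under 100k documents".
FAISS_DOCUMENT_THRESHOLD = 100_000

def pick_kmeans_backend(n_documents: int, force_faiss: Optional[bool] = None) -> str:
    """Return which K-Means backend a default configuration might choose.

    `force_faiss` is a hypothetical override: None means "use the default".
    """
    if force_faiss is not None:
        return "faiss" if force_faiss else "pytorch"
    # Default: small collections avoid Faiss and use pure PyTorch K-Means.
    return "pytorch" if n_documents < FAISS_DOCUMENT_THRESHOLD else "faiss"
```

The explicit override reflects the "by default" wording of the note: the size check only decides the backend when the caller expresses no preference.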

Fixes:
- Many small bug fixes, mainly around typing
- Training triplets improvement (already present in 0.0.7 post versions) by JoshuaPurtell

0.0.7post3

- Improvements for data preprocessing issues and fixes for broken training example by jonppe (138) 🙏

0.0.7post2

Fixes & tweaks to the previous release:

- Automatically adjust batch size on longer contexts (32 for 512 tokens, 16 for 1024, 8 for 2048, decreasing like this until a minimum of 1)
- Apply dynamic max context length to reranking
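The batch size schedule above (32 for 512 tokens, halving each time the context doubles, down to a minimum of 1) can be sketched as follows. The function name `adjusted_batch_size` and its parameters are hypothetical, and the handling of lengths that fall between two powers of two is an assumption.

```python
def adjusted_batch_size(max_tokens: int, base_batch_size: int = 32, base_length: int = 512) -> int:
    """Halve the batch size each time the context length doubles, never below 1."""
    batch_size = base_batch_size
    length = base_length
    # Assumption: a length between two steps is treated like the next larger step.
    while max_tokens > length and batch_size > 1:
        batch_size //= 2
        length *= 2
    return max(1, batch_size)
```

For example, `adjusted_batch_size(1024)` returns 16 and `adjusted_batch_size(2048)` returns 8, matching the values listed in the note.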

0.0.7post1

Release focusing on length adjustments. Much more dynamism and on-the-fly adaptation, both for query length and maximum document length!

- Remove hardcoded maximum length: it is now inferred from your base model's maximum position encodings. This enables support for longer-context ColBERT, such as [Jina ColBERT](https://huggingface.co/jinaai/jina-colbert-v1-en)
- Upstream changes to `colbert-ai` to allow any base model to be used, rather than pre-defined ones.
- Query length now adjusts dynamically, from 32 (hardcoded minimum) to your model's maximum context window for longer queries.
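As a rough illustration of the length handling described above, the sketch below reads the maximum length from a config object and clamps the query length between 32 and that maximum. It uses a plain `SimpleNamespace` as a stand-in for a model config with a `max_position_embeddings` attribute; the function names and the example values 512 and 8192 are assumptions for illustration, not the library's code.

```python
from types import SimpleNamespace

MIN_QUERY_LENGTH = 32  # hardcoded minimum mentioned in the release note

def infer_max_length(config) -> int:
    """Read the maximum length from the base model's position encodings."""
    return int(config.max_position_embeddings)

def dynamic_query_length(n_query_tokens: int, config) -> int:
    """Clamp the query length between the minimum and the model's maximum."""
    return max(MIN_QUERY_LENGTH, min(n_query_tokens, infer_max_length(config)))

# Stand-in configs: a standard-context model and a hypothetical longer-context one.
short_context = SimpleNamespace(max_position_embeddings=512)
long_context = SimpleNamespace(max_position_embeddings=8192)
```

With these stand-ins, a 10-token query is padded up to 32, while a 1000-token query is capped at 512 for the short-context config and kept at 1000 for the long-context one.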

0.0.6c2

(notes covering changes in the last few PyPI releases that were undocumented until now)

**Changes**:
- Query only a subset of documents based on doc ids by PrimoUomo89 https://github.com/bclavie/RAGatouille/pull/94
- Return chunk ids in results thanks to PrimoUomo89 https://github.com/bclavie/RAGatouille/pull/125
- Lower the number of k-means iterations when running more is unnecessary https://github.com/bclavie/RAGatouille/pull/129
- Properly license the library as Apache-2 on PyPI

**Fixes**:
- Dynamically increase search hyperparameters for large `k` values and low document counts, reducing the number of situations where the total number of documents returned is substantially below `k` https://github.com/bclavie/RAGatouille/pull/131
- Fix to enable training data processing with hard negatives turned off, by corrius https://github.com/bclavie/RAGatouille/pull/117
- Proper handling of different input types when pre-processing training triplets by GautamR-Samagra https://github.com/bclavie/RAGatouille/pull/115
