txtai 3.0.0 is a major release with many new features. It overhauls the project structure, consolidates logic into pipelines and introduces workflows.
Summary of txtai features:
- 🔎 Large-scale similarity search with multiple index backends ([Faiss](https://github.com/facebookresearch/faiss), [Annoy](https://github.com/spotify/annoy), [Hnswlib](https://github.com/nmslib/hnswlib))
- 📄 Create embeddings for text snippets, documents, audio and images. Supports transformers and word vectors.
- 💡 Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction
- ↪️ Workflows that join pipelines together to aggregate business logic. txtai processes can be microservices or full-fledged indexing workflows.
- 🔗 API bindings for [JavaScript](https://github.com/neuml/txtai.js), [Java](https://github.com/neuml/txtai.java), [Rust](https://github.com/neuml/txtai.rs) and [Go](https://github.com/neuml/txtai.go)
- ☁️ Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes)
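
The workflow idea in the list above (pipelines joined together and applied to a stream of data) can be illustrated with a short, library-free sketch. This is not txtai's implementation or API: `SketchWorkflow`, `segment` and `shout` are hypothetical names invented for this illustration, and the two functions are placeholders standing in for real pipelines.

```python
from typing import Callable, Iterable, Iterator, List

# A "task" here is any callable that maps a batch of strings to a batch of strings
Task = Callable[[List[str]], List[str]]


class SketchWorkflow:
    """Hypothetical stand-in for a workflow: runs tasks in order over batches."""

    def __init__(self, tasks: List[Task], batch: int = 2):
        self.tasks = tasks
        self.batch = batch

    def __call__(self, elements: Iterable[str]) -> Iterator[str]:
        # Group incoming elements into fixed-size batches and process each one
        chunk: List[str] = []
        for element in elements:
            chunk.append(element)
            if len(chunk) == self.batch:
                yield from self.process(chunk)
                chunk = []

        # Process any leftover elements in a final, smaller batch
        if chunk:
            yield from self.process(chunk)

    def process(self, chunk: List[str]) -> List[str]:
        # Output of each task becomes the input of the next task
        for task in self.tasks:
            chunk = task(chunk)
        return chunk


def segment(texts: List[str]) -> List[str]:
    # Placeholder pipeline: keep only the first sentence of each text
    return [text.split(".")[0].strip() for text in texts]


def shout(texts: List[str]) -> List[str]:
    # Placeholder pipeline: upper-case each text
    return [text.upper() for text in texts]


workflow = SketchWorkflow([segment, shout], batch=2)
results = list(workflow(["first doc. more text", "second doc. extra", "third doc"]))
print(results)
```

Batching is used in this sketch because model-backed pipelines generally run more efficiently on groups of inputs than on single items; the batch size of 2 is arbitrary.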
New Features
--------------------------
- Add Dockerfile for API (59)
- Require Faiss 1.7.0 (60)
- Add summary pipeline (65)
- Add text extraction pipeline (66)
- Add transcription pipeline (67)
- Add translation pipeline (68)
- Add workflow framework (69)
- Add additional pipeline abstraction layer for tensor frameworks (70)
- Add tests for new v3 functionality (71)
- Add notebooks covering new v3 functionality (73)
- Add Pipeline Factory (76)
- Add API extensions (77)
- Add workflow builder application (80)
- Add text segmentation pipeline (81)
- Add workflow to API (82)
- Add service workflow task (83)
- Add object storage workflow task (84)
- Add URL workflow task (85)
Improvements
--------------------------
- Refactor code into smaller components and modules (63)
- Modify pipeline to accept GPU device id (64)
- Allow direct download of sentence-transformer models (72)
- Update documentation and add site through GitHub Pages (75)
- Modularize the API (78)
- Add default truncation to pipelines (79)
Bug Fixes
--------------------------
- Non-intuitive behaviour of Tokenizer (61)
- [Python 3.9, macOS] Code hangs while building embedding index (62)
- `embeddings.index` truncation `RuntimeError: The size of tensor a (889) must match the size of tensor b (512) at non-singleton dimension 1` (74)