🎈🎉🥳 We're excited to announce the release of txtai 4.0! 🥳🎉🎈
_Thank you to the growing txtai community. This couldn't be done without you. Please remember to ⭐ txtai if it has been helpful._
txtai 4.0 is a major release with a significant number of new features. This release adds content storage, querying with SQL, object storage, reindexing, index compression, external vectors and more!
To quantify the changes, the code base grew by 50% and 36 issues were resolved, making this by far the biggest txtai release to date. These changes were designed to be fully backward compatible, but keep in mind that this is a new major release.
[What's new in txtai 4.0](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/24_Whats_new_in_txtai_4_0.ipynb) covers all the changes with detailed examples. The [documentation site](https://neuml.github.io/txtai) has also been refreshed.
New Features
--------------------------
- Store text content (168)
- Add option to index dictionaries of content (169)
- Add SQL support for generating combined embeddings + database queries (170)
- Add reindex method to embeddings (171)
- Add index archive support (172)
- Add close method to embeddings (173)
- Update API to work with embeddings + database search (176)
- Add content option to tabular pipeline (177)
- Update workflow example to support embeddings content (179)
- Add index metadata to embeddings config (180)
- Add object storage (183)
- Aggregate partial query results when clustering (184)
- Add function parameter to embeddings reindex (185)
- Add support for user defined column aliases (186)
- Use SQL bracket notation to support multi word and more complex JSON path expressions (187)
- Support SQLite 3.22+ (190)
- Add pre-computed vector support (192)
- Change document/object inserts to only keep latest record (193)
- Update documentation with 4.0 changes (196)
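
As a rough illustration of how several of the features above fit together (content storage, indexing dictionaries and SQL queries), here is a minimal sketch. It assumes txtai 4.0+ is installed and that the `sentence-transformers/nli-mpnet-base-v2` model can be downloaded. The helper names `build_documents`, `QUERY` and `run_demo` are hypothetical and exist only for this example; the linked notebook remains the authoritative reference.

```python
def build_documents(texts):
    """Build (id, data, tags) tuples where data is a dictionary of content.

    The "length" field is an illustrative extra column, not something txtai requires.
    """
    return [(uid, {"text": text, "length": len(text)}, None) for uid, text in enumerate(texts)]


# SQL that combines a similarity clause with an ordinary filter on the score
QUERY = "select id, text, length, score from txtai where similar('feel good story') and score >= 0.15"


def run_demo(texts, limit=1):
    """Index texts with content storage enabled and run the SQL query above."""
    # Imported lazily: needs txtai 4.0+ and downloads the model on first use
    from txtai.embeddings import Embeddings

    # content=True stores the original content alongside the vectors
    embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2", "content": True})
    embeddings.index(build_documents(texts))

    # With content enabled, search returns a list of dictionaries keyed by the selected columns
    results = embeddings.search(QUERY, limit)

    # Release the underlying database resources
    embeddings.close()
    return results
```

Calling `run_demo(["Maine man wins $1M from $25 lottery ticket", "Make huge profits without work, earn up to $100,000 a day"])` should return the closest match that passes the score filter, though exact scores depend on the model.
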
Improvements
--------------------------
- Modify workflow to select batches with slices (158)
- Add tensor support to workflows (159)
- Read YAML config if provided as a file path (162)
- Make adding pipelines to API easier (163)
- Process task actions concurrently (164)
- Add tensor workflow notebook (167)
- Update default ANN parameters (174)
- Require Python 3.7+ (175)
- Consistently name embeddings id fields (178)
- Add txtai `__version__` attribute (181)
- Refresh notebooks for 4.0 (188)
- Modify embeddings to only iterate over input documents once (189)
- Improve efficiency of vector transformations (191)
Bug Fixes
--------------------------
- Add thread lock around API write calls (160)
- Expose caption and objects pipeline via API (161)
- Change pickle calls to use protocol supporting lowest Python version (182)
- HFOnnx expects ORT provider bug (195)
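
The pickle protocol change (182) can be pictured with the standard library alone. The sketch below is not txtai's actual code. It assumes protocol 4, because that is the highest protocol Python 3.7 (the minimum version for this release) can read, whereas protocol 5 was added in Python 3.8. The names `PICKLE_PROTOCOL` and `dumps_compatible` are hypothetical.

```python
import pickle

# Assumption: protocol 4 is the newest protocol readable by Python 3.7
PICKLE_PROTOCOL = 4


def dumps_compatible(obj):
    """Serialize obj with a fixed protocol so older interpreters can load it."""
    return pickle.dumps(obj, protocol=PICKLE_PROTOCOL)


# Round trip a small object to confirm the payload loads back unchanged
payload = dumps_compatible({"ids": [0, 1, 2]})
restored = pickle.loads(payload)
```

Pinning the protocol, rather than relying on `pickle.HIGHEST_PROTOCOL`, keeps files written on a newer interpreter loadable on the oldest supported one.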