FlagEmbedding

Latest version: v1.3.4


1.3.4

What's Changed
* Inference docstring by ZiyiXia in https://github.com/FlagOpen/FlagEmbedding/pull/1186
* delete useless parameters for embedder classes by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1189
* Bug of BGE M3 training by baochi0212 in https://github.com/FlagOpen/FlagEmbedding/pull/1183
* feat:add bce-embedding-base_v1 by zhudongwork in https://github.com/FlagOpen/FlagEmbedding/pull/1198
* Docstring by ZiyiXia in https://github.com/FlagOpen/FlagEmbedding/pull/1200
* Update AbsDataset.py by jhyeom1545 in https://github.com/FlagOpen/FlagEmbedding/pull/1204
* Fix bugs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1211
* fixed a bug in AbsReranker.py for mps device support by Swgj in https://github.com/FlagOpen/FlagEmbedding/pull/1216
* Fix bugs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1219
* update stop pool by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1221
* update mteb eval by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1227
* update adjust batch size by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1229
* update mteb eval by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1230
* fix bugs and refactor code by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1231
* update mteb eval by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1235
* release training data for bge-multilingual-gemma2 by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1245
* add missed trust_remote_code for finetune code by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1248
* fix DecoderOnlyEmbedderICLSameDatasetTrainDataset category index error by billvsme in https://github.com/FlagOpen/FlagEmbedding/pull/1232
* Clean code by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1250
* Fix bugs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1253
* update examples by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1254
* update examples by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1255
* Fix air-bench eval bugs: AIRBenchEvalArgs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1256
* Fix air-bench eval bugs: AIRBenchEvalArgs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1257
* update code and README for scripts by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1258
* update examples by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1261
* update `C_MTEB` reference by emmanuel-ferdman in https://github.com/FlagOpen/FlagEmbedding/pull/1296
* [Bugfix] Typehint error on py38 by DrDavidS in https://github.com/FlagOpen/FlagEmbedding/pull/1300
* Update model_mapping.py by pengjunfeng11 in https://github.com/FlagOpen/FlagEmbedding/pull/1311
* fix bugs for embedder finetune by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1328
* fix a bug in icl/dataset.py by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1330
* Fix bugs by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1340
* fix beir data_loader.py: dev -> validation by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1341
* update embedder finetune code by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1342
* Fix Bug: OOM by 545999961 in https://github.com/FlagOpen/FlagEmbedding/pull/1349
* fix transformers 4.48.0 by Hypothesis-Z in https://github.com/FlagOpen/FlagEmbedding/pull/1343
* Fix a bug in beir evaluation and release v1.3.4 by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1359
* del dp code by hanhainebula in https://github.com/FlagOpen/FlagEmbedding/pull/1360
* support musa backend in FlagEmbedding by qiyulei-mt in https://github.com/FlagOpen/FlagEmbedding/pull/1350
* docs: fix link to https://bge-model.com/ within NEWS section by bufferoverflow in https://github.com/FlagOpen/FlagEmbedding/pull/1355
* fix/reranking tutorial typos by rendyfebry in https://github.com/FlagOpen/FlagEmbedding/pull/1313

New Contributors
* baochi0212 made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1183
* zhudongwork made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1198
* jhyeom1545 made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1204
* Swgj made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1216
* billvsme made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1232
* emmanuel-ferdman made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1296
* DrDavidS made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1300
* pengjunfeng11 made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1311
* Hypothesis-Z made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1343
* qiyulei-mt made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1350
* bufferoverflow made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1355
* rendyfebry made their first contribution in https://github.com/FlagOpen/FlagEmbedding/pull/1313

**Full Changelog**: https://github.com/FlagOpen/FlagEmbedding/compare/v1.3.2-BGE-Update...v1.3.4

v1.3.2-BGE-Update
We have completely updated the BGE code repository, including the following key improvements:

Inference Code

- Added `FlagAutoModel` and `FlagAutoReranker`, making it easier to utilize the models.

Inference Optimization

- Implemented multi-GPU support.
- Introduced dynamic batch sizing to prevent out-of-memory (OOM) issues.
- Optimized padding to improve efficiency.
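One way to picture dynamic batch sizing is a retry loop that halves the batch whenever a batch fails with an out-of-memory error. The sketch below is illustrative only and is not FlagEmbedding's implementation: `encode_with_adaptive_batches` and `fake_encode_batch` are hypothetical names, and the simulated `MemoryError` stands in for whatever error a real backend raises.

```python
from typing import Callable, List, Sequence


def encode_with_adaptive_batches(
    texts: Sequence[str],
    encode_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 256,
) -> List[List[float]]:
    """Encode texts, halving the batch size whenever a batch runs out of memory."""
    outputs: List[List[float]] = []
    start = 0
    while start < len(texts):
        batch = texts[start:start + batch_size]
        try:
            outputs.extend(encode_batch(batch))
        except MemoryError:
            if batch_size == 1:
                raise  # a single item does not fit, so there is nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with the smaller batch
        start += len(batch)
    return outputs


def fake_encode_batch(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical stand-in for a model call: pretends memory fits at most 3 texts.
    if len(batch) > 3:
        raise MemoryError("simulated out-of-memory")
    return [[float(len(text))] for text in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff", "ggggggg"]
vectors = encode_with_adaptive_batches(texts, fake_encode_batch, batch_size=8)
```

Here the initial batch size of 8 is reduced to 4 and then to 2 before encoding succeeds, and the smaller size is kept for the remaining texts.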

Evaluation Code

- Integrated support for common evaluation datasets to enhance user convenience.
- Provided a custom evaluation interface for datasets organized according to the specified data format, to simplify the evaluation process.

Project Structure Organization

- Reorganized the project to streamline processes related to **inference**, **fine-tuning**, and **evaluation**.

BGE-M3&Beacon
BGE-M3
A new member of the BGE model series! BGE-M3 stands for Multi-Linguality, Multi-Granularity (input length up to 8192), and Multi-Functionality (unification of dense, lexical, and multi-vector retrieval). It is the first embedding model that supports all three retrieval methods.

For more details, please refer to the [technical report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3).
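To make the three retrieval modes concrete, the sketch below scores a query against a passage in each style: an inner product of single vectors (dense), a sum of weight products over shared tokens (lexical), and a mean of per-query-token maximum similarities (multi-vector). It is a minimal illustration on toy numbers, assuming these common formulations; the function names and inputs are hypothetical and are not the FlagEmbedding API.

```python
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_score(q_vec: List[float], p_vec: List[float]) -> float:
    # One vector per text; relevance is their inner product.
    return dot(q_vec, p_vec)


def lexical_score(q_weights: Dict[str, float], p_weights: Dict[str, float]) -> float:
    # Per-token weights; only tokens present in both texts contribute.
    return sum(w * p_weights[tok] for tok, w in q_weights.items() if tok in p_weights)


def multi_vector_score(q_vecs: List[List[float]], p_vecs: List[List[float]]) -> float:
    # One vector per token; each query token takes its best-matching passage token.
    return sum(max(dot(q, p) for p in p_vecs) for q in q_vecs) / len(q_vecs)


# Toy inputs (made up for illustration).
dense = dense_score([1.0, 0.0], [0.5, 0.5])
lexical = lexical_score({"bge": 0.5, "model": 0.25}, {"bge": 0.5, "embedding": 0.75})
multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 0.25]])
```

A model that produces all three representations in one pass lets a retriever combine these scores, for example as a weighted sum, without running separate models.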

Activation Beacon
An effective, efficient, compatible, and low-cost (in training) method to extend the context length of LLMs by **100x**. We extend the context length of Llama-2-chat-7b from 4K to 400K.

For more details, please refer to the [paper](https://arxiv.org/abs/2401.03462) and [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon).

Feedback is welcome

lm-cocktail

LM-Cocktail
Merge language models (e.g., Llama, bge) to improve the general ability of models.
This method can be used to:
- Mitigate the Problem of Catastrophic Forgetting
- Improve the performance of new tasks without fine-tuning
- Approximate multitask learning or model ensemble

For more details, please refer to the [paper](https://arxiv.org/abs/2311.13534) and [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail).
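As a rough picture of what merging means here, the sketch below takes a weighted average of matching parameters from several models. It is a simplified illustration, assuming each model is a plain dict of parameter lists and the merge weights are given; `merge_models` is a hypothetical helper, not the LM_Cocktail API, and how the weights are chosen is out of scope.

```python
from typing import Dict, List, Sequence

StateDict = Dict[str, List[float]]


def merge_models(state_dicts: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Return the weighted average of same-shaped parameter dicts."""
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    merged: StateDict = {}
    for name, values in state_dicts[0].items():
        merged[name] = [
            sum((w / total) * sd[name][i] for sd, w in zip(state_dicts, weights))
            for i in range(len(values))
        ]
    return merged


# Toy parameters (made up): a base model and a fine-tuned variant.
base = {"layer.weight": [1.0, 2.0]}
tuned = {"layer.weight": [3.0, 6.0]}
merged = merge_models([base, tuned], weights=[0.5, 0.5])
```

With equal weights the merged parameters sit halfway between the two models, which is the intuition behind retaining general ability while keeping some of the fine-tuned behavior.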

1.1

Create the first release 131
FlagEmbedding
- Updated embedding models `bge-*-v1.5`:
  - Alleviated the issue with the similarity distribution.
  - The new models can perform retrieval tasks without an instruction, but using an instruction is still recommended because it can give better performance.
- New models `bge-reranker-*`: cross-encoders that can rerank the top-k retrieved results.
- Normalization is now specified in the configuration for sentence-transformers, thanks to [skirres](https://huggingface.co/skirres). Users no longer need to set `normalize_embeddings=True` manually when using sentence-transformers.
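For context on why normalization matters, the sketch below shows that once embeddings are L2-normalized, a plain dot product gives the same value as cosine similarity. It uses toy vectors and hypothetical helper names, and does not call sentence-transformers.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy embeddings (made up for illustration).
a, b = [3.0, 4.0], [4.0, 3.0]
cosine_raw = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))
```

Both values agree up to floating-point rounding, so downstream code can rank by dot product alone when the model configuration normalizes its outputs.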

C-MTEB
- Add two cross-lingual retrieval tasks: T2RerankingZh2En and T2RerankingEn2Zh.
