We are very happy to announce this major new release of Curated Transformers! 🎉
Curated Transformers started as a small transformer library for spaCy pipelines. Over the last two months we have turned it into a pure PyTorch library that is completely independent of spaCy and Thinc. We also added support for popular LLMs, text generation, 8-bit/4-bit quantization, and many other features:
* Curated Transformers is now a pure PyTorch library.
* Support for popular LLMs such as Falcon, LLaMA, and Dolly v2.
* Greedy generation and generation with sampling (a short usage sketch follows this list).
* 8-bit and 4-bit quantization of models through [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes).
* Flash attention and other optimizations through PyTorch's scaled dot-product attention (`torch.nn.functional.scaled_dot_product_attention`).
* Efficient model loading without unneeded allocations and initialization through PyTorch's `meta` device.
* Support for modern `tokenizer.json` tokenizers.
* Load models from Hugging Face Hub without requiring the `transformers` package.
* Extensive [API documentation](https://curated-transformers.readthedocs.io/en/v0.9.x/) and examples.
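To give a flavor of the new PyTorch-only workflow, here is a minimal sketch that loads a model from the Hugging Face Hub and runs greedy generation. The class and argument names (`AutoGenerator`, `from_hf_hub`, `GreedyGeneratorConfig`, `name`, `device`) follow the linked API documentation as best we can summarize them here; treat the exact signatures as approximate and check the docs before use.

```python
import torch
from curated_transformers.generation import AutoGenerator, GreedyGeneratorConfig

# Download a causal LM from the Hugging Face Hub (no `transformers` package
# needed) and place it on the first GPU.
generator = AutoGenerator.from_hf_hub(
    name="tiiuae/falcon-7b-instruct",
    device=torch.device("cuda", index=0),
)

# Greedy decoding; a sampling configuration can be used instead.
print(generator(["What is spaCy?"], GreedyGeneratorConfig()))
```

Quantized 8-bit/4-bit loading via `bitsandbytes` is likewise configured when the model is loaded; the API documentation linked above covers the exact options.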
Curated Transformers can be used in spaCy using the [`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers/tree/main) package.
👥 Contributors
danieldk, honnibal, ines, shadeMe