Overview
It’s been a while since our last release, and we have a ton of cool new features in the torchtune library, including distributed QLoRA support, new models, sample packing, and more! Check out the New Contributors section below for the full list of first-time contributors to the repo.
Enjoy the new release and happy tuning!
New Features
Here are some highlights of the new features in v0.2.0.
Recipes
- We added support for QLoRA with FSDP2! This means users can now fine-tune 70B+ models across multiple GPUs. We provide example configs for the Llama2 7B and 70B sizes. Note: this currently requires installing PyTorch nightlies to access the FSDP2 APIs. (909)
- Also by leveraging FSDP2, we see a 12% speedup in tokens/sec and a 3.2x speedup in model init over FSDP1 with LoRA (855)
- We added support for other variants of the Meta Llama3 recipes, including:
  - 70B with LoRA (802)
  - 70B full finetune (993)
  - 8B memory-efficient full finetune, which saves 46% peak memory over the previous version (990)
- We introduced a quantization-aware training (QAT) recipe. If you plan on quantizing your model after training, fine-tuning with QAT can significantly improve the quality of the final quantized model. (980)
- We made updates to the eval recipe, including:
  - Batched inference for faster eval (947)
  - Support for free generation tasks in the EleutherAI Eval Harness (975)
  - Support for custom eval configs (1055)
Models
- Phi-3 Mini-4K-Instruct from Microsoft (876)
- Gemma 7B from Google (971)
- Code Llama2 in 7B, 13B, and 70B sizes from Meta (847)
- salman designed and implemented reward modeling for Mistral models (840, 991)
Perf, memory, and quantization
- We made improvements to our FSDP + Llama3 recipe, resulting in a further 13% reduction in allocated memory for the 8B model. (865)
- Added int8 per-token dynamic activation + int4 grouped per-axis weight (8da4w) quantization (884); see the sketch below
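As a rough illustration of what 8da4w quantization involves, here is a minimal Python sketch built on torchao’s quantizer. The import path, `groupsize` value, and toy model below are assumptions for illustration and may differ across torchao versions; the torchtune quantization configs remain the supported entry point.

```python
# Sketch only: the torchao import path and API shown here are assumptions and may
# change between torchao versions.
import torch.nn as nn
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = nn.Sequential(nn.Linear(1024, 1024))  # stand-in for a fine-tuned model
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)  # int8 dynamic activations, int4 grouped weights
model = quantizer.quantize(model)  # replaces nn.Linear layers with 8da4w-quantized equivalents
```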
Data/Datasets
- We added support for a widely requested feature: sample packing! Packing drastically speeds up training, e.g. 2x faster with the Alpaca dataset (see the sketch after this list). (875, 1109)
- In addition to instruct tuning, we now also support continued pretraining and include several example datasets such as WikiText and CNN DailyMail. (868)
- Users can now train on multiple datasets at once by concatenating them (889)
- We now support OpenAI conversation-style data (890)
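Here is a minimal sketch of what sample packing and dataset concatenation can look like when building datasets directly in Python (most users will set these options in their YAML configs instead). The tokenizer path is a placeholder, and the exact builder signatures (e.g. the `packed` keyword) are assumptions; see the datasets tutorial for the definitive API.

```python
# Sketch only: paths are placeholders and builder signatures may differ slightly.
from torchtune.datasets import ConcatDataset, alpaca_dataset, grammar_dataset
from torchtune.models.llama3 import llama3_tokenizer

tokenizer = llama3_tokenizer("/path/to/tokenizer.model")  # placeholder path

# Sample packing: pack multiple samples into fixed-length sequences for higher throughput.
packed_ds = alpaca_dataset(tokenizer=tokenizer, packed=True)

# Dataset concatenation: train on several datasets as if they were one.
combined_ds = ConcatDataset([
    alpaca_dataset(tokenizer=tokenizer),
    grammar_dataset(tokenizer=tokenizer),
])
```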
Miscellaneous
- jeromeku added a much more advanced profiler so users can understand the exact bottlenecks in their LLM training. (1089)
- We made several metric logging improvements:
  - Log tokens/sec, per-step logging, and configurable memory logging (831)
  - Better formatting for stdout memory logs (817)
- Users can now save models in the safetensors format. (1096)
- Updated activation checkpointing to support selective per-layer and selective per-op checkpointing (785)
- We worked with the Hugging Face team to add support for loading adapter weights fine-tuned via torchtune directly into the PEFT library; see the sketch below. (933)
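For example, after a LoRA fine-tune with torchtune, you can load the saved adapter weights with PEFT’s standard API. This is only a sketch: the base model ID and output directory below are placeholders you would point at your own base model and torchtune checkpoint directory.

```python
# Sketch only: model ID and adapter directory are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
# Directory where the torchtune checkpointer saved the adapter weights and adapter config
peft_model = PeftModel.from_pretrained(base_model, "/path/to/torchtune/output_dir")
```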
Documentation
- We wrote a new tutorial for fine-tuning Llama3 with chat data (823) and revamped the datasets tutorial (994)
- Looooooooong overdue, but we added proper documentation for the tune CLI (1052)
- Improved contributing guide (896)
Bug Fixes
- Optimox found and fixed a bug to ensure that LoRA dropout is correctly applied (996)
- Fixed a broken link to the Llama3 tutorial (805)
- Fixed Gemma model generation (1016)
- Bug workaround: to download CNN DailyMail, launch a single-device recipe first; once the dataset has downloaded, you can use it in distributed recipes.
New Contributors
- supernovae made their first contribution in https://github.com/pytorch/torchtune/pull/803
- eltociear made their first contribution in https://github.com/pytorch/torchtune/pull/814
- Carolinabanana made their first contribution in https://github.com/pytorch/torchtune/pull/810
- musab-mk made their first contribution in https://github.com/pytorch/torchtune/pull/818
- apthagowda97 made their first contribution in https://github.com/pytorch/torchtune/pull/816
- lessw2020 made their first contribution in https://github.com/pytorch/torchtune/pull/785
- weifengpy made their first contribution in https://github.com/pytorch/torchtune/pull/843
- musabgultekin made their first contribution in https://github.com/pytorch/torchtune/pull/857
- xingyaoww made their first contribution in https://github.com/pytorch/torchtune/pull/890
- vmoens made their first contribution in https://github.com/pytorch/torchtune/pull/902
- andrewor14 made their first contribution in https://github.com/pytorch/torchtune/pull/884
- kunal-mansukhani made their first contribution in https://github.com/pytorch/torchtune/pull/926
- EvilFreelancer made their first contribution in https://github.com/pytorch/torchtune/pull/889
- water-vapor made their first contribution in https://github.com/pytorch/torchtune/pull/950
- Optimox made their first contribution in https://github.com/pytorch/torchtune/pull/995
- tambulkar made their first contribution in https://github.com/pytorch/torchtune/pull/1011
- christobill made their first contribution in https://github.com/pytorch/torchtune/pull/1004
- j-dominguez9 made their first contribution in https://github.com/pytorch/torchtune/pull/1056
- andyl98 made their first contribution in https://github.com/pytorch/torchtune/pull/1061
- hmosousa made their first contribution in https://github.com/pytorch/torchtune/pull/1065
- yasser-sulaiman made their first contribution in https://github.com/pytorch/torchtune/pull/1055
- parthsarthi03 made their first contribution in https://github.com/pytorch/torchtune/pull/1081
- mdeff made their first contribution in https://github.com/pytorch/torchtune/pull/1086
- jeffrey-fong made their first contribution in https://github.com/pytorch/torchtune/pull/1096
- jeromeku made their first contribution in https://github.com/pytorch/torchtune/pull/1089
- man-shar made their first contribution in https://github.com/pytorch/torchtune/pull/1126
**Full Changelog**: https://github.com/pytorch/torchtune/compare/v0.1.1...v0.2.0