## Highlights
Today we release torchtune v0.4.0 with some exciting new additions! Notable ones include full support for activation offloading, recipes for Llama3.2V 90B along with QLoRA variants, new documentation, and the Qwen2.5 model family!
### Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations to the CPU while they are not in use, then prefetches them back to the GPU right before they are needed for the microbatch's backward pass. Enabling it is as easy as setting the following options in your config:
```yaml
enable_activation_checkpointing: True
enable_activation_offloading: True
```
In [experiments with Llama3 8B](https://github.com/pytorch/torchtune/pull/1443), activation offloading reduced memory usage by roughly 24% while slowing training by less than 1%.
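Under the hood, offloading hooks into autograd's saved tensors. Below is a minimal, self-contained sketch of the core idea using PyTorch's public `torch.autograd.graph.saved_tensors_hooks` API; torchtune's actual implementation is more involved (it uses a separate CUDA stream so copies truly overlap with compute, and prefetches activations ahead of the backward pass):

```python
import torch

def pack_to_cpu(tensor: torch.Tensor):
    # Stash the saved activation in pinned CPU memory; pinned memory is
    # required for the device-to-host copy to be asynchronous.
    cpu_tensor = torch.empty(tensor.size(), dtype=tensor.dtype, pin_memory=True)
    cpu_tensor.copy_(tensor, non_blocking=True)
    return tensor.device, cpu_tensor

def unpack_from_cpu(packed):
    # Bring the activation back to the GPU when backward needs it.
    device, cpu_tensor = packed
    return cpu_tensor.to(device, non_blocking=True)

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).cuda()
x = torch.randn(8, 1024, device="cuda")

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).sum()
loss.backward()  # offloaded activations are restored from the CPU here
```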
### Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which [outperforms the 11B version of the model](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision#base-pretrained-models) across common benchmarks. Because the model is so much larger, we also added the ability to fine-tune it using QLoRA and FSDP2.
```bash
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"

# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
```
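As with any torchtune recipe, individual config fields can also be overridden directly on the command line. The values below are illustrative, trading throughput for a smaller memory footprint:

```bash
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora \
  batch_size=1 gradient_accumulation_steps=16
```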
### Qwen2.5 model family has landed (#1863)
We added builders for [Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/), the latest cutting-edge generation of the Qwen model family! In their own words: "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
```bash
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
```
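If you already have a custom config, pointing it at Qwen2.5 should only require swapping the model component (the builder name below follows torchtune's naming convention; see the `torchtune.models.qwen2_5` API reference for the full list of available sizes), along with the matching tokenizer and checkpoint paths:

```yaml
model:
  _component_: torchtune.models.qwen2_5.qwen2_5_1_5b_instruct
```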
### New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out [here](https://pytorch.org/torchtune/main/basics/custom_components.html).
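As a quick taste of what the page covers: a custom component is just an importable callable that your config references by dotpath. The module and builder names below are hypothetical:

```python
# my_components.py -- a hypothetical local module on your Python path
from torchtune.models.llama3_2 import llama3_2_1b
from torchtune.modules import TransformerDecoder

def custom_model() -> TransformerDecoder:
    # Wrap a stock torchtune builder and tweak the result as needed.
    return llama3_2_1b()
```

A config can then reference it in place of a built-in builder:

```yaml
model:
  _component_: my_components.custom_model
```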
## What's Changed
* Fix PackedDataset bug for seq_len > 2 * max_seq_len setting. by mirceamironenco in https://github.com/pytorch/torchtune/pull/1697
* Bump version 0.3.1 by joecummings in https://github.com/pytorch/torchtune/pull/1720
* Add error propagation to distributed run. by mirceamironenco in https://github.com/pytorch/torchtune/pull/1719
* Update fusion layer counting logic for Llama 3.2 weight conversion by ebsmothers in https://github.com/pytorch/torchtune/pull/1722
* Resizable image positional embeddings by felipemello1 in https://github.com/pytorch/torchtune/pull/1695
* Unpin numpy by ringohoffman in https://github.com/pytorch/torchtune/pull/1728
* Add HF Checkpoint Format Support for Llama Vision by pbontrager in https://github.com/pytorch/torchtune/pull/1727
* config changes by felipemello1 in https://github.com/pytorch/torchtune/pull/1733
* Fix custom imports for both distributed and single device by RdoubleA in https://github.com/pytorch/torchtune/pull/1731
* Pin urllib3<2.0.0 to fix eleuther eval errors by RdoubleA in https://github.com/pytorch/torchtune/pull/1738
* Fixing recompiles in KV-cache + compile by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1663
* Fix CLIP pos embedding interpolation to work on DTensors by ebsmothers in https://github.com/pytorch/torchtune/pull/1739
* Bump version to 0.4.0 by RdoubleA in https://github.com/pytorch/torchtune/pull/1748
* [Feat] Activation offloading for distributed lora recipe by Jackmin801 in https://github.com/pytorch/torchtune/pull/1645
* Add LR Scheduler to single device full finetune by user074 in https://github.com/pytorch/torchtune/pull/1350
* Custom recipes use slash path by RdoubleA in https://github.com/pytorch/torchtune/pull/1760
* Adds __repr__ to Message by thomasjpfan in https://github.com/pytorch/torchtune/pull/1757
* Fix save adapter weights only by ebsmothers in https://github.com/pytorch/torchtune/pull/1764
* Set drop_last to always True by RdoubleA in https://github.com/pytorch/torchtune/pull/1761
* Remove nonexistent flag for acc offloading in memory_optimizations.rst by janeyx99 in https://github.com/pytorch/torchtune/pull/1772
* [BUGFIX] Adding sequence truncation to `max_seq_length` in eval recipe by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1773
* Add ROCm "support" by joecummings in https://github.com/pytorch/torchtune/pull/1765
* [BUG] Include system prompt in Phi3 by default by joecummings in https://github.com/pytorch/torchtune/pull/1778
* Fixing quantization in eval recipe by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1777
* Delete deprecated ChatDataset and InstructDataset by joecummings in https://github.com/pytorch/torchtune/pull/1781
* Add split argument to required builders and set its default value to "train" by krammnic in https://github.com/pytorch/torchtune/pull/1783
* Fix quantization with generate by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1784
* Fix typo in multimodal_datasets.rst by krammnic in https://github.com/pytorch/torchtune/pull/1787
* Make AlpacaToMessage public. by krammnic in https://github.com/pytorch/torchtune/pull/1785
* Fix misleading attn_dropout docstring by ebsmothers in https://github.com/pytorch/torchtune/pull/1792
* Add filter_fn to all generic dataset classes and builders API by krammnic in https://github.com/pytorch/torchtune/pull/1789
* Set dropout in SDPA to 0.0 when not in training mode by ebsmothers in https://github.com/pytorch/torchtune/pull/1803
* Skip entire header for llama3 decode by RdoubleA in https://github.com/pytorch/torchtune/pull/1656
* Remove unused bsz variable by zhangtemplar in https://github.com/pytorch/torchtune/pull/1805
* Adding `max_seq_length` to vision eval config by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1802
* Add check that there is no PackedDataset while building ConcatDataset by krammnic in https://github.com/pytorch/torchtune/pull/1796
* Add possibility to pack in _wikitext.py by krammnic in https://github.com/pytorch/torchtune/pull/1807
* Add evaluation configs under qwen2 dir by joecummings in https://github.com/pytorch/torchtune/pull/1809
* Fix eos_token problem in all required models by krammnic in https://github.com/pytorch/torchtune/pull/1806
* Deprecating `TiedEmbeddingTransformerDecoder` by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1815
* Torchao version check changes/BC import of TensorCoreTiledLayout by ebsmothers in https://github.com/pytorch/torchtune/pull/1812
* 1810 move gemma evaluation by malinjawi in https://github.com/pytorch/torchtune/pull/1819
* Consistent type checks for prepend and append tags. by krammnic in https://github.com/pytorch/torchtune/pull/1824
* Move schedulers to training from modules. by krammnic in https://github.com/pytorch/torchtune/pull/1801
* Update EleutherAI Eval Harness to v0.4.5 by joecummings in https://github.com/pytorch/torchtune/pull/1800
* 1810 Add evaluation configs under phi3 dir by Harthi7 in https://github.com/pytorch/torchtune/pull/1822
* Create CITATION.cff by joecummings in https://github.com/pytorch/torchtune/pull/1756
* fixed error message for GatedRepoError by DawiAlotaibi in https://github.com/pytorch/torchtune/pull/1832
* 1810 Move mistral evaluation by Yousof-kayal in https://github.com/pytorch/torchtune/pull/1829
* More consistent trace names. by krammnic in https://github.com/pytorch/torchtune/pull/1825
* fbcode using TensorCoreLayout by jerryzh168 in https://github.com/pytorch/torchtune/pull/1834
* Remove pad_max_tiles in CLIP by pbontrager in https://github.com/pytorch/torchtune/pull/1836
* Remove pad_max_tiles in CLIP inference by lucylq in https://github.com/pytorch/torchtune/pull/1853
* Add ``vqa_dataset``, update docs by krammnic in https://github.com/pytorch/torchtune/pull/1820
* Add offloading tests and fix obscure edge case by janeyx99 in https://github.com/pytorch/torchtune/pull/1860
* Toggling KV-caches by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1763
* Caching doc nits by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1876
* LoRA typo fix + bias=True by felipemello1 in https://github.com/pytorch/torchtune/pull/1881
* Correct `torchao` check for `TensorCoreTiledLayout` by joecummings in https://github.com/pytorch/torchtune/pull/1886
* Kd_loss avg over tokens by moussaKam in https://github.com/pytorch/torchtune/pull/1885
* Support Optimizer-in-the-backward by mori360 in https://github.com/pytorch/torchtune/pull/1833
* Remove deprecated `GemmaTransformerDecoder` by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1892
* Add PromptTemplate examples by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1891
* Temporarily disable building Python 3.13 version of torchtune by joecummings in https://github.com/pytorch/torchtune/pull/1896
* Block on Python 3.13 version by joecummings in https://github.com/pytorch/torchtune/pull/1898
* [bug] fix sharding multimodal by felipemello1 in https://github.com/pytorch/torchtune/pull/1889
* QLoRA with bias + Llama 3.2 Vision QLoRA configs by ebsmothers in https://github.com/pytorch/torchtune/pull/1726
* Block on Python 3.13 version by joecummings in https://github.com/pytorch/torchtune/pull/1899
* Normalize CE loss by total number of (non-padding) tokens by ebsmothers in https://github.com/pytorch/torchtune/pull/1875
* nit: remove (nightly) in recipes by krammnic in https://github.com/pytorch/torchtune/pull/1882
* Expose packed: False, set log_peak_memory_stats: True, set compile: False by krammnic in https://github.com/pytorch/torchtune/pull/1872
* Remove ChatFormat, InstructTemplate, old message converters by RdoubleA in https://github.com/pytorch/torchtune/pull/1895
* Make TensorCoreTiledLayout import more robust by andrewor14 in https://github.com/pytorch/torchtune/pull/1912
* [ez] Fix README download example by RdoubleA in https://github.com/pytorch/torchtune/pull/1915
* [docs] Custom components page by RdoubleA in https://github.com/pytorch/torchtune/pull/1910
* Update imports after QAT was moved out of prototype by andrewor14 in https://github.com/pytorch/torchtune/pull/1883
* Updating memory optimization overview by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1916
* Patch github link in torchtune docs header by ebsmothers in https://github.com/pytorch/torchtune/pull/1914
* Llama 3.2 Vision - 90B by felipemello1 in https://github.com/pytorch/torchtune/pull/1880
* Fixing DoRA docs, adding to mem opt tutorial by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1918
* Add KD distributed recipe by lindawangg in https://github.com/pytorch/torchtune/pull/1631
* add missing doc by felipemello1 in https://github.com/pytorch/torchtune/pull/1924
* [FIX] MM Eval Mask Sizes by pbontrager in https://github.com/pytorch/torchtune/pull/1920
* Activation offloading for fullfinetuning + fix tied embedding by felipemello1 in https://github.com/pytorch/torchtune/pull/1847
* Qwen2.5 by calvinpelletier in https://github.com/pytorch/torchtune/pull/1863
* Restore backward after each batch for grad accum by ebsmothers in https://github.com/pytorch/torchtune/pull/1917
* Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by mirceamironenco in https://github.com/pytorch/torchtune/pull/1909
## New Contributors
* ringohoffman made their first contribution in https://github.com/pytorch/torchtune/pull/1728
* Jackmin801 made their first contribution in https://github.com/pytorch/torchtune/pull/1645
* user074 made their first contribution in https://github.com/pytorch/torchtune/pull/1350
* krammnic made their first contribution in https://github.com/pytorch/torchtune/pull/1783
* zhangtemplar made their first contribution in https://github.com/pytorch/torchtune/pull/1805
* malinjawi made their first contribution in https://github.com/pytorch/torchtune/pull/1819
* Harthi7 made their first contribution in https://github.com/pytorch/torchtune/pull/1822
* DawiAlotaibi made their first contribution in https://github.com/pytorch/torchtune/pull/1832
* Yousof-kayal made their first contribution in https://github.com/pytorch/torchtune/pull/1829
* moussaKam made their first contribution in https://github.com/pytorch/torchtune/pull/1885
* mori360 made their first contribution in https://github.com/pytorch/torchtune/pull/1833
**Full Changelog**: https://github.com/pytorch/torchtune/compare/v0.3.1...v0.4.0