## Highlights
Today we release torchtune v0.4.0 with some exciting new additions! Notable ones include full support for activation offloading, recipes for Llama3.2V 90B along with QLoRA variants, new documentation, and the Qwen2.5 model family!
### Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations to the CPU while they are not in use, then prefetches them back to the GPU right before they are needed for the microbatch's backward pass. Enabling it is as easy as setting the following options in your config:
```yaml
enable_activation_checkpointing: True
enable_activation_offloading: True
```
In [experiments with Llama3 8B](https://github.com/pytorch/torchtune/pull/1443), activation offloading reduced memory usage by roughly 24% while slowing training by less than 1%.
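Under the hood, offloading hooks into autograd's saved tensors. Below is a minimal, self-contained sketch of the core idea using PyTorch's public `torch.autograd.graph.saved_tensors_hooks` API; torchtune's actual implementation is more involved (it uses a separate CUDA stream so copies truly overlap with compute, and prefetches activations ahead of the backward pass):

```python
import torch

def pack_to_cpu(tensor: torch.Tensor):
    # Stash the saved activation in pinned CPU memory; pinned memory is
    # required for the device-to-host copy to be asynchronous.
    cpu_tensor = torch.empty(tensor.size(), dtype=tensor.dtype, pin_memory=True)
    cpu_tensor.copy_(tensor, non_blocking=True)
    return tensor.device, cpu_tensor

def unpack_from_cpu(packed):
    # Bring the activation back to the GPU when backward needs it.
    device, cpu_tensor = packed
    return cpu_tensor.to(device, non_blocking=True)

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).cuda()
x = torch.randn(8, 1024, device="cuda")

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).sum()
loss.backward()  # offloaded activations are restored from the CPU here
```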
### Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which [outperforms the 11B version of the model](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision#base-pretrained-models) across common benchmarks. Because the model is so much larger, we also added the ability to fine-tune it using QLoRA and FSDP2.
```bash
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"

# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
```
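As with any torchtune recipe, individual config fields can also be overridden directly on the command line. The values below are illustrative, trading throughput for a smaller memory footprint:

```bash
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora \
  batch_size=1 gradient_accumulation_steps=16
```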
### Qwen2.5 model family has landed (#1863)
We added builders for [Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/), the latest cutting-edge generation of the Qwen model family! In their own words: "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
```bash
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
```
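If you already have a custom config, pointing it at Qwen2.5 should only require swapping the model component (the builder name below follows torchtune's naming convention; see the `torchtune.models.qwen2_5` API reference for the full list of available sizes), along with the matching tokenizer and checkpoint paths:

```yaml
model:
  _component_: torchtune.models.qwen2_5.qwen2_5_1_5b_instruct
```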
### New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out [here](https://pytorch.org/torchtune/main/basics/custom_components.html).
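As a quick taste of what the page covers: a custom component is just an importable callable that your config references by dotpath. The module and builder names below are hypothetical:

```python
# my_components.py -- a hypothetical local module on your Python path
from torchtune.models.llama3_2 import llama3_2_1b
from torchtune.modules import TransformerDecoder

def custom_model() -> TransformerDecoder:
    # Wrap a stock torchtune builder and tweak the result as needed.
    return llama3_2_1b()
```

A config can then reference it in place of a built-in builder:

```yaml
model:
  _component_: my_components.custom_model
```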
## What's Changed
* Fix PackedDataset bug for seq_len > 2 * max_seq_len setting. by mirceamironenco in https://github.com/pytorch/torchtune/pull/1697
* Bump version 0.3.1 by joecummings in https://github.com/pytorch/torchtune/pull/1720
* Add error propagation to distributed run. by mirceamironenco in https://github.com/pytorch/torchtune/pull/1719
* Update fusion layer counting logic for Llama 3.2 weight conversion by ebsmothers in https://github.com/pytorch/torchtune/pull/1722
* Resizable image positional embeddings by felipemello1 in https://github.com/pytorch/torchtune/pull/1695
* Unpin numpy by ringohoffman in https://github.com/pytorch/torchtune/pull/1728
* Add HF Checkpoint Format Support for Llama Vision by pbontrager in https://github.com/pytorch/torchtune/pull/1727
* config changes by felipemello1 in https://github.com/pytorch/torchtune/pull/1733
* Fix custom imports for both distributed and single device by RdoubleA in https://github.com/pytorch/torchtune/pull/1731
* Pin urllib3<2.0.0 to fix eleuther eval errors by RdoubleA in https://github.com/pytorch/torchtune/pull/1738
* Fixing recompiles in KV-cache + compile by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1663
* Fix CLIP pos embedding interpolation to work on DTensors by ebsmothers in https://github.com/pytorch/torchtune/pull/1739
* Bump version to 0.4.0 by RdoubleA in https://github.com/pytorch/torchtune/pull/1748
* [Feat] Activation offloading for distributed lora recipe by Jackmin801 in https://github.com/pytorch/torchtune/pull/1645
* Add LR Scheduler to single device full finetune by user074 in https://github.com/pytorch/torchtune/pull/1350
* Custom recipes use slash path by RdoubleA in https://github.com/pytorch/torchtune/pull/1760
* Adds __repr__ to Message by thomasjpfan in https://github.com/pytorch/torchtune/pull/1757
* Fix save adapter weights only by ebsmothers in https://github.com/pytorch/torchtune/pull/1764
* Set drop_last to always True by RdoubleA in https://github.com/pytorch/torchtune/pull/1761
* Remove nonexistent flag for acc offloading in memory_optimizations.rst by janeyx99 in https://github.com/pytorch/torchtune/pull/1772
* [BUGFIX] Adding sequence truncation to `max_seq_length` in eval recipe by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1773
* Add ROCm "support" by joecummings in https://github.com/pytorch/torchtune/pull/1765
* [BUG] Include system prompt in Phi3 by default by joecummings in https://github.com/pytorch/torchtune/pull/1778
* Fixing quantization in eval recipe by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1777
* Delete deprecated ChatDataset and InstructDataset by joecummings in https://github.com/pytorch/torchtune/pull/1781
* Add split argument to required builders and set its default value to "train" by krammnic in https://github.com/pytorch/torchtune/pull/1783
* Fix quantization with generate by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1784
* Fix typo in multimodal_datasets.rst by krammnic in https://github.com/pytorch/torchtune/pull/1787
* Make AlpacaToMessage public. by krammnic in https://github.com/pytorch/torchtune/pull/1785
* Fix misleading attn_dropout docstring by ebsmothers in https://github.com/pytorch/torchtune/pull/1792
* Add filter_fn to all generic dataset classes and builders API by krammnic in https://github.com/pytorch/torchtune/pull/1789
* Set dropout in SDPA to 0.0 when not in training mode by ebsmothers in https://github.com/pytorch/torchtune/pull/1803
* Skip entire header for llama3 decode by RdoubleA in https://github.com/pytorch/torchtune/pull/1656
* Remove unused bsz variable by zhangtemplar in https://github.com/pytorch/torchtune/pull/1805
* Adding `max_seq_length` to vision eval config by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1802
* Add check that there is no PackedDataset while building ConcatDataset by krammnic in https://github.com/pytorch/torchtune/pull/1796
* Add possibility to pack in _wikitext.py by krammnic in https://github.com/pytorch/torchtune/pull/1807
* Add evaluation configs under qwen2 dir by joecummings in https://github.com/pytorch/torchtune/pull/1809
* Fix eos_token problem in all required models by krammnic in https://github.com/pytorch/torchtune/pull/1806
* Deprecating `TiedEmbeddingTransformerDecoder` by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1815
* Torchao version check changes/BC import of TensorCoreTiledLayout by ebsmothers in https://github.com/pytorch/torchtune/pull/1812
* 1810 move gemma evaluation by malinjawi in https://github.com/pytorch/torchtune/pull/1819
* Consistent type checks for prepend and append tags. by krammnic in https://github.com/pytorch/torchtune/pull/1824
* Move schedulers to training from modules. by krammnic in https://github.com/pytorch/torchtune/pull/1801
* Update EleutherAI Eval Harness to v0.4.5 by joecummings in https://github.com/pytorch/torchtune/pull/1800
* 1810 Add evaluation configs under phi3 dir by Harthi7 in https://github.com/pytorch/torchtune/pull/1822
* Create CITATION.cff by joecummings in https://github.com/pytorch/torchtune/pull/1756
* fixed error message for GatedRepoError by DawiAlotaibi in https://github.com/pytorch/torchtune/pull/1832
* 1810 Move mistral evaluation by Yousof-kayal in https://github.com/pytorch/torchtune/pull/1829
* More consistent trace names. by krammnic in https://github.com/pytorch/torchtune/pull/1825
* fbcode using TensorCoreLayout by jerryzh168 in https://github.com/pytorch/torchtune/pull/1834
* Remove pad_max_tiles in CLIP by pbontrager in https://github.com/pytorch/torchtune/pull/1836
* Remove pad_max_tiles in CLIP inference by lucylq in https://github.com/pytorch/torchtune/pull/1853
* Add ``vqa_dataset``, update docs by krammnic in https://github.com/pytorch/torchtune/pull/1820
* Add offloading tests and fix obscure edge case by janeyx99 in https://github.com/pytorch/torchtune/pull/1860
* Toggling KV-caches by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1763
* Caching doc nits by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1876
* LoRA typo fix + bias=True by felipemello1 in https://github.com/pytorch/torchtune/pull/1881
* Correct `torchao` check for `TensorCoreTiledLayout` by joecummings in https://github.com/pytorch/torchtune/pull/1886
* Kd_loss avg over tokens by moussaKam in https://github.com/pytorch/torchtune/pull/1885
* Support Optimizer-in-the-backward by mori360 in https://github.com/pytorch/torchtune/pull/1833
* Remove deprecated `GemmaTransformerDecoder` by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1892
* Add PromptTemplate examples by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1891
* Temporarily disable building Python 3.13 version of torchtune by joecummings in https://github.com/pytorch/torchtune/pull/1896
* Block on Python 3.13 version by joecummings in https://github.com/pytorch/torchtune/pull/1898
* [bug] fix sharding multimodal by felipemello1 in https://github.com/pytorch/torchtune/pull/1889
* QLoRA with bias + Llama 3.2 Vision QLoRA configs by ebsmothers in https://github.com/pytorch/torchtune/pull/1726
* Block on Python 3.13 version by joecummings in https://github.com/pytorch/torchtune/pull/1899
* Normalize CE loss by total number of (non-padding) tokens by ebsmothers in https://github.com/pytorch/torchtune/pull/1875
* nit: remove (nightly) in recipes by krammnic in https://github.com/pytorch/torchtune/pull/1882
* Expose packed: False, set log_peak_memory_stats: True, set compile: False by krammnic in https://github.com/pytorch/torchtune/pull/1872
* Remove ChatFormat, InstructTemplate, old message converters by RdoubleA in https://github.com/pytorch/torchtune/pull/1895
* Make TensorCoreTiledLayout import more robust by andrewor14 in https://github.com/pytorch/torchtune/pull/1912
* [ez] Fix README download example by RdoubleA in https://github.com/pytorch/torchtune/pull/1915
* [docs] Custom components page by RdoubleA in https://github.com/pytorch/torchtune/pull/1910
* Update imports after QAT was moved out of prototype by andrewor14 in https://github.com/pytorch/torchtune/pull/1883
* Updating memory optimization overview by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1916
* Patch github link in torchtune docs header by ebsmothers in https://github.com/pytorch/torchtune/pull/1914
* Llama 3.2 Vision - 90B by felipemello1 in https://github.com/pytorch/torchtune/pull/1880
* Fixing DoRA docs, adding to mem opt tutorial by SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1918
* Add KD distributed recipe by lindawangg in https://github.com/pytorch/torchtune/pull/1631
* add missing doc by felipemello1 in https://github.com/pytorch/torchtune/pull/1924
* [FIX] MM Eval Mask Sizes by pbontrager in https://github.com/pytorch/torchtune/pull/1920
* Activation offloading for fullfinetuning + fix tied embedding by felipemello1 in https://github.com/pytorch/torchtune/pull/1847
* Qwen2.5 by calvinpelletier in https://github.com/pytorch/torchtune/pull/1863
* Restore backward after each batch for grad accum by ebsmothers in https://github.com/pytorch/torchtune/pull/1917
* Fix lora single device fine tune checkpoint saving & nan loss when use_dora=True by mirceamironenco in https://github.com/pytorch/torchtune/pull/1909
## New Contributors
* ringohoffman made their first contribution in https://github.com/pytorch/torchtune/pull/1728
* Jackmin801 made their first contribution in https://github.com/pytorch/torchtune/pull/1645
* user074 made their first contribution in https://github.com/pytorch/torchtune/pull/1350
* krammnic made their first contribution in https://github.com/pytorch/torchtune/pull/1783
* zhangtemplar made their first contribution in https://github.com/pytorch/torchtune/pull/1805
* malinjawi made their first contribution in https://github.com/pytorch/torchtune/pull/1819
* Harthi7 made their first contribution in https://github.com/pytorch/torchtune/pull/1822
* DawiAlotaibi made their first contribution in https://github.com/pytorch/torchtune/pull/1832
* Yousof-kayal made their first contribution in https://github.com/pytorch/torchtune/pull/1829
* moussaKam made their first contribution in https://github.com/pytorch/torchtune/pull/1885
* mori360 made their first contribution in https://github.com/pytorch/torchtune/pull/1833
**Full Changelog**: https://github.com/pytorch/torchtune/compare/v0.3.1...v0.4.0