# New Features

## PyTorch 2.4 (#1505)
This release updates LLM Foundry to PyTorch 2.4, bringing with it support for the new features and optimizations in that release.
## Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)
Numerous improvements to the extensibility of the modeling and data loading code, making it easier to reuse, subclass, and extend. Please see the linked PRs for more details on each change.
## Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)
Various error messages have been improved, making user errors easier to debug.
## Sliding window in torch attention (#1455)
We've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.
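As a rough sketch (not the actual implementation), sliding window attention combines the usual causal constraint with a bound on how far back each query may look. A hypothetical mask builder might look like:

```python
def sliding_window_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """Build an attention mask as a nested list of booleans.

    mask[i][j] is True when query position i may attend to key position j:
    causal (j <= i) and within the window (i - j <= window_size).
    """
    return [
        [j <= i and i - j <= window_size for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

In the reference torch implementation, positions outside such a mask are assigned a large negative bias before the softmax, so they receive (effectively) zero attention weight.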
# Bug Fixes

## Extra BOS token for llama 3.1 with completion data (#1476)
A bug resulted in an extra BOS token being added between the prompt and the response during finetuning. This has been fixed so that the prompt and response supplied by the user are concatenated without any extra tokens between them.
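The shape of the fix can be illustrated with a simplified sketch (the function name and token id below are hypothetical; Llama 3.1's BOS id is 128000): if the tokenizer prepended a BOS to the response, it is dropped before concatenation so only the prompt's BOS remains.

```python
def strip_duplicate_bos(response_ids: list[int], bos_id: int) -> list[int]:
    """Drop a tokenizer-added leading BOS from the response so that
    prompt + response contains only the prompt's BOS token."""
    if response_ids and response_ids[0] == bos_id:
        return response_ids[1:]
    return response_ids


BOS_ID = 128000  # Llama 3.1 BOS token id
prompt = [BOS_ID, 15339]          # tokenized prompt, starts with BOS
response = [BOS_ID, 1917, 2024]   # tokenizer also prepended BOS here

tokens = prompt + strip_duplicate_bos(response, BOS_ID)
assert tokens.count(BOS_ID) == 1  # exactly one BOS in the final sequence
```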
# What's Changed
* Add test for logged_config transforms by b-chu in https://github.com/mosaicml/llm-foundry/pull/1441
* Bump version to 0.12.0.dev0. by irenedea in https://github.com/mosaicml/llm-foundry/pull/1447
* Update pytest-codeblocks requirement from <0.17,>=0.16.1 to >=0.16.1,<0.18 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1445
* Bump coverage[toml] from 7.4.4 to 7.6.1 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1442
* Enabled generalizing build_inner_model in ComposerHFCausalLM by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1450
* Update llm foundry version in mcli yamls by irenedea in https://github.com/mosaicml/llm-foundry/pull/1451
* merge to main by XiaohanZhangCMU in https://github.com/mosaicml/llm-foundry/pull/865
* allow embedding resizing passed through by jdchang1 in https://github.com/mosaicml/llm-foundry/pull/1449
* Update packaging requirement from <23,>=21 to >=21,<25 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1444
* Update pytest requirement from <8,>=7.2.1 to >=7.2.1,<9 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1443
* Implement ruff rules enforcing PEP 585 by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1453
* Adding sliding window attn to scaled_multihead_dot_product_attention by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1455
* Add user error for UnicodeDeocdeError in convert text to mds by irenedea in https://github.com/mosaicml/llm-foundry/pull/1457
* Fix log_config by josejg in https://github.com/mosaicml/llm-foundry/pull/1432
* Add EnvironmentLogger Callback by josejg in https://github.com/mosaicml/llm-foundry/pull/1350
* Update mosaicml/ci-testing to 0.1.2 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1458
* Correct error message for inference wrapper by josejg in https://github.com/mosaicml/llm-foundry/pull/1459
* Update CI tests to v0.1.2 by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1466
* Bump onnxruntime from 1.18.1 to 1.19.0 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1461
* Update tenacity requirement from <9,>=8.2.3 to >=8.2.3,<10 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1460
* Simple change to enable mapping functions for ft constructor by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1468
* use default eval interval from composer by milocress in https://github.com/mosaicml/llm-foundry/pull/1369
* Consistent Naming EnviromentLoggingCallback by josejg in https://github.com/mosaicml/llm-foundry/pull/1470
* Register NaN Monitor Callback by josejg in https://github.com/mosaicml/llm-foundry/pull/1471
* Add train subset num batches by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1472
* Parent class hf models by jdchang1 in https://github.com/mosaicml/llm-foundry/pull/1467
* Remove extra bos for prompt/response data with llama3.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1476
* Add prepare fsdp back by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1477
* Add date_string when applying tokenizer chat template by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1474
* Make sample tokenization extensible by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1478
* Use Streaming version 0.8.1 by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1479
* Bump hf-transfer from 0.1.3 to 0.1.8 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1480
* fix hf checkpointer by milocress in https://github.com/mosaicml/llm-foundry/pull/1489
* Fix device mismatch when running hf.generate by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1486
* Bump composer to 0.24.1 + FSDP config device_mesh deprecation by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1487
* master_weights_dtype not supported by ComposerHFCausalLM.__init__() by eldarkurtic in https://github.com/mosaicml/llm-foundry/pull/1485
* Detect loss spikes and high losses during training by joyce-chen-uni in https://github.com/mosaicml/llm-foundry/pull/1473
* Enable passing in external position ids by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1493
* Align logged attributes for errors and run metadata in kill_loss_spike_callback.py by joyce-chen-uni in https://github.com/mosaicml/llm-foundry/pull/1494
* tokenizer is never built when converting finetuning dataset by eldarkurtic in https://github.com/mosaicml/llm-foundry/pull/1496
* Removing error message for reusing kv cache with torch attn by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1497
* Fix formatting of loss spike & high loss error messages by joyce-chen-uni in https://github.com/mosaicml/llm-foundry/pull/1498
* Enable cross attention layers by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1495
* Update to ci-testing 0.2.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1500
* [WIP] Torch 2.4 in docker images by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1491
* [WIP] Only torch 2.4.0 compatible by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1505
* Update mlflow requirement from <2.16,>=2.14.1 to >=2.14.1,<2.17 by dependabot in https://github.com/mosaicml/llm-foundry/pull/1506
* Update ci-testing to 0.2.2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1503
* Allow passing key_value_statest for x-attn through MPT Block by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1511
* Fix cross attention for blocks by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1512
* Put 2.3 image back in release examples by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1513
* Sort callbacks so that CheckpointSaver goes before HuggingFaceCheckpointer by irenedea in https://github.com/mosaicml/llm-foundry/pull/1515
* Raise MisconfiguredDatasetError from original error by irenedea in https://github.com/mosaicml/llm-foundry/pull/1519
* Peft fsdp by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1520
* Raise DatasetTooSmall exception if canonical nodes is less than num samples by irenedea in https://github.com/mosaicml/llm-foundry/pull/1518
* Add permissions check for delta table reading by irenedea in https://github.com/mosaicml/llm-foundry/pull/1522
* Add HuggingFaceCheckpointer option for only registering final checkpoint by irenedea in https://github.com/mosaicml/llm-foundry/pull/1516
* Replace FSDP args by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1517
* enable correct padding_idx for embedding layers by gupta-abhay in https://github.com/mosaicml/llm-foundry/pull/1527
* Revert "Replace FSDP args" by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1533
* Delete unneeded inner base model in PEFT HF Checkpointer by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1532
* Add deprecation warning to fsdp_config by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1530
* Fix reuse kv cache for torch attention by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1539
* Error on text dataset file not found by milocress in https://github.com/mosaicml/llm-foundry/pull/1534
* Make ICL tasks not required for eval by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1540
* Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1374
* Register mosaic logger by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1542
* Hfcheckpointer optional generation config by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1543
* Bump composer version to 0.25.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1546
* Bump streaming version to 0.9.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1550
* Bump version to 0.13.0.dev0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1549
* Add proper user error for accessing schema by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1548
* Validate Cluster Access Mode by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1551
# New Contributors
* jdchang1 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1449
* joyce-chen-uni made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1473
**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.11.0...v0.12.0