llm-foundry Changelog

0.8.0

New Features

Megablocks support (1102)
Support for training optimized MoE models at large scale.

Check out the [megablocks documentation](https://github.com/databricks/megablocks) for more information on building state-of-the-art MoE models.

Expanded Registries (1080, 1093, 1094, 1095, 1096, 1165)
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.

Check out the [README](https://github.com/mosaicml/llm-foundry?tab=readme-ov-file#registry) for detailed instructions and code examples!

Support for ShareGPT chat format (1098)
We now support the [ShareGPT](https://github.com/domeccleston/sharegpt) format for finetuning.
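As a rough illustration of what the format looks like, the sketch below converts one ShareGPT-style record into a `messages` list of `role`/`content` pairs. It assumes the commonly used ShareGPT layout (a `conversations` list whose entries carry `from` and `value` fields); the `ROLE_MAP` mapping and the `sharegpt_to_messages` helper are hypothetical names for this example, not LLM Foundry functions.

```python
from typing import Dict, List

# Assumed mapping from ShareGPT speaker tags to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(sample: Dict[str, List[Dict[str, str]]]) -> Dict[str, List[Dict[str, str]]]:
    """Convert one ShareGPT-style record into a role/content message list."""
    messages = []
    for turn in sample["conversations"]:
        speaker = turn["from"]
        if speaker not in ROLE_MAP:
            raise ValueError(f"Unexpected ShareGPT speaker: {speaker!r}")
        messages.append({"role": ROLE_MAP[speaker], "content": turn["value"]})
    return {"messages": messages}


sharegpt_sample = {
    "conversations": [
        {"from": "human", "value": "Hi, MPT!"},
        {"from": "gpt", "value": "Hi, user!"},
    ]
}
converted = sharegpt_to_messages(sharegpt_sample)
```

Raising on unknown speaker tags, rather than skipping them, keeps malformed records from silently shortening a conversation.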

Breaking Changes and Deprecations
We have updated the minimum supported PyTorch version to torch 2.3 (1152).

In-Context Learning Code Evaluation (1181)
We've removed the `code_evaluation` task from the allowed in-context learning task types, and we've deleted the `InContextLearningCodeEvaluationDataset` and `InContextLearningCodeEvalAccuracy` classes.

Question-Answering
We've removed the `question_answering` task type. Please use the `generation_task_with_answers` task instead.
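If you keep eval task definitions as a list of dictionaries, a small migration pass can apply both changes. This is only a sketch: it assumes each task stores its type under an `icl_task_type` key, and `migrate_icl_tasks` is a hypothetical helper, not part of LLM Foundry.

```python
from typing import Any, Dict, List

# Task types that no longer exist, and task types that were renamed.
REMOVED_TASK_TYPES = {"code_evaluation"}
RENAMED_TASK_TYPES = {"question_answering": "generation_task_with_answers"}


def migrate_icl_tasks(tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the task configs with renamed task types updated."""
    migrated = []
    for task in tasks:
        task_type = task.get("icl_task_type")  # assumed key name
        if task_type in REMOVED_TASK_TYPES:
            raise ValueError(f"Task type {task_type!r} was removed and has no replacement")
        updated = dict(task)  # shallow copy so the input is left untouched
        if task_type in RENAMED_TASK_TYPES:
            updated["icl_task_type"] = RENAMED_TASK_TYPES[task_type]
        migrated.append(updated)
    return migrated


old_tasks = [{"label": "triviaqa", "icl_task_type": "question_answering"}]
new_tasks = migrate_icl_tasks(old_tasks)
```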

What's Changed
* Update README by hanlint in https://github.com/mosaicml/llm-foundry/pull/1069
* Expose more exception attributes by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/1071
* Output eval logging batch by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/961
* Add expandeable segments flag by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1075
* Check the user provided eos / bos token id against the tokenizer eos / bos token id by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1039
* Triton RMSNorm by josejg in https://github.com/mosaicml/llm-foundry/pull/1050
* Fix tiktoken vocab size by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1081
* Doing the loss reduction in foundry instead of in the loss functions. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1079
* Decrease log verbosity with no bias by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1082
* Upgrade hf chat by j316chuck in https://github.com/mosaicml/llm-foundry/pull/1061
* Fixes for streaming and auto packing by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1083
* Background mlflow model registration by irenedea in https://github.com/mosaicml/llm-foundry/pull/1078
* Update README.md to include DBRX blog under "Latest News" by lupesko in https://github.com/mosaicml/llm-foundry/pull/1085
* Decrease transformers file size for mlflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1087
* log packing ratio progress by milocress in https://github.com/mosaicml/llm-foundry/pull/1070
* Bump HF version by b-chu in https://github.com/mosaicml/llm-foundry/pull/1091
* Fix typo in expandable_segments by mammothb in https://github.com/mosaicml/llm-foundry/pull/1088
* Bump transformers to 4.39.3 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1086
* Fix yaml typo by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1092
* Fix for overriding nested configs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1089
* cleaned up HF/MPT conversion test by milocress in https://github.com/mosaicml/llm-foundry/pull/1048
* Update yamls for 0.7.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1097
* Norms registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1080
* fixing evaluator microbatch size by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1100
* Updating the streaming version in setup.py by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1103
* MegaBlocks release by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1102
* Remove torch compile from GLU by josejg in https://github.com/mosaicml/llm-foundry/pull/1101
* Update config_moe_args.py by vchiley in https://github.com/mosaicml/llm-foundry/pull/1104
* Add remote code option to allow execution of DBRX tokenizer by b-chu in https://github.com/mosaicml/llm-foundry/pull/1106
* Fix overwriting FP8 act ckpt flag in the train script by cli99 in https://github.com/mosaicml/llm-foundry/pull/1107
* Support ShareGPT chat format by samhavens in https://github.com/mosaicml/llm-foundry/pull/1098
* FC layer registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1093
* Attention layer registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1094
* Dbrx finetune yaml requires save folder specified to enable autoresume by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1108
* Revert "Update config_moe_args.py" by vchiley in https://github.com/mosaicml/llm-foundry/pull/1111
* rm new_group todo by vchiley in https://github.com/mosaicml/llm-foundry/pull/1112
* Migrate ICL classes to foundry by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/936
* FFN layer registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1095
* Param init registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1096
* Add missing init file by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1113
* Update tests to not rely on mistral by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1117
* Bump transformers to 4.40 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1118
* add `.json` to SUPPORTED_EXTENSIONS by eitanturok in https://github.com/mosaicml/llm-foundry/pull/1114
* Add option for subclasses to convert model and tokenizer in hf checkpointer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1121
* Bump Composer to 0.21.3 by b-chu in https://github.com/mosaicml/llm-foundry/pull/1122
* catch misconfigured hf dataset by milocress in https://github.com/mosaicml/llm-foundry/pull/1123
* Pin mlflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1124
* Change main to a dev version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1126
* Fix deprecation versions by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1129
* Clean up the publicly exported API by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1128
* Fix HF checkpointer + mlflow bugs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1125
* Update JSONL sources in eval README by emmanuel-ferdman in https://github.com/mosaicml/llm-foundry/pull/1110
* Mlflow datasets by KuuCi in https://github.com/mosaicml/llm-foundry/pull/1119
* Strict key checking for dataset by b-chu in https://github.com/mosaicml/llm-foundry/pull/1131
* First initialize dist with gloo by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1133
* Fix saving of generation_config for Llama-3 by eldarkurtic in https://github.com/mosaicml/llm-foundry/pull/1134
* Bump datasets version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1138
* Revert "First initialize dist with gloo (1133)" by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1139
* Barrier immediately after initialize dist with logs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1140
* Add new FT instructions by b-chu in https://github.com/mosaicml/llm-foundry/pull/1143
* Upgrade ci-testing by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1145
* Fix typos in callbacks with configs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1146
* Remove olmo as a dependency by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1148
* build inner model by milocress in https://github.com/mosaicml/llm-foundry/pull/1147
* fix DatasetConstants.splints default value to protect dictionary overwriting by ivan-kud in https://github.com/mosaicml/llm-foundry/pull/1144
* Bump flash attention version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1150
* Torch 2.3 part 1 - build the images by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1149
* Torch 2.3 upgrade Part 2 - CI by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1151
* Comment out 2.3 tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1155
* Fix yaml lint by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1156
* Move sentencepiece import by aspfohl in https://github.com/mosaicml/llm-foundry/pull/1157
* Bump composer version to 0.22.0 by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1160
* Uncomment GPU tests by milocress in https://github.com/mosaicml/llm-foundry/pull/1162
* Depend on coverage by milocress in https://github.com/mosaicml/llm-foundry/pull/1163
* fix dep group in torch 2.3 ci by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1164
* Bump min torch version to 2.3.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1152
* Add line splitting and other linting by b-chu in https://github.com/mosaicml/llm-foundry/pull/1161
* refactoring dataloader into registries. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1165
* Migrate eval output logging to foundry by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/1166
* Fix import and mocking by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1169
* minor fix to `llmfoundry.data.utils.get_text_collator` by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1170
* Fix config access for DBRX by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1177

New Contributors
* lupesko made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1085
* mammothb made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1088
* eitanturok made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1114
* emmanuel-ferdman made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1110
* ivan-kud made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1144

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.7.0...v0.8.0

0.7.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've made foundry more customizable and extensible!

New Features
Registrable Components (975, 1043, 1052, 1057)
We've made key components of LLM Foundry registrable, such as models, loggers, and callbacks. You can use the registry to easily customize and extend your training workflows.

This means that you can register new options for these components, and then use them in your yaml config.

Check out the [README](https://github.com/mosaicml/llm-foundry?tab=readme-ov-file#registry) for detailed instructions and code examples!
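To make the idea concrete, here is a toy version of the pattern: a name-to-factory mapping that a config can refer to by name. It is a simplified stand-in written for this note, not LLM Foundry's registry implementation; the `Registry` and `PrintLogger` classes and the config layout are hypothetical, and the README remains the reference for the real API.

```python
from typing import Callable, Dict


class Registry:
    """Toy name -> factory mapping (illustration only)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._factories: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, func: Callable[..., object]) -> None:
        if name in self._factories:
            raise ValueError(f"{self.kind} '{name}' is already registered")
        self._factories[name] = func

    def build(self, name: str, **kwargs: object) -> object:
        if name not in self._factories:
            raise KeyError(f"Unknown {self.kind} '{name}'")
        return self._factories[name](**kwargs)


class PrintLogger:
    """Hypothetical custom logger used only for this example."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix


loggers = Registry("logger")
loggers.register("print_logger", func=PrintLogger)

# A parsed yaml config would carry the registered name plus its kwargs.
config = {"loggers": {"print_logger": {"prefix": "[run-1] "}}}
built = [loggers.build(name, **kwargs) for name, kwargs in config["loggers"].items()]
```

Because the config only stores the registered name, custom components can be swapped in without touching the training script itself.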

Breaking Changes and Deprecations
Deprecated Feature Removals (1063)
We've removed support for deprecated features: triton attention, Prefix LMs, Llama attention patch, z-loss, and text denoising. These features were rarely used, and we removed them to focus on the core features that are heavily used.

If you were using any of these features, please open a GitHub issue describing how you were using them. We're happy to add back features that are heavily used.

What's Changed
* Fix typo in monolithic chkpt callback docs by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/1024
* Allow code-quality workflow to be callable by b-chu in https://github.com/mosaicml/llm-foundry/pull/1026
* Fix llama attention patch by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1036
* Adds a decorator for experimental features by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1038
* Finish 0.6.0 release by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1040
* Remove reference to attn_impl: triton by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1041
* Registry based config - Part 1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/975
* Deprecate attention patching for llama by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1047
* Compile GLU by josejg in https://github.com/mosaicml/llm-foundry/pull/1049
* log details to metadata for run analytics by angel-ruiz7 in https://github.com/mosaicml/llm-foundry/pull/992
* Update README.md by dennyglee in https://github.com/mosaicml/llm-foundry/pull/1056
* Add chat schema example for mlflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1054
* Metrics registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1052
* LLM Foundry CLI (just registry) by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1043
* Bump Composer to 0.21.1 by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/1053
* Dataloaders registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1044
* Fix multi model eval by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1055
* Remove unnecessary test workflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1058
* Fix peft llama test by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1059
* Models registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1057
* Remove under construction from registry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1060
* Custom Exceptions for Mosaic Logger by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/1014
* Bump version to 0.7.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1063
* Fix file filter by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1067
* Fix context printing by irenedea in https://github.com/mosaicml/llm-foundry/pull/1068

New Contributors
* angel-ruiz7 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/992
* dennyglee made their first contribution in https://github.com/mosaicml/llm-foundry/pull/1056

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.6.0...v0.7.0

0.6.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Configurable loss for chat-formatted data (985)
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.

This can be specified in the `train_loader.dataset` section of your yaml as follows:

```yaml
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
```

See the [docstring](https://github.com/mosaicml/llm-foundry/blob/257c25d5c9af61e8e36e10cf8805c3144093ffd1/llmfoundry/data/finetuning/collator.py#L189-L203) for a description of the options.
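The sketch below illustrates what "loss-generating" means in practice: tokens that should not contribute to the loss get the label `-100`, which is the default `ignore_index` of PyTorch's cross-entropy loss. The `build_labels` helper and its two boolean flags are hypothetical simplifications for this example; they are not the actual values accepted by `target_prompts` and `target_responses`, which are described in the docstring.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(
    turns: List[Tuple[str, List[int]]],
    train_on_prompts: bool = False,
    last_response_only: bool = False,
) -> List[int]:
    """Build per-token labels, masking tokens that should not generate loss."""
    response_positions = [i for i, (role, _) in enumerate(turns) if role == "assistant"]
    last_response = response_positions[-1] if response_positions else None
    labels: List[int] = []
    for i, (role, token_ids) in enumerate(turns):
        if role == "assistant":
            keep = (not last_response_only) or i == last_response
        else:
            keep = train_on_prompts
        labels.extend(token_ids if keep else [IGNORE_INDEX] * len(token_ids))
    return labels


# Two-turn chat with made-up token ids.
turns = [("user", [1, 2]), ("assistant", [3]), ("user", [4]), ("assistant", [5, 6])]
labels_default = build_labels(turns)
labels_last = build_labels(turns, last_response_only=True)
```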

OLMo support (1016)
We've added support for the [OLMo](https://allenai.org/olmo) model from AI2.

To use OLMo, there are a few configuration parameters you need to set. First of all, you will need to install LLM Foundry with the extra package for OLMo (`pip install .[gpu,olmo]`).

Then you will need to adjust the tokenizer section of your config as follows:

```yaml
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
```

Token accuracy (983)
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
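As a point of reference, token accuracy can be thought of as the fraction of target tokens whose predicted token id matches the label. The following is a minimal sketch under that assumption, written with plain lists rather than tensors; it also assumes that positions labeled `-100` are excluded, and it is not the metric class shipped in LLM Foundry.

```python
from typing import Sequence

IGNORE_INDEX = -100  # assumed label for positions that are not scored


def token_accuracy(
    predictions: Sequence[int],
    labels: Sequence[int],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Fraction of scored positions where the predicted token equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    scored = [(p, y) for p, y in zip(predictions, labels) if y != ignore_index]
    if not scored:
        return 0.0  # nothing to score
    correct = sum(1 for p, y in scored if p == y)
    return correct / len(scored)


# Predicted token ids (e.g. the argmax over logits) and their labels.
accuracy = token_accuracy([5, 7, 9, 2], [5, 8, 9, IGNORE_INDEX])
```

In this example two of the three scored positions match, so the accuracy is two thirds; the masked position does not count either way.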

Configurable activation checkpointing (951)
More configurable activation checkpointing for MPT allows finer-grained control over memory usage when training MPT. See the [docstring](https://github.com/mosaicml/llm-foundry/blob/257c25d5c9af61e8e36e10cf8805c3144093ffd1/llmfoundry/models/mpt/modeling_mpt.py#L911-L942) for more details.

Finetuning with multiple streams and pretokenized data (933, 945, 946)
We've brought the finetuning dataloader up to speed with the pretraining dataloader: it now supports mixing multiple streams and pretokenizing finetuning data. See the [yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/yamls/finetune/gpt2-arc-easy-cpu-streaming-dataset.yaml) for a full example.

Eval Gauntlet v0.3 (824)
We've released v0.3 of our Evaluation Gauntlet. See the [README](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) for a full description.

Breaking Changes and Deprecations

Flash attention v1 removal (1023)
Support for flash attention v1 has now been removed.

Extra BOS token removed (1003)
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.

Deprecation of triton flash attention, prefixLM, and text denoising (1007)
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.

What's Changed
* Gauntlet v0.3: Fix chain-of-thought tasks by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/824
* Add finetuning streaming dataset conversion by bigning in https://github.com/mosaicml/llm-foundry/pull/933
* Add default signature to mlflow saved model by dakinggg in https://github.com/mosaicml/llm-foundry/pull/952
* allow te to use meta device with deferred init by cli99 in https://github.com/mosaicml/llm-foundry/pull/958
* Update TUTORIAL.md by sdonoso in https://github.com/mosaicml/llm-foundry/pull/957
* Update mcli yamls to use v0.5.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/959
* add finutuning with streaming dataset example by bigning in https://github.com/mosaicml/llm-foundry/pull/945
* Add fully configurable activation checkpointing by cli99 in https://github.com/mosaicml/llm-foundry/pull/951
* Use create_model_version instead of register_model by dakinggg in https://github.com/mosaicml/llm-foundry/pull/953
* Add streams support by bigning in https://github.com/mosaicml/llm-foundry/pull/946
* Fix typo by irenedea in https://github.com/mosaicml/llm-foundry/pull/966
* Fix eval.py with lora by dakinggg in https://github.com/mosaicml/llm-foundry/pull/965
* add memory snapshot to callbacks by cli99 in https://github.com/mosaicml/llm-foundry/pull/810
* Adding curriculum learning callback (experimental) by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/954
* strengthened chat formatting validation by milocress in https://github.com/mosaicml/llm-foundry/pull/960
* Add new base images and remove fa1 images by dakinggg in https://github.com/mosaicml/llm-foundry/pull/970
* Add new ICL kwargs in eval.py and long_context yamls by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/925
* Make composer pins consistent with each other by dakinggg in https://github.com/mosaicml/llm-foundry/pull/972
* Make turbo an optional dependency by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/964
* Fix fewshot_random_seed default setting by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/974
* Improve error msg when checking target_blocks overlap by cli99 in https://github.com/mosaicml/llm-foundry/pull/977
* Torch 2.2 upgrade - Part 1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/976
* Torch 2.2 - Part 2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/979
* PyTorch 2.2 - Part 3 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/981
* Remove torch 2.1 from docker workflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/982
* Async callback: Don't skip checkpoints, reliably only launch async eval when the checkpoint is ready by aspfohl in https://github.com/mosaicml/llm-foundry/pull/813
* Token accuracy metrics by dakinggg in https://github.com/mosaicml/llm-foundry/pull/983
* Update readme to not mention 1.13_cu117 by irenedea in https://github.com/mosaicml/llm-foundry/pull/988
* Patch test, lock mcli version by aspfohl in https://github.com/mosaicml/llm-foundry/pull/990
* Bump gha timeouts by aspfohl in https://github.com/mosaicml/llm-foundry/pull/991
* Fix readme typo by dakinggg in https://github.com/mosaicml/llm-foundry/pull/993
* if condition in tie weights added by megha95 in https://github.com/mosaicml/llm-foundry/pull/989
* Bump Composer to 0.20 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/995
* Trim examples ahead of time for auto packing by irenedea in https://github.com/mosaicml/llm-foundry/pull/994
* add oom observer callback by cli99 in https://github.com/mosaicml/llm-foundry/pull/932
* Use ci-testing repo for tests by b-chu in https://github.com/mosaicml/llm-foundry/pull/1000
* Make CodeEval respect device_eval_batch_size by josejg in https://github.com/mosaicml/llm-foundry/pull/956
* Remove try except around imports by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1004
* Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental by irenedea in https://github.com/mosaicml/llm-foundry/pull/1007
* add magic filename for sharded state dicts by milocress in https://github.com/mosaicml/llm-foundry/pull/1001
* Bump CI/CD to v3 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1009
* Fix evaluators actually pulling eval metrics by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/1006
* Build torch 2.2.1 images by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1010
* Add torch 2.2.1 tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1011
* Bump min torch pin to 2.2.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1013
* Fix extra BOS token in front of response for some tokenizers by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1003
* Bump min composer pin by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1015
* Add default for eval interval by irenedea in https://github.com/mosaicml/llm-foundry/pull/987
* Add support for olmo by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1016
* Add deeper support for multi-turn chats and loss-generating tokens in finetuning by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/985
* Add explicit packing ratio of 1 for profiling by irenedea in https://github.com/mosaicml/llm-foundry/pull/1019
* Bump transformers to 4.38.2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1018
* Making sure `MemoryMonitor` takes in kwargs. by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1020
* Update readme for torch version 2.2.1 by irenedea in https://github.com/mosaicml/llm-foundry/pull/1021
* Add code import to train/eval scripts by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1002
* Bump version in readme by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/1022
* Bump version to 0.6.0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/1023

New Contributors
* bigning made their first contribution in https://github.com/mosaicml/llm-foundry/pull/933
* sdonoso made their first contribution in https://github.com/mosaicml/llm-foundry/pull/957
* josejg made their first contribution in https://github.com/mosaicml/llm-foundry/pull/956

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.5.0...v0.6.0

0.5.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

LoRA Support (with FSDP!) (886)
LLM Foundry now supports LoRA via an integration with the [PEFT library](https://github.com/huggingface/peft). Within LLM Foundry, run `train.py`, adding `peft_config` arguments to the model section of the config `.yaml`, like so:

```yaml
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
    - q_proj
    - k_proj
```

Read more about it in the [tutorial](https://github.com/mosaicml/llm-foundry/blob/main/TUTORIAL.md#can-i-finetune-using-peft--lora).
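For intuition about the `r` and `lora_alpha` values in that config, the short sketch below works through the standard LoRA arithmetic: each adapted linear layer gains two low-rank matrices with `r * (d_in + d_out)` parameters in total, and the adapter output is scaled by `lora_alpha / r`. The hidden size of 4096 is a made-up figure for illustration, and both helpers are hypothetical names, not PEFT or LLM Foundry functions.

```python
def lora_scaling(r: int, lora_alpha: float) -> float:
    """Scale applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN_SIZE = 4096  # hypothetical model width, for illustration only
scaling = lora_scaling(r=16, lora_alpha=32)
per_projection = lora_extra_params(HIDDEN_SIZE, HIDDEN_SIZE, r=16)
# Two target modules (q_proj and k_proj) are adapted in each attention layer.
per_layer = 2 * per_projection
```

Under these assumptions the scaling factor is 2.0 and each attention layer gains 262,144 trainable parameters, a small fraction of the roughly 16.8 million weights in a single 4096 x 4096 projection.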

ALiBi for Flash Attention (820)
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).

```yaml
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
```

Chat Data for Finetuning (884)
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating).

Each sample requires a single key `"messages"` that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
* `role`: A string indicating the author of the message. Possible values are `"system"`, `"user"`, and `"assistant"`.
* `content`: A string containing the text of the message.

There must be at least one message with the role `"assistant"`, and the last message in the `"messages"` array must have the role `"assistant"`.

Here's an example `.jsonl` with chat data:

```json
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
  { "role": "system", "content": "A conversation between a user and a helpful and honest assistant" },
  { "role": "user", "content": "Hi, MPT!" },
  { "role": "assistant", "content": "Hi, user!" },
  { "role": "user", "content": "Is multi-turn chat supported?" },
  { "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...
```
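Before launching a run, it can help to check each sample against the rules listed above. The validator below is a minimal sketch of such a check using only the standard library; `validate_chat_sample` is a hypothetical helper, and LLM Foundry's own validation may be stricter or differ in its error messages.

```python
import json
from typing import Any, Dict

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_sample(sample: Dict[str, Any]) -> None:
    """Raise ValueError if the sample breaks the chat-format rules."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError('sample needs a non-empty "messages" array')
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} is not an object")
        missing = {"role", "content"} - set(message)
        if missing:
            raise ValueError(f"message {index} is missing keys: {sorted(missing)}")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has unknown role {message['role']!r}")
        if not isinstance(message["content"], str):
            raise ValueError(f"message {index} content must be a string")
    # A final assistant message also guarantees at least one assistant message.
    if messages[-1]["role"] != "assistant":
        raise ValueError("the last message must have the role 'assistant'")


line = '{"messages": [{"role": "user", "content": "Hi, MPT!"}, {"role": "assistant", "content": "Hi, user!"}]}'
validate_chat_sample(json.loads(line))  # a valid sample passes silently
```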

Safe Load for HuggingFace Datasets (798)

We now provide a `safe_load` option when loading HuggingFace datasets for finetuning.

This restricts loaded files to `.jsonl`, `.csv`, or `.parquet` extensions to prevent arbitrary code execution.

To use, set `safe_load` to `true` in your dataset configuration:

```yaml
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
```
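The idea behind the check can be sketched as a simple extension allow-list. The snippet below is an illustration only, assuming the three extensions named above; `unsafe_files` is a hypothetical helper, and the actual `safe_load` implementation may inspect dataset repositories differently.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions treated as safe to load, per the description above.
SAFE_EXTENSIONS = {".jsonl", ".csv", ".parquet"}


def unsafe_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose extension is not on the allow-list."""
    return [p for p in paths if PurePosixPath(p).suffix not in SAFE_EXTENSIONS]


repo_files = ["train.jsonl", "data/val.parquet", "load_script.py", "README.md"]
flagged = unsafe_files(repo_files)
if flagged:
    print(f"Refusing to load dataset; disallowed files: {flagged}")
```

An allow-list is the safer default here: a new or unexpected file type is rejected until someone explicitly decides it is safe.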

New PyTorch, Composer, Streaming, and Transformers versions
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (Mixtral in particular).

Deprecations

Support for Flash Attention v1 (921)
Support for Flash Attention v1 is deprecated and will be removed in v0.6.0.

Breaking Changes
Removed support for PyTorch versions before 2.1 (787)
We no longer support PyTorch versions before 2.1.

Removed Deprecated Features (948)
We've removed features that have been deprecated for at least one release.

What's Changed
* Small test fix to have right padding by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/757
* Release 040 back to main by dakinggg in https://github.com/mosaicml/llm-foundry/pull/758
* Bump composer version to 0.17.1 by irenedea in https://github.com/mosaicml/llm-foundry/pull/762
* Docker release on workflow_dispatch by bandish-shah in https://github.com/mosaicml/llm-foundry/pull/763
* Fix tiktoken wrapper by dakinggg in https://github.com/mosaicml/llm-foundry/pull/761
* enable param group configuration in llm-foundry by vchiley in https://github.com/mosaicml/llm-foundry/pull/760
* Add script for doing bulk generation against an endpoint by aspfohl in https://github.com/mosaicml/llm-foundry/pull/765
* Only strip object names when creating new output path by irenedea in https://github.com/mosaicml/llm-foundry/pull/766
* Add eval loader to eval script by aspfohl in https://github.com/mosaicml/llm-foundry/pull/742
* Support inputs_embeds by samhavens in https://github.com/mosaicml/llm-foundry/pull/687
* Better error message when test does not complete by aspfohl in https://github.com/mosaicml/llm-foundry/pull/769
* Add codeowners by dakinggg in https://github.com/mosaicml/llm-foundry/pull/770
* add single value support to activation_checkpointing_target by cli99 in https://github.com/mosaicml/llm-foundry/pull/772
* Reorganize tests to make them easier to find by aspfohl in https://github.com/mosaicml/llm-foundry/pull/768
* Add "completion" alias for response key by dakinggg in https://github.com/mosaicml/llm-foundry/pull/771
* Shashank/seq id flash attn by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/738
* Fix SIQA gold indices by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/774
* Add missing load_weights_only to example yamls by dakinggg in https://github.com/mosaicml/llm-foundry/pull/776
* Patch flash attn in test to simulate environment without it installed by dakinggg in https://github.com/mosaicml/llm-foundry/pull/778
* Update .gitignore by aspfohl in https://github.com/mosaicml/llm-foundry/pull/781
* Disable mosaicml logger in foundry CI/CD by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/788
* Chat fomating template changes by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/784
* Remove tests and support for torch <2.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/787
* Fix utf-8 decode errors in tiktoken wrapper by dakinggg in https://github.com/mosaicml/llm-foundry/pull/792
* Update gauntlet v0.2 to reflect results of calibration by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/791
* Remove from mcli.sdk imports by aspfohl in https://github.com/mosaicml/llm-foundry/pull/793
* Auto packing fixes by irenedea in https://github.com/mosaicml/llm-foundry/pull/783
* Enable flag to not pass PAD tokens in ffwd by bcui19 in https://github.com/mosaicml/llm-foundry/pull/775
* Adding a fix for Cross Entropy Loss for long sequence lengths. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/795
* Minor readme updates and bump min python version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/799
* Enable GLU FFN type by vchiley in https://github.com/mosaicml/llm-foundry/pull/796
* clean up resolve_ffn_hidden_and_exp_ratio by vchiley in https://github.com/mosaicml/llm-foundry/pull/801
* Fix token counting to use attention mask instead of ids by dakinggg in https://github.com/mosaicml/llm-foundry/pull/802
* update openai wrapper to work with tiktoken interface and newest openai version by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/794
* Fix openai not conditioned imports by dakinggg in https://github.com/mosaicml/llm-foundry/pull/806
* Make the ffn activation func configurable by vchiley in https://github.com/mosaicml/llm-foundry/pull/805
* Clean up the logs, bump datasets and transformers by dakinggg in https://github.com/mosaicml/llm-foundry/pull/804
* Fix remote path check for UC volumes by irenedea in https://github.com/mosaicml/llm-foundry/pull/809
* Expand options for MMLU. by mansheej in https://github.com/mosaicml/llm-foundry/pull/811
* Async eval callback by aspfohl in https://github.com/mosaicml/llm-foundry/pull/702
* Updating the Flash Attention version to fix cross entropy loss by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/812
* Remove redundant transposes for rope rotation by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/807
* Add generic flatten imports to HF checkpointer by b-chu in https://github.com/mosaicml/llm-foundry/pull/814
* Fix token counting to allow there to be no attention mask by dakinggg in https://github.com/mosaicml/llm-foundry/pull/818
* Default to using tokenizer eos and bos in convert_text_to_mds.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/823
* Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by irenedea in https://github.com/mosaicml/llm-foundry/pull/825
* Bump turbo version to 0.0.7 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/827
* Align GLU implementation with LLaMa by vchiley in https://github.com/mosaicml/llm-foundry/pull/829
* Use `sync_module_states: True` when using HSDP by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/830
* Update composer to 0.17.2 and streaming to 0.7.2 by irenedea in https://github.com/mosaicml/llm-foundry/pull/822
* zero bias conversion corrected by megha95 in https://github.com/mosaicml/llm-foundry/pull/624
* Bump einops version, which has improved support for torch compile by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/832
* Update README with links to ML HW resources by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/833
* Add safe_load option to restrict HF dataset downloads to allowed file types by irenedea in https://github.com/mosaicml/llm-foundry/pull/798
* Adding support for alibi when using flash attention by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/820
* Shashank/new benchmarks by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/838
* Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/841
* Add use_tokenizer_eos option to convert text to mds script by irenedea in https://github.com/mosaicml/llm-foundry/pull/843
* Disable Environment Variable Resolution by irenedea in https://github.com/mosaicml/llm-foundry/pull/845
* Bump pre-commit version by b-chu in https://github.com/mosaicml/llm-foundry/pull/847
* Fix typo kwargs=>hf_kwargs by irenedea in https://github.com/mosaicml/llm-foundry/pull/853
* Remove foundry time wrangling by aspfohl in https://github.com/mosaicml/llm-foundry/pull/855
* Minor cleanups by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/858
* Read UC delta table by XiaohanZhangCMU in https://github.com/mosaicml/llm-foundry/pull/773
* Remove fused layernorm (deprecated in composer) by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/859
* Remove hardcoded combined.jsonl with a flag by XiaohanZhangCMU in https://github.com/mosaicml/llm-foundry/pull/861
* Bump to turbo v8 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/828
* Always initialize dist by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/864
* Logs upload URI by milocress in https://github.com/mosaicml/llm-foundry/pull/850
* Delta to JSONL conversion script cleanup and bug fix by nancyhung in https://github.com/mosaicml/llm-foundry/pull/868
* Fix MLFlowLogger mock in tests by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/872
* [XS] Fix delta conversion script regex bug by nancyhung in https://github.com/mosaicml/llm-foundry/pull/877
* Precompute flash attention padding info by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/880
* Add GQA to __init__.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/882
* fsdp wrap refac by vchiley in https://github.com/mosaicml/llm-foundry/pull/883
* Update model download utils to support ORAS by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/881
* Update license by b-chu in https://github.com/mosaicml/llm-foundry/pull/887
* Fix tiktoken tokenizer add_generation_prompt by irenedea in https://github.com/mosaicml/llm-foundry/pull/890
* Upgrade `datasets` version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/892
* Bump transformers version to support Mixtral by dakinggg in https://github.com/mosaicml/llm-foundry/pull/894
* Add `tokenizer-only` flag to only download tokenizers from HF or oras by irenedea in https://github.com/mosaicml/llm-foundry/pull/895
* Foundational Model API eval wrapper by aspfohl in https://github.com/mosaicml/llm-foundry/pull/849
* Add better error for non-empty local output folder in convert_text_to_mds.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/891
* Allow bool input for loggers by ngcgarcia in https://github.com/mosaicml/llm-foundry/pull/897
* Enable QK Group Norm by vchiley in https://github.com/mosaicml/llm-foundry/pull/869
* Workflow should not have leading ./ by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/905
* Add new GC option by dakinggg in https://github.com/mosaicml/llm-foundry/pull/907
* No symlinks at all for HF download by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/908
* Adds support for chat formatted finetuning input data. by milocress in https://github.com/mosaicml/llm-foundry/pull/884
* Add flag to enable/disable param upload by ngcgarcia in https://github.com/mosaicml/llm-foundry/pull/912
* Add support for eval_loader & eval_subset_num_batches in async callback by aspfohl in https://github.com/mosaicml/llm-foundry/pull/834
* Add the model license file for mlflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/915
* Warn instead of error on tokenizer-only download with http by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/904
* Fix fmapi_chat for instruct models and custom tokenizers by aspfohl in https://github.com/mosaicml/llm-foundry/pull/914
* Make yamllint consistent with Composer by b-chu in https://github.com/mosaicml/llm-foundry/pull/918
* Create HF checkpointer model on meta device by dakinggg in https://github.com/mosaicml/llm-foundry/pull/916
* Tiktoken chat format fix by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/893
* fix dash issue by milocress in https://github.com/mosaicml/llm-foundry/pull/919
* Fix yaml linting by b-chu in https://github.com/mosaicml/llm-foundry/pull/920
* Adding deprecation warning for Flash Attention 1 and user warning against using Triton attention. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/921
* Add rich formatting to tracebacks by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/927
* Fix docker workflow caching by irenedea in https://github.com/mosaicml/llm-foundry/pull/930
* Remove .ci folder and move FILE_HEADER by irenedea in https://github.com/mosaicml/llm-foundry/pull/931
* Throw error when no EOS by KuuCi in https://github.com/mosaicml/llm-foundry/pull/922
* Bump composer to 0.19 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/934
* Update eval_gauntlet_callback.py with math.log2 by Skylion007 in https://github.com/mosaicml/llm-foundry/pull/821
* Switch to the Composer integration of LoRA (works with FSDP) by dakinggg in https://github.com/mosaicml/llm-foundry/pull/886
* Refactoring the `add_metrics_to_eval_loaders` function to accept list of metric names instead of a dictionary of metrics. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/938
* Fix an extra call to load state dict and type cast in hf checkpointer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/939
* Fixing the gen_attention_mask_in_length function to handle the case when sequence id is -1 due to attention masking by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/940
* Update lora docs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/941
* Bump FAv2 setup.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/942
* Retrieve license information when local files are provided for a pretrained model by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/943
* Add and use VersionedDeprecationWarning by irenedea in https://github.com/mosaicml/llm-foundry/pull/944
* Bump llm-foundry version to 0.5.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/948

New Contributors
* megha95 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/624
* milocress made their first contribution in https://github.com/mosaicml/llm-foundry/pull/850
* nancyhung made their first contribution in https://github.com/mosaicml/llm-foundry/pull/868
* ngcgarcia made their first contribution in https://github.com/mosaicml/llm-foundry/pull/897
* KuuCi made their first contribution in https://github.com/mosaicml/llm-foundry/pull/922
* Skylion007 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/821

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.4.0...v0.5.0

0.4.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Automatic sequence packing ([#683](https://github.com/mosaicml/llm-foundry/pull/683))

You can now specify `packing_ratio: auto` under your finetuning dataset, to automatically profile and select a good packing ratio to efficiently pack your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
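
To illustrate why packing cuts padding waste, here is a minimal sketch using a simple first-fit strategy. It is not LLM Foundry's packing or profiling code; the function names `greedy_pack` and `padding_fraction`, the sequence lengths, and the `max_seq_len` value are all hypothetical and chosen only for illustration.

```python
from typing import List


def greedy_pack(lengths: List[int], max_seq_len: int) -> List[List[int]]:
    """First-fit packing: put each sequence in the first row that has room."""
    rows: List[List[int]] = []
    for length in lengths:
        if length > max_seq_len:
            raise ValueError("sequence longer than max_seq_len")
        for row in rows:
            if sum(row) + length <= max_seq_len:
                row.append(length)
                break
        else:
            # No existing row had room, so start a new one.
            rows.append([length])
    return rows


def padding_fraction(rows: List[List[int]], max_seq_len: int) -> float:
    """Fraction of token slots that hold padding rather than real tokens."""
    total_slots = len(rows) * max_seq_len
    real_tokens = sum(sum(row) for row in rows)
    return 1.0 - real_tokens / total_slots


# Hypothetical sequence lengths (in tokens) and context length.
lengths = [512, 128, 1024, 256, 768, 384]
max_seq_len = 2048

unpacked = [[n] for n in lengths]            # one sequence per row
packed = greedy_pack(lengths, max_seq_len)   # several sequences per row

print(f"unpacked padding: {padding_fraction(unpacked, max_seq_len):.2f}")
print(f"packed padding:   {padding_fraction(packed, max_seq_len):.2f}")
print(f"sequences per packed row: {len(lengths) / len(packed):.1f}")
```

With these made-up lengths, six rows collapse into two, and the share of slots spent on padding drops accordingly.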

Flash Attention 2 ([#651](https://github.com/mosaicml/llm-foundry/pull/651), [#666](https://github.com/mosaicml/llm-foundry/pull/666), [#672](https://github.com/mosaicml/llm-foundry/pull/672))

We now support using [Flash Attention 2](https://arxiv.org/abs/2307.08691) both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the [training instructions](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#using-flash-attention-) to learn how to use the different versions of Flash Attention.

New PyTorch, Composer, Streaming, and Transformers versions ([#648](https://github.com/mosaicml/llm-foundry/pull/648), [#672](https://github.com/mosaicml/llm-foundry/pull/672), [#736](https://github.com/mosaicml/llm-foundry/pull/736))

As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (codellama and mistral in particular).

Easy Databricks model deployment ([#618](https://github.com/mosaicml/llm-foundry/pull/618))

We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an `MLFlowLogger` and a `HuggingFaceCheckpointer` for your run.

The `MLFlowLogger` should have a Unity Catalog model registry prefix in the form of `catalog.schema`. This specifies where your models are registered. For example,

    loggers:
        mlflow:
            experiment_name: /Users/first.last@email.com/my_experiment_name
            tracking_uri: databricks
            model_registry_prefix: catalog.schema
            model_registry_uri: databricks-uc

The `HuggingFaceCheckpointer` should specify the name you want to register the model under. For example,

    callbacks:
        hf_checkpointer:
            save_interval: 1ep  # Save Hugging Face formatted checkpoints each epoch
            save_folder: s3://bucket/path/to/my/checkpoints
            mlflow_registered_model_name: my_model_name  # Final model will be registered to catalog.schema.my_model_name

MPT model configurations

We've added a few new options when training with the MPT architecture in LLM Foundry.

- Rotary embeddings ([#675](https://github.com/mosaicml/llm-foundry/pull/675))
- (Un)Tied word embeddings ([#728](https://github.com/mosaicml/llm-foundry/pull/728))
- Fine grained activation checkpointing ([#720](https://github.com/mosaicml/llm-foundry/pull/720))

Evaluation Improvements

We've released v0.1 of our Eval Gauntlet ([#674](https://github.com/mosaicml/llm-foundry/pull/674), [#748](https://github.com/mosaicml/llm-foundry/pull/748))! This adds many new benchmarks, chain-of-thought, and a new safety category. Check out the [README](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) for full details!

In addition, we've made a few improvements to our evaluation options, with more to come!
- Allow specifying multiple evaluation datasets to compute cross entropy and perplexity on during training ([#603](https://github.com/mosaicml/llm-foundry/pull/603))
- Easier versions of the HumanEval dataset, which can be useful for comparing smaller models ([#645](https://github.com/mosaicml/llm-foundry/pull/645))
- More options for averaging the results of the Eval Gauntlet ([#640](https://github.com/mosaicml/llm-foundry/pull/640))

New pretraining benchmarks ([#543](https://github.com/mosaicml/llm-foundry/pull/543))

Added H100 profiling results to our [benchmarking table](https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/benchmarking/README.md).

Quality of life improvements

- Improved [Generate](https://github.com/mosaicml/composer/blob/dev/composer/callbacks/generate.py) callback with more logging options. Use the `Generate` callback to log generations from your model over the course of training. ([#631](https://github.com/mosaicml/llm-foundry/pull/631))
- Count number of tokens during training _excluding_ padding tokens. Previously this count _included_ padding tokens. ([#676](https://github.com/mosaicml/llm-foundry/pull/676))
- Use the PyTorch profiler to profile your training runs. ([#678](https://github.com/mosaicml/llm-foundry/pull/678))
- A [convenience script](https://github.com/mosaicml/llm-foundry/blob/main/scripts/misc/download_hf_model.py) for using the much faster Hugging Face `snapshot_download` to download models from the Hugging Face Hub. ([#708](https://github.com/mosaicml/llm-foundry/pull/708))
- New [AWS specific Docker images](https://github.com/mosaicml/llm-foundry#mosaicml-docker-images) with LLM Foundry dependencies pre-installed. ([#731](https://github.com/mosaicml/llm-foundry/pull/731))
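
The token-counting change in the list above can be illustrated with a small sketch. It assumes a batch described by an attention mask in which 1 marks a real token and 0 marks padding; the helper name `count_tokens` and the toy batch are hypothetical, and this is not the code LLM Foundry uses internally.

```python
from typing import List


def count_tokens(attention_mask: List[List[int]]) -> int:
    """Count non-padding tokens, where a mask value of 1 marks a real token."""
    return sum(sum(row) for row in attention_mask)


# Two sequences padded to length 6; 0 marks a padding position.
attention_mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
]

tokens_excluding_padding = count_tokens(attention_mask)
tokens_including_padding = sum(len(row) for row in attention_mask)

print(tokens_excluding_padding, tokens_including_padding)
```

For heavily padded batches the two counts can differ substantially, which matters for anything measured per token, such as throughput or token-based training durations.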

Experimental features

Inverse square root learning rate scheduler ([#657](https://github.com/mosaicml/llm-foundry/pull/657))

We've added experimental support for the [inverse square root learning rate scheduler](https://github.com/mosaicml/llm-foundry/blob/1793c366fd49f96685481e086d7584f21afef450/llmfoundry/optim/scheduler.py#L40).
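
For readers unfamiliar with this kind of schedule, the sketch below shows the common textbook form: linear warmup followed by decay proportional to the inverse square root of the step. It is only an illustration. The function name `inv_sqrt_lr_multiplier` and its `warmup_steps` parameter are hypothetical, and the linked scheduler may use different parameters or add further phases.

```python
import math


def inv_sqrt_lr_multiplier(step: int, warmup_steps: int) -> float:
    """Multiplier on the base learning rate for a generic inverse-sqrt schedule.

    Assumed form (not necessarily the linked implementation):
    linear warmup from 0 to 1, then sqrt(warmup_steps / step).
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        return step / warmup_steps           # linear warmup
    return math.sqrt(warmup_steps / step)    # decays like 1 / sqrt(step)


for step in (0, 50, 100, 400, 1600):
    print(step, inv_sqrt_lr_multiplier(step, warmup_steps=100))
```

Under this form, the multiplier halves each time the step count quadruples after warmup.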

Breaking changes

Updated Streaming defaults ([#723](https://github.com/mosaicml/llm-foundry/pull/723))

We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the [Streaming release notes](https://github.com/mosaicml/streaming/releases/tag/v0.7.0) for more details.

Removed support for PrefixLM for Bloom and OPT models ([#704](https://github.com/mosaicml/llm-foundry/pull/704))

We occasionally remove unused experimental parts of the code base to focus on new features and better support for existing features, and we've removed support for PrefixLM applied to Bloom and OPT models in this release.

What's Changed
* Multi eval dataset logging by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/603
* Merge release 0.3.0 back to main by dakinggg in https://github.com/mosaicml/llm-foundry/pull/635
* Add tmp path retention policy by j316chuck in https://github.com/mosaicml/llm-foundry/pull/641
* Add flag to disable train metrics by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/642
* Update pins to latest version that were missed by dakinggg in https://github.com/mosaicml/llm-foundry/pull/646
* Fix overriding of rope_scaling config by dakinggg in https://github.com/mosaicml/llm-foundry/pull/644
* Add 2.1 images to docker workflow and tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/648
* Fixes to lion8b test for torch 2.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/649
* Only log "changing autoresume" when actually changing by aspfohl in https://github.com/mosaicml/llm-foundry/pull/653
* Fix lion8b error correction with torch 2.1 by dblalock in https://github.com/mosaicml/llm-foundry/pull/656
* Clean up processes between distributed gpu tests by j316chuck in https://github.com/mosaicml/llm-foundry/pull/660
* Revert "Clean up processes between distributed gpu tests (660)" by j316chuck in https://github.com/mosaicml/llm-foundry/pull/662
* Switch ordering of foundry gpu tests by j316chuck in https://github.com/mosaicml/llm-foundry/pull/665
* Change batch size on coding tasks to 1 to avoid OOM by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/654
* Add images with flash attention 2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/651
* Fix yaml change by dakinggg in https://github.com/mosaicml/llm-foundry/pull/667
* Revert actions change by dakinggg in https://github.com/mosaicml/llm-foundry/pull/668
* Inverse Square Root LR Schedule by mansheej in https://github.com/mosaicml/llm-foundry/pull/657
* Add test suite for flash attention 2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/666
* Adding Simplified Coding Tasks by mcarbin in https://github.com/mosaicml/llm-foundry/pull/645
* Fix typo in image name by dakinggg in https://github.com/mosaicml/llm-foundry/pull/669
* Point to composer.callback.Generate by aspfohl in https://github.com/mosaicml/llm-foundry/pull/631
* Do not update past_key_values in place by irenedea in https://github.com/mosaicml/llm-foundry/pull/652
* Fix small typos in the eval readme by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/671
* Convert to DataSpec and add token counts that include padding by dakinggg in https://github.com/mosaicml/llm-foundry/pull/676
* Add support for automatically registering models to UC at the end of training by dakinggg in https://github.com/mosaicml/llm-foundry/pull/618
* add `load_strict_model_weights` as an optional config parameter by AllenHW in https://github.com/mosaicml/llm-foundry/pull/655
* Small changes to HF repo update script by dakinggg in https://github.com/mosaicml/llm-foundry/pull/680
* Add profiler support in llm foundry by j316chuck in https://github.com/mosaicml/llm-foundry/pull/678
* Update_pretrain_benchmarks by crinard in https://github.com/mosaicml/llm-foundry/pull/543
* add |---| to render tables correctly by crinard in https://github.com/mosaicml/llm-foundry/pull/686
* Adding Mosaic logger + logging data validated event by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/670
* Tiktoken wrapper add_eos_token option by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/681
* Attempt to fix flaky test by dakinggg in https://github.com/mosaicml/llm-foundry/pull/688
* Allow flash attention 2 and upgrade to transformers 4.34.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/672
* Fix mlflow model logging bug by dakinggg in https://github.com/mosaicml/llm-foundry/pull/692
* Add fixtures by irenedea in https://github.com/mosaicml/llm-foundry/pull/673
* Make default for cuda_load_lazy false by irenedea in https://github.com/mosaicml/llm-foundry/pull/694
* Update README.md by j316chuck in https://github.com/mosaicml/llm-foundry/pull/693
* Pad tiktoken vocab so that additional_special_tokens works by dakinggg in https://github.com/mosaicml/llm-foundry/pull/695
* Remove live logs to be consistent with Composer by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/698
* Change gauntlet avging by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/640
* Remove prefixlm support for OPT and Bloom by dakinggg in https://github.com/mosaicml/llm-foundry/pull/704
* Fix attention patch compatibility for llama2 by irenedea in https://github.com/mosaicml/llm-foundry/pull/705
* Add test coverage for lion and lion8b checkpoint interop by dblalock in https://github.com/mosaicml/llm-foundry/pull/679
* Improvement in README.md and TUTORIAL.md by tmsagarofficial in https://github.com/mosaicml/llm-foundry/pull/699
* Make TiktokenTokenizerWrapper picklable by irenedea in https://github.com/mosaicml/llm-foundry/pull/700
* Add num_proc to map and filter calls by dakinggg in https://github.com/mosaicml/llm-foundry/pull/706
* Fix HF local module copy contention with a meta init on local rank 0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/710
* Add support for auto packing ratio by irenedea in https://github.com/mosaicml/llm-foundry/pull/683
* Remove HumanEval tasks from ICL eval by tbarton16 in https://github.com/mosaicml/llm-foundry/pull/715
* Allow logging metadata by dakinggg in https://github.com/mosaicml/llm-foundry/pull/714
* Run HF dataset processing on local rank 0 first by dakinggg in https://github.com/mosaicml/llm-foundry/pull/716
* Add Hugging Face model download script by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/708
* Adding support for Rotary Position Embeddings by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/675
* Add databricks dependency by irenedea in https://github.com/mosaicml/llm-foundry/pull/717
* Set persistent_workers = False for packing profiling by dakinggg in https://github.com/mosaicml/llm-foundry/pull/718
* raise timeout for GPU tests by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/719
* change default overwrite to True by dakinggg in https://github.com/mosaicml/llm-foundry/pull/724
* Attempt to fix a very occasional hang in datasets map/filter by dakinggg in https://github.com/mosaicml/llm-foundry/pull/725
* Add Unity Catalog support to HF checkpointer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/721
* Combine filters into one, to avoid datasets error by dakinggg in https://github.com/mosaicml/llm-foundry/pull/729
* Fix logging verbosity in HF model download script and repair symlinks by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/727
* Gate the dist calls in build_tokenizer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/732
* Create AWS docker image for fine tuning by j316chuck in https://github.com/mosaicml/llm-foundry/pull/731
* Make TiktokenTokenizerWrapper compatible with convert_composer_to_hf.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/730
* Enable `tie_word_embeddings` config setting to enable / disable weight tied embeddings by vchiley in https://github.com/mosaicml/llm-foundry/pull/728
* add act checkpoint at sub layer level by cli99 in https://github.com/mosaicml/llm-foundry/pull/720
* Better defaults for StreamingDataset subclasses by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/723
* Rename log message by b-chu in https://github.com/mosaicml/llm-foundry/pull/734
* Remove tokenizer_name field by dakinggg in https://github.com/mosaicml/llm-foundry/pull/735
* Fix pairwise attention comparison in test by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/737
* Fix passed metadata to mlflow logging by wenfeiy-db in https://github.com/mosaicml/llm-foundry/pull/713
* HF script explicitly casts precision by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/741
* Bump to composer 0.17 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/736
* Patch os cpu count to avoid extra multiprocessing inside pytest which sometimes hangs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/745
* Reenable tests that were accidentally disabled by dakinggg in https://github.com/mosaicml/llm-foundry/pull/746
* Gauntlet v0.1 by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/674
* Remove extra test suite by dakinggg in https://github.com/mosaicml/llm-foundry/pull/743
* Fix typo in workflow file by dakinggg in https://github.com/mosaicml/llm-foundry/pull/750
* Fix 1.13 tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/751
* Pin Chat format to TiktokenTokenizerWrapper by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/752
* Catch exception raised in hf prep properly by j316chuck in https://github.com/mosaicml/llm-foundry/pull/749
* Gauntlet v0.1.0 yaml fixes by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/748
* Fix flash attention GQA bug to use the dynamic size of the key/value tensors - used for eval/inference by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/756

New Contributors
* mansheej made their first contribution in https://github.com/mosaicml/llm-foundry/pull/657
* mcarbin made their first contribution in https://github.com/mosaicml/llm-foundry/pull/645
* maxisawesome made their first contribution in https://github.com/mosaicml/llm-foundry/pull/671
* AllenHW made their first contribution in https://github.com/mosaicml/llm-foundry/pull/655
* crinard made their first contribution in https://github.com/mosaicml/llm-foundry/pull/543
* jjanezhang made their first contribution in https://github.com/mosaicml/llm-foundry/pull/670
* rajammanabrolu made their first contribution in https://github.com/mosaicml/llm-foundry/pull/681
* tmsagarofficial made their first contribution in https://github.com/mosaicml/llm-foundry/pull/699
* tbarton16 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/715
* ShashankMosaicML made their first contribution in https://github.com/mosaicml/llm-foundry/pull/675
* cli99 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/720
* b-chu made their first contribution in https://github.com/mosaicml/llm-foundry/pull/734
* wenfeiy-db made their first contribution in https://github.com/mosaicml/llm-foundry/pull/713

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.3.0...v0.4.0

0.3.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!

Features

Llama-2 ([#485](https://github.com/mosaicml/llm-foundry/pull/485), [#520](https://github.com/mosaicml/llm-foundry/pull/520), [#533](https://github.com/mosaicml/llm-foundry/pull/533))

Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the `attention_patch_type` in your yaml like so:

    model:
        ...
        attention_patch_type: triton
        ...

See the [example yaml](https://github.com/mosaicml/llm-foundry/blob/main/mcli/mcli-llama2-finetune.yaml) for a full example of how to finetune Llama-2 on the MosaicML platform.

8-bit Lion ([#514](https://github.com/mosaicml/llm-foundry/pull/514))

We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from `decoupled_lionw` to `decoupled_lionw_8b`!
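
One plausible way to arrive at the 12 and 9 figures, counting bytes per parameter, is sketched below. The breakdown (32-bit weights, 32-bit gradients, and a single momentum buffer stored in either 32 or 8 bits) is an assumption made for illustration, not something stated in the release notes, and it ignores any small overhead from quantization scales. The 7-billion-parameter model size is likewise hypothetical.

```python
BYTES_FP32 = 4
BYTES_INT8 = 1


def bytes_per_param(momentum_bytes: int) -> int:
    """Assumed accounting: fp32 weights + fp32 gradients + momentum buffer."""
    return BYTES_FP32 + BYTES_FP32 + momentum_bytes


lion_bytes = bytes_per_param(BYTES_FP32)      # momentum kept in 32 bits
lion_8b_bytes = bytes_per_param(BYTES_INT8)   # momentum kept in 8 bits

n_params = 7_000_000_000  # hypothetical model size
saved_gib = n_params * (lion_bytes - lion_8b_bytes) / 2**30

print(f"Lion:       {lion_bytes} bytes/param")
print(f"8-bit Lion: {lion_8b_bytes} bytes/param")
print(f"Saved for {n_params:,} params: {saved_gib:.1f} GiB")
```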

Checkpoint conversion ([#526](https://github.com/mosaicml/llm-foundry/pull/526), [#519](https://github.com/mosaicml/llm-foundry/pull/519), [#594](https://github.com/mosaicml/llm-foundry/pull/594))

We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and supporting Faster Transformer conversion from a Composer MPT checkpoint.

To enable the new callback, add the `hf_checkpointer` callback to your yaml like so:

    callbacks:
        ...
        hf_checkpointer:
            # Save a Hugging Face formatted checkpoint at the end of each epoch
            save_interval: 1ep
            # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
            # so this folder will likely be the same as your overall save_folder
            save_folder: ./{run_name}/checkpoints
            # Set the precision you want the checkpoint saved in
            precision: bfloat16

Code evaluation ([#587](https://github.com/mosaicml/llm-foundry/pull/587))

We have added support for running HumanEval (code evaluation) using LLM Foundry! See the [evaluation readme](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval#incontextlearningcodeevaldataset) for a more detailed description and the [tasks yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/coding_tasks.yaml) for an ICL yaml that can be used to run the HumanEval evaluation task.

Transformer Engine support ([#432](https://github.com/mosaicml/llm-foundry/pull/432))

Adds support for using NVIDIA's [Transformer Engine](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html) to enable FP8 training. To enable, set `fc_type='te'` and/or `ffn_config['ffn_type']='te_ln_mlp'` and `precision='amp_fp8'`.

MLFlow ([#475](https://github.com/mosaicml/llm-foundry/pull/475))

Adds support for using MLFlow as an experiment tracker. To enable, simply add `mlflow` to the `loggers` section of your yaml. See the [Composer docs](https://docs.mosaicml.com/projects/composer/en/stable/api_reference/generated/composer.loggers.MLFlowLogger.html) for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.

Updated streaming version/defaults ([#503](https://github.com/mosaicml/llm-foundry/pull/503), [#573](https://github.com/mosaicml/llm-foundry/pull/573), [#580](https://github.com/mosaicml/llm-foundry/pull/580), [#602](https://github.com/mosaicml/llm-foundry/pull/602))

Updates to the latest release of MosaicML [Streaming](https://github.com/mosaicml/streaming/releases/tag/v0.6.0) and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!

Grouped Query Attention ([#492](https://github.com/mosaicml/llm-foundry/pull/492))

Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set `attn_config['attn_type']='grouped_query_attention'` and `attn_config['kv_n_heads']` to the desired number of kv heads.
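
The idea behind the `kv_n_heads` setting can be sketched as a mapping from query heads to the key/value head each one shares. The helper `kv_head_for_query_head` is hypothetical, and the contiguous grouping it uses is an assumption about the layout, not a description of LLM Foundry's attention code.

```python
def kv_head_for_query_head(q_head: int, n_heads: int, kv_n_heads: int) -> int:
    """Return the key/value head index shared by query head `q_head`.

    Assumes contiguous groups: the first n_heads // kv_n_heads query heads
    share key/value head 0, the next group shares head 1, and so on.
    """
    if n_heads % kv_n_heads != 0:
        raise ValueError("n_heads must be divisible by kv_n_heads")
    group_size = n_heads // kv_n_heads
    return q_head // group_size


n_heads = 8

# kv_n_heads == n_heads behaves like Multi Head Attention,
# kv_n_heads == 1 behaves like Multi Query Attention,
# and anything in between is Grouped Query Attention.
mha = [kv_head_for_query_head(h, n_heads, 8) for h in range(n_heads)]
gqa = [kv_head_for_query_head(h, n_heads, 2) for h in range(n_heads)]
mqa = [kv_head_for_query_head(h, n_heads, 1) for h in range(n_heads)]

print("MHA:", mha)
print("GQA:", gqa)
print("MQA:", mqa)
```

Fewer key/value heads mean a smaller key/value cache at inference time, which is where the speed benefit comes from.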

MPT quality of life improvements ([#559](https://github.com/mosaicml/llm-foundry/pull/559), [#599](https://github.com/mosaicml/llm-foundry/pull/599))

Thanks to tdoublep and lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!

Eval gauntlet during training, inference API eval wrapper ([#501](https://github.com/mosaicml/llm-foundry/pull/501), [#494](https://github.com/mosaicml/llm-foundry/pull/494))

Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).

tiktoken support ([#610](https://github.com/mosaicml/llm-foundry/pull/610))

We have enabled training with tiktoken tokenizers with a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:

    tokenizer:
        name: tiktoken
        kwargs:
            model_name: gpt-4

LoRA eval ([#515](https://github.com/mosaicml/llm-foundry/pull/515))

Allows the use of our evaluation script with a model trained using LoRA. See [this yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/hf_lora_eval.yml) for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!

Finetuning API

Lastly, we are building a [finetuning API](https://docs.mosaicml.com/projects/mcli/en/latest/training/finetuning.html#finetuning-private-preview) on top of LLM Foundry, Composer, and Streaming. Please [reach out](https://www.mosaicml.com/get-started) if you might be interested in using this API as a customer!

What's Changed
