LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
LoRA Support (with FSDP!) (886)
LLM Foundry now supports LoRA via an integration with the [PEFT library](https://github.com/huggingface/peft). To use it, run `train.py` with a `peft_config` block added to the `model` section of your config `.yaml`, like so:
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
    - q_proj
    - k_proj
Read more about it in the [tutorial](https://github.com/mosaicml/llm-foundry/blob/main/TUTORIAL.md#can-i-finetune-using-peft--lora).
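To get a feel for what `r` and `lora_alpha` in the config above control, here is a rough back-of-the-envelope sketch. It assumes the standard LoRA formulation, in which the low-rank update is scaled by `lora_alpha / r` and each adapted linear layer gains two small matrices. The helper names and the projection size `d_model = 4096` are hypothetical, chosen only for illustration; this is not LLM Foundry or PEFT code.

```python
# Illustrative arithmetic only; not LLM Foundry or PEFT code.


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scale applied to the low-rank update in standard LoRA: lora_alpha / r."""
    return lora_alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA adds two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values taken from the example config.
r, lora_alpha = 16, 32

# Hypothetical projection size, assumed only for this example.
d_model = 4096

scaling = lora_scaling(r, lora_alpha)
added = lora_extra_params(d_model, d_model, r)
frozen = d_model * d_model

print(f"scaling = {scaling}")
print(f"added params per projection = {added}")
print(f"fraction of the frozen weight = {added / frozen:.4%}")
```

Raising `r` grows the adapter linearly, while `lora_alpha` only changes how strongly the update is weighted relative to the frozen weights.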
ALiBi for Flash Attention (820)
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
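For readers new to ALiBi, the sketch below shows the idea the `alibi` flag turns on: each attention head adds a linear penalty to its attention scores that grows with the distance between query and key. The slope schedule follows the geometric sequence from the ALiBi paper for power-of-two head counts. The function names are hypothetical, and this pure-Python version is only a sketch of the technique, not the fused implementation used by Flash Attention or LLM Foundry.

```python
# A minimal sketch of ALiBi biases; not the Flash Attention kernel.


def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count."""
    if n_heads <= 0 or (n_heads & (n_heads - 1)) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Bias added to the attention scores of one head.

    Entry [i][j] is -slope * (i - j) for keys at or before the query position.
    Future positions get -inf, which is the usual causal mask rather than
    part of ALiBi itself.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(slopes[0], seq_len=4)

print("slopes:", slopes)
for row in bias:
    print(row)
```

Heads with larger slopes focus on nearby tokens, while heads with smaller slopes can attend further back.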
Chat Data for Finetuning (884)
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer [chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating).
Each sample requires a single key `"messages"` that maps to an array of message objects. Each message object represents a single message in the conversation and must contain the following keys:
* `role`: A string indicating the author of the message. Possible values are `"system"`, `"user"`, and `"assistant"`.
* `content`: A string containing the text of the message.
The last message in the `"messages"` array must have the role `"assistant"`, so every sample contains at least one assistant message.
Here's an example `.jsonl` with chat data:
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
  { "role": "system", "content": "A conversation between a user and a helpful and honest assistant" },
  { "role": "user", "content": "Hi, MPT!" },
  { "role": "assistant", "content": "Hi, user!" },
  { "role": "user", "content": "Is multi-turn chat supported?" },
  { "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...
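If you want to sanity-check a chat dataset before training, the rules above can be sketched as a small validator. This is a standalone illustration using only the Python standard library; the names `validate_chat_sample` and `validate_jsonl` are hypothetical and are not part of LLM Foundry, whose own checks may differ in detail. It expects one JSON object per line.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_sample(sample: object) -> None:
    """Raise ValueError if `sample` breaks the chat-format rules listed above."""
    if not isinstance(sample, dict):
        raise ValueError("each sample must be a JSON object")
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError('sample needs a non-empty "messages" array')
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {i}: expected an object")
        role = message.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    # The final turn must come from the assistant.
    if messages[-1]["role"] != "assistant":
        raise ValueError('the last message must have role "assistant"')


def validate_jsonl(text: str) -> int:
    """Validate every non-empty line of a .jsonl string; return the sample count."""
    count = 0
    for line in text.splitlines():
        if line.strip():
            validate_chat_sample(json.loads(line))
            count += 1
    return count


example = (
    '{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, '
    '{ "role": "assistant", "content": "Hi, user!" } ]}'
)
print(validate_jsonl(example), "valid sample(s)")
```

A sample that ends on a `"user"` turn, or a message missing `"content"`, makes the validator raise `ValueError` with the index of the offending message.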
Safe Load for HuggingFace Datasets (798)
We now provide a `safe_load` option when loading HuggingFace datasets for finetuning.
This restricts loaded files to `.jsonl`, `.csv`, or `.parquet` extensions to prevent arbitrary code execution.
To use, set `safe_load` to `true` in your dataset configuration:
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
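The idea behind `safe_load` can be illustrated with a simple extension allowlist. The sketch below is an assumption-laden illustration, not LLM Foundry's implementation: the helper name `split_by_extension` is hypothetical, extensions are matched exactly as written, and the sketch only reports which files fall outside the allowlist rather than deciding what the real loader does with them.

```python
from pathlib import PurePosixPath

# Extensions named in the release notes as safe to load.
SAFE_EXTENSIONS = {".jsonl", ".csv", ".parquet"}


def split_by_extension(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split file names into (allowed, rejected) using the extension allowlist."""
    allowed: list[str] = []
    rejected: list[str] = []
    for name in filenames:
        suffix = PurePosixPath(name).suffix
        if suffix in SAFE_EXTENSIONS:
            allowed.append(name)
        else:
            rejected.append(name)
    return allowed, rejected


# Hypothetical repository listing, for illustration only.
files = ["train.jsonl", "data/eval.parquet", "load_data.py", "README.md"]
allowed, rejected = split_by_extension(files)

print("allowed: ", allowed)
print("rejected:", rejected)
```

A dataset loading script such as the hypothetical `load_data.py` above is the kind of file this restriction is meant to keep from running.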
New PyTorch, Composer, Streaming, and Transformers versions
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (Mixtral in particular).
Deprecations
Support for Flash Attention v1 (921)
Support for Flash Attention v1 is deprecated and will be removed in v0.6.0.
Breaking Changes
Removed support for PyTorch versions before 2.1 (787)
We no longer support PyTorch versions before 2.1.
Removed Deprecated Features (948)
We've removed features that have been deprecated for at least one release.
What's Changed
* Small test fix to have right padding by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/757
* Release 040 back to main by dakinggg in https://github.com/mosaicml/llm-foundry/pull/758
* Bump composer version to 0.17.1 by irenedea in https://github.com/mosaicml/llm-foundry/pull/762
* Docker release on workflow_dispatch by bandish-shah in https://github.com/mosaicml/llm-foundry/pull/763
* Fix tiktoken wrapper by dakinggg in https://github.com/mosaicml/llm-foundry/pull/761
* enable param group configuration in llm-foundry by vchiley in https://github.com/mosaicml/llm-foundry/pull/760
* Add script for doing bulk generation against an endpoint by aspfohl in https://github.com/mosaicml/llm-foundry/pull/765
* Only strip object names when creating new output path by irenedea in https://github.com/mosaicml/llm-foundry/pull/766
* Add eval loader to eval script by aspfohl in https://github.com/mosaicml/llm-foundry/pull/742
* Support inputs_embeds by samhavens in https://github.com/mosaicml/llm-foundry/pull/687
* Better error message when test does not complete by aspfohl in https://github.com/mosaicml/llm-foundry/pull/769
* Add codeowners by dakinggg in https://github.com/mosaicml/llm-foundry/pull/770
* add single value support to activation_checkpointing_target by cli99 in https://github.com/mosaicml/llm-foundry/pull/772
* Reorganize tests to make them easier to find by aspfohl in https://github.com/mosaicml/llm-foundry/pull/768
* Add "completion" alias for response key by dakinggg in https://github.com/mosaicml/llm-foundry/pull/771
* Shashank/seq id flash attn by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/738
* Fix SIQA gold indices by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/774
* Add missing load_weights_only to example yamls by dakinggg in https://github.com/mosaicml/llm-foundry/pull/776
* Patch flash attn in test to simulate environment without it installed by dakinggg in https://github.com/mosaicml/llm-foundry/pull/778
* Update .gitignore by aspfohl in https://github.com/mosaicml/llm-foundry/pull/781
* Disable mosaicml logger in foundry CI/CD by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/788
* Chat fomating template changes by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/784
* Remove tests and support for torch <2.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/787
* Fix utf-8 decode errors in tiktoken wrapper by dakinggg in https://github.com/mosaicml/llm-foundry/pull/792
* Update gauntlet v0.2 to reflect results of calibration by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/791
* Remove from mcli.sdk imports by aspfohl in https://github.com/mosaicml/llm-foundry/pull/793
* Auto packing fixes by irenedea in https://github.com/mosaicml/llm-foundry/pull/783
* Enable flag to not pass PAD tokens in ffwd by bcui19 in https://github.com/mosaicml/llm-foundry/pull/775
* Adding a fix for Cross Entropy Loss for long sequence lengths. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/795
* Minor readme updates and bump min python version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/799
* Enable GLU FFN type by vchiley in https://github.com/mosaicml/llm-foundry/pull/796
* clean up resolve_ffn_hidden_and_exp_ratio by vchiley in https://github.com/mosaicml/llm-foundry/pull/801
* Fix token counting to use attention mask instead of ids by dakinggg in https://github.com/mosaicml/llm-foundry/pull/802
* update openai wrapper to work with tiktoken interface and newest openai version by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/794
* Fix openai not conditioned imports by dakinggg in https://github.com/mosaicml/llm-foundry/pull/806
* Make the ffn activation func configurable by vchiley in https://github.com/mosaicml/llm-foundry/pull/805
* Clean up the logs, bump datasets and transformers by dakinggg in https://github.com/mosaicml/llm-foundry/pull/804
* Fix remote path check for UC volumes by irenedea in https://github.com/mosaicml/llm-foundry/pull/809
* Expand options for MMLU. by mansheej in https://github.com/mosaicml/llm-foundry/pull/811
* Async eval callback by aspfohl in https://github.com/mosaicml/llm-foundry/pull/702
* Updating the Flash Attention version to fix cross entropy loss by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/812
* Remove redundant transposes for rope rotation by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/807
* Add generic flatten imports to HF checkpointer by b-chu in https://github.com/mosaicml/llm-foundry/pull/814
* Fix token counting to allow there to be no attention mask by dakinggg in https://github.com/mosaicml/llm-foundry/pull/818
* Default to using tokenizer eos and bos in convert_text_to_mds.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/823
* Revert "Default to using tokenizer eos and bos in convert_text_to_mds.py" by irenedea in https://github.com/mosaicml/llm-foundry/pull/825
* Bump turbo version to 0.0.7 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/827
* Align GLU implementation with LLaMa by vchiley in https://github.com/mosaicml/llm-foundry/pull/829
* Use `sync_module_states: True` when using HSDP by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/830
* Update composer to 0.17.2 and streaming to 0.7.2 by irenedea in https://github.com/mosaicml/llm-foundry/pull/822
* zero bias conversion corrected by megha95 in https://github.com/mosaicml/llm-foundry/pull/624
* Bump einops version, which has improved support for torch compile by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/832
* Update README with links to ML HW resources by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/833
* Add safe_load option to restrict HF dataset downloads to allowed file types by irenedea in https://github.com/mosaicml/llm-foundry/pull/798
* Adding support for alibi when using flash attention by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/820
* Shashank/new benchmarks by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/838
* Fix error when decoding a token in the id gap (or out of range) in a tiktoken tokenizer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/841
* Add use_tokenizer_eos option to convert text to mds script by irenedea in https://github.com/mosaicml/llm-foundry/pull/843
* Disable Environment Variable Resolution by irenedea in https://github.com/mosaicml/llm-foundry/pull/845
* Bump pre-commit version by b-chu in https://github.com/mosaicml/llm-foundry/pull/847
* Fix typo kwargs=>hf_kwargs by irenedea in https://github.com/mosaicml/llm-foundry/pull/853
* Remove foundry time wrangling by aspfohl in https://github.com/mosaicml/llm-foundry/pull/855
* Minor cleanups by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/858
* Read UC delta table by XiaohanZhangCMU in https://github.com/mosaicml/llm-foundry/pull/773
* Remove fused layernorm (deprecated in composer) by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/859
* Remove hardcoded combined.jsonl with a flag by XiaohanZhangCMU in https://github.com/mosaicml/llm-foundry/pull/861
* Bump to turbo v8 by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/828
* Always initialize dist by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/864
* Logs upload URI by milocress in https://github.com/mosaicml/llm-foundry/pull/850
* Delta to JSONL conversion script cleanup and bug fix by nancyhung in https://github.com/mosaicml/llm-foundry/pull/868
* Fix MLFlowLogger mock in tests by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/872
* [XS] Fix delta conversion script regex bug by nancyhung in https://github.com/mosaicml/llm-foundry/pull/877
* Precompute flash attention padding info by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/880
* Add GQA to __init__.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/882
* fsdp wrap refac by vchiley in https://github.com/mosaicml/llm-foundry/pull/883
* Update model download utils to support ORAS by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/881
* Update license by b-chu in https://github.com/mosaicml/llm-foundry/pull/887
* Fix tiktoken tokenizer add_generation_prompt by irenedea in https://github.com/mosaicml/llm-foundry/pull/890
* Upgrade `datasets` version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/892
* Bump transformers version to support Mixtral by dakinggg in https://github.com/mosaicml/llm-foundry/pull/894
* Add `tokenizer-only` flag to only download tokenizers from HF or oras by irenedea in https://github.com/mosaicml/llm-foundry/pull/895
* Foundational Model API eval wrapper by aspfohl in https://github.com/mosaicml/llm-foundry/pull/849
* Add better error for non-empty local output folder in convert_text_to_mds.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/891
* Allow bool input for loggers by ngcgarcia in https://github.com/mosaicml/llm-foundry/pull/897
* Enable QK Group Norm by vchiley in https://github.com/mosaicml/llm-foundry/pull/869
* Workflow should not have leading ./ by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/905
* Add new GC option by dakinggg in https://github.com/mosaicml/llm-foundry/pull/907
* No symlinks at all for HF download by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/908
* Adds support for chat formatted finetuning input data. by milocress in https://github.com/mosaicml/llm-foundry/pull/884
* Add flag to enable/disable param upload by ngcgarcia in https://github.com/mosaicml/llm-foundry/pull/912
* Add support for eval_loader & eval_subset_num_batches in async callback by aspfohl in https://github.com/mosaicml/llm-foundry/pull/834
* Add the model license file for mlflow by dakinggg in https://github.com/mosaicml/llm-foundry/pull/915
* Warn instead of error on tokenizer-only download with http by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/904
* Fix fmapi_chat for instruct models and custom tokenizers by aspfohl in https://github.com/mosaicml/llm-foundry/pull/914
* Make yamllint consistent with Composer by b-chu in https://github.com/mosaicml/llm-foundry/pull/918
* Create HF checkpointer model on meta device by dakinggg in https://github.com/mosaicml/llm-foundry/pull/916
* Tiktoken chat format fix by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/893
* fix dash issue by milocress in https://github.com/mosaicml/llm-foundry/pull/919
* Fix yaml linting by b-chu in https://github.com/mosaicml/llm-foundry/pull/920
* Adding deprecation warning for Flash Attention 1 and user warning against using Triton attention. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/921
* Add rich formatting to tracebacks by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/927
* Fix docker workflow caching by irenedea in https://github.com/mosaicml/llm-foundry/pull/930
* Remove .ci folder and move FILE_HEADER by irenedea in https://github.com/mosaicml/llm-foundry/pull/931
* Throw error when no EOS by KuuCi in https://github.com/mosaicml/llm-foundry/pull/922
* Bump composer to 0.19 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/934
* Update eval_gauntlet_callback.py with math.log2 by Skylion007 in https://github.com/mosaicml/llm-foundry/pull/821
* Switch to the Composer integration of LoRA (works with FSDP) by dakinggg in https://github.com/mosaicml/llm-foundry/pull/886
* Refactoring the `add_metrics_to_eval_loaders` function to accept list of metric names instead of a dictionary of metrics. by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/938
* Fix an extra call to load state dict and type cast in hf checkpointer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/939
* Fixing the gen_attention_mask_in_length function to handle the case when sequence id is -1 due to attention masking by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/940
* Update lora docs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/941
* Bump FAv2 setup.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/942
* Retrieve license information when local files are provided for a pretrained model by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/943
* Add and use VersionedDeprecationWarning by irenedea in https://github.com/mosaicml/llm-foundry/pull/944
* Bump llm-foundry version to 0.5.0 by irenedea in https://github.com/mosaicml/llm-foundry/pull/948
New Contributors
* megha95 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/624
* milocress made their first contribution in https://github.com/mosaicml/llm-foundry/pull/850
* nancyhung made their first contribution in https://github.com/mosaicml/llm-foundry/pull/868
* ngcgarcia made their first contribution in https://github.com/mosaicml/llm-foundry/pull/897
* KuuCi made their first contribution in https://github.com/mosaicml/llm-foundry/pull/922
* Skylion007 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/821
**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.4.0...v0.5.0