LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs), and is the training codebase behind the [MPT-7B](https://www.mosaicml.com/blog/mpt-7b) and [MPT-30B](https://www.mosaicml.com/blog/mpt-30b) models. Our emphasis is on efficiency, scalability, and ease of use, to enable fast iteration and prototyping.
We are excited to share the release of `v0.2.0`, packed with support for new hardware, features, and tutorials.
# 📖 Tutorials
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our [Basic Tutorial](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/TUTORIAL.md).
Next, detailed guides for different workflows are linked below:
## Training
1. [Part 1: LLM Pretraining](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#llmpretraining)
   1. [Installation](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#installation)
   2. [Dataset Preparation](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#datasetpreparation)
   3. [How to start single and multi-node pretraining](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#howtostartpretraining)
2. [Part 2: LLM Finetuning](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#llmfinetuning)
   1. [Using a dataset on the HuggingFace Hub](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#hfdataset)
   2. [Using a local dataset](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#2-using-a-local-dataset-)
   3. [Using a StreamingDataset (MDS) formatted dataset locally or in an object store](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#mdsdataset)
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see [Finetune Example](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train/finetune_example).
## Inference
The inference tutorials cover several new features we've added to improve integration with the HuggingFace and FasterTransformer libraries (a sketch of the conversion flow follows the list):
- [Converting a Composer checkpoint to an HF checkpoint folder](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-a-composer-checkpoint-to-an-hf-checkpoint-folder)
- [Interactive Generation with HF models](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#interactive-generation-with-hf-models)
- [Interactive Chat with HF models](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#interactive-chat-with-hf-models)
- [Converting an HF model to ONNX](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-an-hf-model-to-onnx)
- [Converting an HF MPT to FasterTransformer](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-an-hf-mpt-to-fastertransformer)
- [Running MPT with FasterTransformer](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#running-mpt-with-fastertransformer)
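As a rough end-to-end sketch of the Composer-to-HuggingFace path: the script names below match the linked README, but the exact flag names and paths here are assumptions, so treat `scripts/inference/README.md` as the authoritative CLI reference.

```bash
# Sketch: convert a Composer checkpoint into an HF checkpoint folder, then generate.
cd llm-foundry/scripts

# Convert the Composer checkpoint (input path is illustrative).
python inference/convert_composer_to_hf.py \
  --composer_path checkpoints/ep0-ba2000-rank0.pt \
  --hf_output_path mpt-hf-out/

# Generate interactively from the converted folder.
python inference/hf_generate.py \
  --name_or_path mpt-hf-out/ \
  --prompts "Here is a short story about a robot:"
```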
# Major Features
LLM Foundry now requires Composer `v0.15.0` and Streaming `v0.5.1` as minimum versions. For all the improvements, see the release notes for [Composer](https://github.com/mosaicml/composer/releases) and [Streaming](https://github.com/mosaicml/streaming/releases).
⚠️ The new Streaming release includes a few API changes; see the [Streaming v0.5](https://github.com/mosaicml/streaming/releases/tag/v0.5.0) release notes for more details. Our API has also been updated to reflect these changes.
1. 🆕 **Torch 2.0 support**
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested `torch.compile`, but do not expect significant performance improvements.
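For reference, a minimal install sketch on a PyTorch 2.0 environment; the `[gpu]` extra follows the install path described in the repo README, so treat the exact extras as assumptions:

```bash
# Minimal sketch: install LLM Foundry from source in a PyTorch 2.0 environment.
git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry
pip install -e ".[gpu]"  # the [gpu] extra pulls in GPU-only deps such as flash-attn

# Sanity-check the Torch version.
python -c "import torch; print(torch.__version__)"  # expect a 2.0.x build
```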
2. ⚡ **H100 Support**
We now support NVIDIA H100 systems! See our blog post on [Benchmarking LLMs on H100 GPUs](https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1) for initial performance and convergence details.
To run LLM Foundry on NVIDIA H100 systems, be sure to use a Docker image with CUDA 11.8+ and PyTorch 2.0+.
For example, `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` from our Docker Hub has been tested with NVIDIA H100 systems.
No code changes should be required.
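As a quick sanity check, something like the following should confirm the image can see the H100s; this uses only the standard Docker CLI and the image tag above, with no LLM Foundry specifics assumed:

```bash
# Pull the tested image and confirm CUDA 11.8+ / PyTorch 2.0+ can see the GPUs.
docker pull mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04
docker run --rm --gpus all mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 \
  python -c 'import torch; print(torch.version.cuda, torch.cuda.get_device_name(0))'
```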
3. 📈 **AMD MI250 GPU Support**
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD datacenter GPUs! Read our blog post on [Training LLMs with AMD MI250 GPUs](https://www.mosaicml.com/blog/amd-mi250) for more details.
Running with our stack was straightforward: use the ROCm 5.4 Docker image `rocm/dev-ubuntu-20.04:5.4.3-complete`, install PyTorch for ROCm 5.4, and [install Flash Attention](https://github.com/ROCmSoftwarePlatform/flash-attention/tree/flash_attention_for_rocm2#amd-gpurocm-support).
Then modify your configuration settings (a sketch putting these steps together follows this list):
* `attn_impl=flash` instead of the default `triton`
* Note: ALiBi is currently not supported with `attn_impl=flash`.
* `loss_fn=torch_crossentropy` instead of the default `fused_crossentropy`.
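A sketch of the full AMD recipe; the PyTorch ROCm wheel index is standard, but the YAML path and the override key paths below are assumptions, so mirror whatever your own config actually uses:

```bash
# Sketch: LLM Foundry training on AMD MI250 with ROCm 5.4.
docker run -it --device /dev/kfd --device /dev/dri \
  rocm/dev-ubuntu-20.04:5.4.3-complete

# Inside the container: PyTorch for ROCm 5.4, then flash-attention per the linked fork.
pip install torch --index-url https://download.pytorch.org/whl/rocm5.4.2

cd llm-foundry/scripts
composer train/train.py train/yamls/pretrain/mpt-125m.yaml \
  model.attn_config.attn_impl=flash \
  model.loss_fn=torch_crossentropy
```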
4. 🚧 **LoRA finetuning** (*Preview*)
We have included a *preview* release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs ([Hu et al., 2021](https://arxiv.org/abs/2106.09685)).
To use LoRA, follow the instructions found [here](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/TUTORIAL.md#can-i-finetune-using-peft--lora).
Note: This is a preview feature, so please share any feedback! The API and support are subject to change.
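As a rough sketch of getting started: the `peft` extra reflects the "make peft installs a extra_dep" change in this release, and the exact `lora` YAML fields live in the tutorial linked above, so they are not reproduced here.

```bash
# Sketch: install the optional peft/LoRA dependencies, then finetune with a
# LoRA-enabled YAML (add a lora: block to the model config per TUTORIAL.md).
pip install -e ".[gpu,peft]"
cd llm-foundry/scripts
composer train/train.py train/yamls/finetune/mpt-7b_dolly_sft.yaml
```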
5. 🔎 **Evaluation Refactor** (#308)
Our evaluation suite has been significantly refactored into our [Model Gauntlet](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/scripts/eval/local_data/MODEL_GAUNTLET.md) approach. This includes a number of breaking API changes to support multiple models:
* Instead of `model`, use the `models` keyword and provide a list of models.
* `tokenizer` is now model-specific.
For example, to run the gauntlet of various eval tasks with `mosaicml/mpt-7b`:

```bash
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml \
  model_name_or_path=mosaicml/mpt-7b
```
This release also makes evaluation deterministic, even across different numbers of GPUs.
For more details on all these changes, see #308.
6. ⏱️ **Benchmarking Inference**
To better support the deployment of LLMs, we have included an inference [benchmarking](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference/benchmarking) suite, with results across different hardware setups and LLMs.
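A hypothetical invocation, assuming the `benchmark.py` entry point and YAML layout described in the linked benchmarking README:

```bash
# Hypothetical: run the inference benchmark suite on a small config.
cd llm-foundry/scripts/inference/benchmarking
python benchmark.py yamls/1b.yaml
```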
# PR List
* hf dict cfg overrides by vchiley in https://github.com/mosaicml/llm-foundry/pull/90
* Add slack and license buttons to readme by growlix in https://github.com/mosaicml/llm-foundry/pull/98
* Add minimum `mosaicml-streaming` version by hanlint in https://github.com/mosaicml/llm-foundry/pull/110
* Update dataloader.py by nelsontkq in https://github.com/mosaicml/llm-foundry/pull/102
* Add features to hf_generate by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/116
* Make mpt7b finetuning more obvious by samhavens in https://github.com/mosaicml/llm-foundry/pull/101
* Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by alanxmay in https://github.com/mosaicml/llm-foundry/pull/131
* Fix HF conversion script to upload to S3 after editing the files to be HF compatible by dakinggg in https://github.com/mosaicml/llm-foundry/pull/136
* Set pad_token_id to tokenizer.pad_token_id if not set on command line by patrickhwood in https://github.com/mosaicml/llm-foundry/pull/118
* Changed the keep_zip default to False to comply with StreamingDataset by karan6181 in https://github.com/mosaicml/llm-foundry/pull/150
* Add cloud upload to checkpoint conversion script by dakinggg in https://github.com/mosaicml/llm-foundry/pull/151
* Adds precision to eval by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/148
* Update StreamingDataset defaults by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/157
* Explain `composer` command by hanlint in https://github.com/mosaicml/llm-foundry/pull/164
* Remove `pynvml` by hanlint in https://github.com/mosaicml/llm-foundry/pull/165
* Adds a concrete finetuning example from a custom dataset by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/156
* Remove health checker by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/167
* Rename datasets to avoid hf conflict by hanlint in https://github.com/mosaicml/llm-foundry/pull/175
* Torch2 (177) by vchiley in https://github.com/mosaicml/llm-foundry/pull/178
* Revert "Torch2 (177) (178)" by dakinggg in https://github.com/mosaicml/llm-foundry/pull/181
* clean up dataset conversion readme by codestar12 in https://github.com/mosaicml/llm-foundry/pull/168
* Convert MPT checkpoints to FT format by dskhudia in https://github.com/mosaicml/llm-foundry/pull/169
* Update README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/198
* Removed unused `tokenizer_name` config field by dakinggg in https://github.com/mosaicml/llm-foundry/pull/206
* Add community links to README by hanlint in https://github.com/mosaicml/llm-foundry/pull/182
* Add Tensorboard logger to yaml config by hanlint in https://github.com/mosaicml/llm-foundry/pull/166
* Update inference README by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/204
* torch2 updt with hf fixes by vchiley in https://github.com/mosaicml/llm-foundry/pull/193
* Removing deprecated vocabulary size parameter from composer CE metrics by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/222
* Add `composer[libcloud]` dependency by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/218
* Use $RUN_NAME rather than $COMPOSER_RUN_NAME by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/209
* Fixing benchmark mcli example with proper path and image by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/219
* Update README.md - Slack Link by ejyuen in https://github.com/mosaicml/llm-foundry/pull/207
* Kv cache speed by vchiley in https://github.com/mosaicml/llm-foundry/pull/210
* Fix a race condition in ICL eval by dakinggg in https://github.com/mosaicml/llm-foundry/pull/235
* Add basic issue templates by dakinggg in https://github.com/mosaicml/llm-foundry/pull/252
* Add a script to run mpt with FasterTransformer by dskhudia in https://github.com/mosaicml/llm-foundry/pull/229
* Change mcli eval YAMLs to use `mixed_precision: FULL` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/255
* Bump Xentropy Version by nik-mosaic in https://github.com/mosaicml/llm-foundry/pull/261
* updt tritonpremlir to sm90 version by vchiley in https://github.com/mosaicml/llm-foundry/pull/260
* Add `mosaicml/llm-foundry` Docker workflow by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/254
* Patch README for better visibility by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/267
* Add support for `device_map` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/225
* Fix model init when using 1 GPU by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/269
* Update README.md by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/268
* Update mcp_pytest.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/274
* Fastertransformer: replace config['mpt'] with config['gpt'] by dwyatte in https://github.com/mosaicml/llm-foundry/pull/272
* Add `device_map` support for `hf_generate.py` and `hf_chat.py` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/276
* Add shift_labels arg to HF wrappers by dakinggg in https://github.com/mosaicml/llm-foundry/pull/288
* Update README.md by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/294
* Small formatting fix in eval README by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/285
* Default to debug level debug by samhavens in https://github.com/mosaicml/llm-foundry/pull/299
* Sam/chat v2 by samhavens in https://github.com/mosaicml/llm-foundry/pull/296
* Add `save_weights_only` as an option by dakinggg in https://github.com/mosaicml/llm-foundry/pull/301
* Adding Custom Embedding, Enabling us to initialize on Heterogeneous Devices by bcui19 in https://github.com/mosaicml/llm-foundry/pull/298
* Fix convert_dataset_hf.py hanging with excessive num_workers by casperbh96 in https://github.com/mosaicml/llm-foundry/pull/270
* Update README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/300
* Fix autocast dtype by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/302
* Set eval shuffle to False by eldarkurtic in https://github.com/mosaicml/llm-foundry/pull/297
* Huggingface Mixed Initialization by bcui19 in https://github.com/mosaicml/llm-foundry/pull/303
* Added new community tutorial on MPT-7B-Instruct Fine Tuning by VRSEN in https://github.com/mosaicml/llm-foundry/pull/311
* Fix generate callback to work with precision context by dakinggg in https://github.com/mosaicml/llm-foundry/pull/322
* Allow MPT past the tied word embeddings error by dakinggg in https://github.com/mosaicml/llm-foundry/pull/323
* Refresh Mosaicml platform yamls by aspfohl in https://github.com/mosaicml/llm-foundry/pull/208
* hard set bias(alibi) precision by vchiley in https://github.com/mosaicml/llm-foundry/pull/329
* Create tasks_light.yaml by jfrankle in https://github.com/mosaicml/llm-foundry/pull/335
* Attn amp by vchiley in https://github.com/mosaicml/llm-foundry/pull/337
* Load on rank 0 only flag by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/334
* Add mixed device by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/342
* Better error messages for ckpt conversion script by dskhudia in https://github.com/mosaicml/llm-foundry/pull/320
* Add script to update hub code from foundry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/338
* Upgrade to `mosaicml-streaming==0.5.x` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/292
* updt composer to 0.15.0 by vchiley in https://github.com/mosaicml/llm-foundry/pull/347
* updt yml by vchiley in https://github.com/mosaicml/llm-foundry/pull/349
* Fix bug with saving optimizer states with MonolithicCheckpointSaver Callback by eracah in https://github.com/mosaicml/llm-foundry/pull/310
* Add step to free up some disk space on the worker by bandish-shah in https://github.com/mosaicml/llm-foundry/pull/350
* Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later by dakinggg in https://github.com/mosaicml/llm-foundry/pull/348
* Revert "Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later" by codestar12 in https://github.com/mosaicml/llm-foundry/pull/354
* Remote JSONL IFT data by samhavens in https://github.com/mosaicml/llm-foundry/pull/275
* Add MPT-30B to README by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/356
* Codeql on PRs by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/352
* Add secrets check as part of pre-commit by karan6181 in https://github.com/mosaicml/llm-foundry/pull/360
* Onboarding tutorial and related improvements by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/205
* fixed rmsnorm bug. Changed division to multiply since using torch.rsqrt by vancoykendall in https://github.com/mosaicml/llm-foundry/pull/372
* Adds max seq len filter before finetuning ds by vchiley in https://github.com/mosaicml/llm-foundry/pull/359
* Feature/peft compatible models by danbider in https://github.com/mosaicml/llm-foundry/pull/346
* Fix Typing (part 1) by hanlint in https://github.com/mosaicml/llm-foundry/pull/240
* improve hf_chat UI and readme by samhavens in https://github.com/mosaicml/llm-foundry/pull/351
* Update onnx by vchiley in https://github.com/mosaicml/llm-foundry/pull/385
* Model gauntlet by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/308
* Add 30b IFT example yaml by samhavens in https://github.com/mosaicml/llm-foundry/pull/388
* Add benchmarks to inference README by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/393
* updt install instructions by vchiley in https://github.com/mosaicml/llm-foundry/pull/396
* update quickstart eval task by vchiley in https://github.com/mosaicml/llm-foundry/pull/395
* Correct small typo in README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/391
* make peft installs a extra_dep by vchiley in https://github.com/mosaicml/llm-foundry/pull/397
* add fn to clear tests after every test by vchiley in https://github.com/mosaicml/llm-foundry/pull/400
* propagate cache_limit in streaming ds by vchiley in https://github.com/mosaicml/llm-foundry/pull/402
* Fixing hf_generate bug to account for pre-tokenization by ksreenivasan in https://github.com/mosaicml/llm-foundry/pull/387
* Eval Quickstart by samhavens in https://github.com/mosaicml/llm-foundry/pull/398
* Clean up train README by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/392
* Fix/bugbash002 by danbider in https://github.com/mosaicml/llm-foundry/pull/405
* add install for AMD beta support by vchiley in https://github.com/mosaicml/llm-foundry/pull/407
* updt dtype of causal mask by vchiley in https://github.com/mosaicml/llm-foundry/pull/408
* YAMLS for MPT runs inherit global max_seq_len in model config by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/409
* Update mcli-hf-eval.yaml by samhavens in https://github.com/mosaicml/llm-foundry/pull/411
* Edit tutorial comments on PEFT / LoRA by vchiley in https://github.com/mosaicml/llm-foundry/pull/416
* rm peft from pypi package by vchiley in https://github.com/mosaicml/llm-foundry/pull/420
* Update tasks_light.yaml by jfrankle in https://github.com/mosaicml/llm-foundry/pull/422
# New Contributors
* nelsontkq made their first contribution in https://github.com/mosaicml/llm-foundry/pull/102
* samhavens made their first contribution in https://github.com/mosaicml/llm-foundry/pull/101
* alanxmay made their first contribution in https://github.com/mosaicml/llm-foundry/pull/131
* patrickhwood made their first contribution in https://github.com/mosaicml/llm-foundry/pull/118
* karan6181 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/150
* dskhudia made their first contribution in https://github.com/mosaicml/llm-foundry/pull/169
* nik-mosaic made their first contribution in https://github.com/mosaicml/llm-foundry/pull/261
* dwyatte made their first contribution in https://github.com/mosaicml/llm-foundry/pull/272
* casperbh96 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/270
* eldarkurtic made their first contribution in https://github.com/mosaicml/llm-foundry/pull/297
* VRSEN made their first contribution in https://github.com/mosaicml/llm-foundry/pull/311
* aspfohl made their first contribution in https://github.com/mosaicml/llm-foundry/pull/208
* jfrankle made their first contribution in https://github.com/mosaicml/llm-foundry/pull/335
* eracah made their first contribution in https://github.com/mosaicml/llm-foundry/pull/310
* bandish-shah made their first contribution in https://github.com/mosaicml/llm-foundry/pull/350
* vancoykendall made their first contribution in https://github.com/mosaicml/llm-foundry/pull/372
* danbider made their first contribution in https://github.com/mosaicml/llm-foundry/pull/346
* ksreenivasan made their first contribution in https://github.com/mosaicml/llm-foundry/pull/387
**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.1.1...v0.2.0