Llm-foundry

Latest version: v0.15.0


0.4.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.

In addition to the usual bug fixes and performance improvements, we've added lots of new features!

New Features

Automatic sequence packing ([#683](https://github.com/mosaicml/llm-foundry/pull/683))

You can now specify `packing_ratio: auto` under your finetuning dataset, to automatically profile and select a good packing ratio to efficiently pack your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
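To make the padding savings concrete, here is a minimal sketch of sequence packing in plain Python. It is an illustration only and not LLM Foundry's implementation: the function names (`padding_fraction`, `pack_first_fit`), the sample lengths, and the greedy first-fit strategy are all assumptions chosen for clarity.

```python
from typing import List


def padding_fraction(lengths: List[int], max_seq_len: int) -> float:
    """Fraction of token slots that are padding when each sequence gets its own row."""
    total_slots = len(lengths) * max_seq_len
    return 1 - sum(lengths) / total_slots


def pack_first_fit(lengths: List[int], max_seq_len: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing: put each sequence in the first row with room."""
    rows: List[List[int]] = []
    for length in sorted(lengths, reverse=True):
        for row in rows:
            if sum(row) + length <= max_seq_len:
                row.append(length)
                break
        else:
            # No existing row had room, so start a new one.
            rows.append([length])
    return rows


# Hypothetical finetuning sample lengths (in tokens) and context window.
lengths = [512, 100, 900, 300, 1500, 60, 700, 24]
max_seq_len = 2048

unpacked_waste = padding_fraction(lengths, max_seq_len)
rows = pack_first_fit(lengths, max_seq_len)
packed_waste = 1 - sum(lengths) / (len(rows) * max_seq_len)

print(f"rows without packing: {len(lengths)}, padding: {unpacked_waste:.0%}")
print(f"rows with packing:    {len(rows)}, padding: {packed_waste:.0%}")
```

In this toy example the eight sequences fit into three rows, so far fewer token slots are spent on padding. The `auto` setting profiles your actual dataset to choose a ratio, which this sketch does not attempt to reproduce.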

Flash Attention 2 ([#651](https://github.com/mosaicml/llm-foundry/pull/651), [#666](https://github.com/mosaicml/llm-foundry/pull/666), [#672](https://github.com/mosaicml/llm-foundry/pull/672))

We now support using [Flash Attention 2](https://arxiv.org/abs/2307.08691) both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the [training instructions](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#using-flash-attention-) to learn how to use the different versions of Flash Attention.

New PyTorch, Composer, Streaming, and Transformers versions ([#648](https://github.com/mosaicml/llm-foundry/pull/648), [#672](https://github.com/mosaicml/llm-foundry/pull/672), [#736](https://github.com/mosaicml/llm-foundry/pull/736))

As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (codellama and mistral in particular).

Easy Databricks model deployment ([#618](https://github.com/mosaicml/llm-foundry/pull/618))

We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an `MLFlowLogger` and a `HuggingFaceCheckpointer` for your run.

The `MLFlowLogger` should have a Unity Catalog model registry prefix in the form of `catalog.schema`. This specifies where to register your models to. For example,

    loggers:
      mlflow:
        experiment_name: /Users/first.last@email.com/my_experiment_name
        tracking_uri: databricks
        model_registry_prefix: catalog.schema
        model_registry_uri: databricks-uc


The `HuggingFaceCheckpointer` should specify the name you want to register the model under. For example,

    callbacks:
      hf_checkpointer:
        save_interval: 1ep  # Save Hugging Face formatted checkpoints each epoch
        save_folder: s3://bucket/path/to/my/checkpoints
        mlflow_registered_model_name: my_model_name  # Final model will be registered to catalog.schema.my_model_name


MPT model configurations

We've added a few new options when training with the MPT architecture in LLM Foundry.

- Rotary embeddings ([#675](https://github.com/mosaicml/llm-foundry/pull/675))
- (Un)Tied word embeddings ([#728](https://github.com/mosaicml/llm-foundry/pull/728))
- Fine-grained activation checkpointing ([#720](https://github.com/mosaicml/llm-foundry/pull/720))

Evaluation Improvements

We've released v0.1 of our Eval Gauntlet ([#674](https://github.com/mosaicml/llm-foundry/pull/674), [#748](https://github.com/mosaicml/llm-foundry/pull/748))! This adds many new benchmarks, chain-of-thought, and a new safety category. Check out the [README](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) for full details!

In addition, we've made a few improvements to our evaluation options, with more to come!
- Allow specifying multiple evaluation datasets to compute cross entropy and perplexity on during training ([#603](https://github.com/mosaicml/llm-foundry/pull/603))
- Easier versions of the HumanEval dataset, which can be useful for comparing smaller models ([#645](https://github.com/mosaicml/llm-foundry/pull/645))
- More options for averaging the results of the Eval Gauntlet ([#640](https://github.com/mosaicml/llm-foundry/pull/640))

New pretraining benchmarks ([#543](https://github.com/mosaicml/llm-foundry/pull/543))

Added H100 profiling results to our [benchmarking table](https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/benchmarking/README.md).

Quality of life improvements

- Improved [Generate](https://github.com/mosaicml/composer/blob/dev/composer/callbacks/generate.py) callback with more logging options. Use the `Generate` callback to log generations from your model over the course of training. ([#631](https://github.com/mosaicml/llm-foundry/pull/631))
- Count number of tokens during training _excluding_ padding tokens. Previously this count _included_ padding tokens. ([#676](https://github.com/mosaicml/llm-foundry/pull/676))
- Use the PyTorch profiler to profile your training runs. ([#678](https://github.com/mosaicml/llm-foundry/pull/678))
- A [convenience script](https://github.com/mosaicml/llm-foundry/blob/main/scripts/misc/download_hf_model.py) for using the much faster Hugging Face `snapshot_download` to download models from the Hugging Face Hub. ([#708](https://github.com/mosaicml/llm-foundry/pull/708))
- New [AWS specific Docker images](https://github.com/mosaicml/llm-foundry#mosaicml-docker-images) with LLM Foundry dependencies pre-installed. ([#731](https://github.com/mosaicml/llm-foundry/pull/731))
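The token-counting change in the list above can be illustrated with a small sketch. It assumes a batch carries an attention mask with `1` for real tokens and `0` for padding; that batch layout and the helper name `count_tokens` are assumptions for illustration, not LLM Foundry's actual code.

```python
from typing import List


def count_tokens(attention_mask: List[List[int]], include_padding: bool) -> int:
    """Count tokens in a batch, with or without padding positions."""
    if include_padding:
        # Every slot in the padded batch is counted.
        return sum(len(row) for row in attention_mask)
    # Only positions marked as real tokens (mask == 1) are counted.
    return sum(sum(row) for row in attention_mask)


# Hypothetical batch of two sequences padded to length 6.
batch_mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
]

old_count = count_tokens(batch_mask, include_padding=True)
new_count = count_tokens(batch_mask, include_padding=False)
print(old_count, new_count)
```

With heavily padded batches the two counts can differ a lot, which matters when comparing tokens-seen or throughput numbers across releases.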

Experimental features

Inverse square root learning rate scheduler ([#657](https://github.com/mosaicml/llm-foundry/pull/657))

We've added experimental support for the [inverse square root learning rate scheduler](https://github.com/mosaicml/llm-foundry/blob/1793c366fd49f96685481e086d7584f21afef450/llmfoundry/optim/scheduler.py#L40).
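For readers unfamiliar with this schedule, the sketch below shows the generic textbook form: linear warmup followed by decay proportional to the inverse square root of the step. The function name `inverse_sqrt_lr` and its parameters are hypothetical, and the linked LLM Foundry scheduler may use a different parameterization, so treat this as an illustration of the idea only.

```python
import math


def inverse_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # At step == warmup_steps the factor is 1, so the curve is continuous.
    return base_lr * math.sqrt(warmup_steps / step)


steps = (0, 50, 100, 400, 1600)
schedule = [inverse_sqrt_lr(s, base_lr=1e-3, warmup_steps=100) for s in steps]
for s, lr in zip(steps, schedule):
    print(f"step {s:>5}: lr = {lr:.2e}")
```

Because the decay depends only on the current step and not on a total training length, this kind of schedule is often used when the final training duration is not fixed in advance.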

Breaking changes

Updated Streaming defaults ([#723](https://github.com/mosaicml/llm-foundry/pull/723))

We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the [Streaming release notes](https://github.com/mosaicml/streaming/releases/tag/v0.7.0) for more details.

Removed support for PrefixLM for Bloom and OPT models ([#704](https://github.com/mosaicml/llm-foundry/pull/704))

We occasionally remove unused experimental parts of the code base to focus on new features and better support for existing features, and we've removed support for PrefixLM applied to Bloom and OPT models in this release.



What's Changed
* Multi eval dataset logging by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/603
* Merge release 0.3.0 back to main by dakinggg in https://github.com/mosaicml/llm-foundry/pull/635
* Add tmp path retention policy by j316chuck in https://github.com/mosaicml/llm-foundry/pull/641
* Add flag to disable train metrics by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/642
* Update pins to latest version that were missed by dakinggg in https://github.com/mosaicml/llm-foundry/pull/646
* Fix overriding of rope_scaling config by dakinggg in https://github.com/mosaicml/llm-foundry/pull/644
* Add 2.1 images to docker workflow and tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/648
* Fixes to lion8b test for torch 2.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/649
* Only log "changing autoresume" when actually changing by aspfohl in https://github.com/mosaicml/llm-foundry/pull/653
* Fix lion8b error correction with torch 2.1 by dblalock in https://github.com/mosaicml/llm-foundry/pull/656
* Clean up processes between distributed gpu tests by j316chuck in https://github.com/mosaicml/llm-foundry/pull/660
* Revert "Clean up processes between distributed gpu tests (660)" by j316chuck in https://github.com/mosaicml/llm-foundry/pull/662
* Switch ordering of foundry gpu tests by j316chuck in https://github.com/mosaicml/llm-foundry/pull/665
* Change batch size on coding tasks to 1 to avoid OOM by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/654
* Add images with flash attention 2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/651
* Fix yaml change by dakinggg in https://github.com/mosaicml/llm-foundry/pull/667
* Revert actions change by dakinggg in https://github.com/mosaicml/llm-foundry/pull/668
* Inverse Square Root LR Schedule by mansheej in https://github.com/mosaicml/llm-foundry/pull/657
* Add test suite for flash attention 2 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/666
* Adding Simplified Coding Tasks by mcarbin in https://github.com/mosaicml/llm-foundry/pull/645
* Fix typo in image name by dakinggg in https://github.com/mosaicml/llm-foundry/pull/669
* Point to composer.callback.Generate by aspfohl in https://github.com/mosaicml/llm-foundry/pull/631
* Do not update past_key_values in place by irenedea in https://github.com/mosaicml/llm-foundry/pull/652
* Fix small typos in the eval readme by maxisawesome in https://github.com/mosaicml/llm-foundry/pull/671
* Convert to DataSpec and add token counts that include padding by dakinggg in https://github.com/mosaicml/llm-foundry/pull/676
* Add support for automatically registering models to UC at the end of training by dakinggg in https://github.com/mosaicml/llm-foundry/pull/618
* add `load_strict_model_weights` as an optional config parameter by AllenHW in https://github.com/mosaicml/llm-foundry/pull/655
* Small changes to HF repo update script by dakinggg in https://github.com/mosaicml/llm-foundry/pull/680
* Add profiler support in llm foundry by j316chuck in https://github.com/mosaicml/llm-foundry/pull/678
* Update_pretrain_benchmarks by crinard in https://github.com/mosaicml/llm-foundry/pull/543
* add |---| to render tables correctly by crinard in https://github.com/mosaicml/llm-foundry/pull/686
* Adding Mosaic logger + logging data validated event by jjanezhang in https://github.com/mosaicml/llm-foundry/pull/670
* Tiktoken wrapper add_eos_token option by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/681
* Attempt to fix flaky test by dakinggg in https://github.com/mosaicml/llm-foundry/pull/688
* Allow flash attention 2 and upgrade to transformers 4.34.1 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/672
* Fix mlflow model logging bug by dakinggg in https://github.com/mosaicml/llm-foundry/pull/692
* Add fixtures by irenedea in https://github.com/mosaicml/llm-foundry/pull/673
* Make default for cuda_load_lazy false by irenedea in https://github.com/mosaicml/llm-foundry/pull/694
* Update README.md by j316chuck in https://github.com/mosaicml/llm-foundry/pull/693
* Pad tiktoken vocab so that additional_special_tokens works by dakinggg in https://github.com/mosaicml/llm-foundry/pull/695
* Remove live logs to be consistent with Composer by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/698
* Change gauntlet avging by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/640
* Remove prefixlm support for OPT and Bloom by dakinggg in https://github.com/mosaicml/llm-foundry/pull/704
* Fix attention patch compatibility for llama2 by irenedea in https://github.com/mosaicml/llm-foundry/pull/705
* Add test coverage for lion and lion8b checkpoint interop by dblalock in https://github.com/mosaicml/llm-foundry/pull/679
* Improvement in README.md and TUTORIAL.md by tmsagarofficial in https://github.com/mosaicml/llm-foundry/pull/699
* Make TiktokenTokenizerWrapper picklable by irenedea in https://github.com/mosaicml/llm-foundry/pull/700
* Add num_proc to map and filter calls by dakinggg in https://github.com/mosaicml/llm-foundry/pull/706
* Fix HF local module copy contention with a meta init on local rank 0 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/710
* Add support for auto packing ratio by irenedea in https://github.com/mosaicml/llm-foundry/pull/683
* Remove HumanEval tasks from ICL eval by tbarton16 in https://github.com/mosaicml/llm-foundry/pull/715
* Allow logging metadata by dakinggg in https://github.com/mosaicml/llm-foundry/pull/714
* Run HF dataset processing on local rank 0 first by dakinggg in https://github.com/mosaicml/llm-foundry/pull/716
* Add Hugging Face model download script by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/708
* Adding support for Rotary Position Embeddings by ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/675
* Add databricks dependency by irenedea in https://github.com/mosaicml/llm-foundry/pull/717
* Set persistent_workers = False for packing profiling by dakinggg in https://github.com/mosaicml/llm-foundry/pull/718
* raise timeout for GPU tests by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/719
* change default overwrite to True by dakinggg in https://github.com/mosaicml/llm-foundry/pull/724
* Attempt to fix a very occasional hang in datasets map/filter by dakinggg in https://github.com/mosaicml/llm-foundry/pull/725
* Add Unity Catalog support to HF checkpointer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/721
* Combine filters into one, to avoid datasets error by dakinggg in https://github.com/mosaicml/llm-foundry/pull/729
* Fix logging verbosity in HF model download script and repair symlinks by jerrychen109 in https://github.com/mosaicml/llm-foundry/pull/727
* Gate the dist calls in build_tokenizer by dakinggg in https://github.com/mosaicml/llm-foundry/pull/732
* Create AWS docker image for fine tuning by j316chuck in https://github.com/mosaicml/llm-foundry/pull/731
* Make TiktokenTokenizerWrapper compatible with convert_composer_to_hf.py by irenedea in https://github.com/mosaicml/llm-foundry/pull/730
* Enable `tie_word_embeddings` config setting to enable / disable weight tied embeddings by vchiley in https://github.com/mosaicml/llm-foundry/pull/728
* add act checkpoint at sub layer level by cli99 in https://github.com/mosaicml/llm-foundry/pull/720
* Better defaults for StreamingDataset subclasses by snarayan21 in https://github.com/mosaicml/llm-foundry/pull/723
* Rename log message by b-chu in https://github.com/mosaicml/llm-foundry/pull/734
* Remove tokenizer_name field by dakinggg in https://github.com/mosaicml/llm-foundry/pull/735
* Fix pairwise attention comparison in test by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/737
* Fix passed metadata to mlflow logging by wenfeiy-db in https://github.com/mosaicml/llm-foundry/pull/713
* HF script explicitly casts precision by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/741
* Bump to composer 0.17 by dakinggg in https://github.com/mosaicml/llm-foundry/pull/736
* Patch os cpu count to avoid extra multiprocessing inside pytest which sometimes hangs by dakinggg in https://github.com/mosaicml/llm-foundry/pull/745
* Reenable tests that were accidentally disabled by dakinggg in https://github.com/mosaicml/llm-foundry/pull/746
* Gauntlet v0.1 by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/674
* Remove extra test suite by dakinggg in https://github.com/mosaicml/llm-foundry/pull/743
* Fix typo in workflow file by dakinggg in https://github.com/mosaicml/llm-foundry/pull/750
* Fix 1.13 tests by dakinggg in https://github.com/mosaicml/llm-foundry/pull/751
* Pin Chat format to TiktokenTokenizerWrapper by rajammanabrolu in https://github.com/mosaicml/llm-foundry/pull/752
* Catch exception raised in hf prep properly by j316chuck in https://github.com/mosaicml/llm-foundry/pull/749
* Gauntlet v0.1.0 yaml fixes by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/748
* Fix flash attention GQA bug to use the dynamic size of the key/value tensors - used for eval/inference by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/756

New Contributors
* mansheej made their first contribution in https://github.com/mosaicml/llm-foundry/pull/657
* mcarbin made their first contribution in https://github.com/mosaicml/llm-foundry/pull/645
* maxisawesome made their first contribution in https://github.com/mosaicml/llm-foundry/pull/671
* AllenHW made their first contribution in https://github.com/mosaicml/llm-foundry/pull/655
* crinard made their first contribution in https://github.com/mosaicml/llm-foundry/pull/543
* jjanezhang made their first contribution in https://github.com/mosaicml/llm-foundry/pull/670
* rajammanabrolu made their first contribution in https://github.com/mosaicml/llm-foundry/pull/681
* tmsagarofficial made their first contribution in https://github.com/mosaicml/llm-foundry/pull/699
* tbarton16 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/715
* ShashankMosaicML made their first contribution in https://github.com/mosaicml/llm-foundry/pull/675
* cli99 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/720
* b-chu made their first contribution in https://github.com/mosaicml/llm-foundry/pull/734
* wenfeiy-db made their first contribution in https://github.com/mosaicml/llm-foundry/pull/713

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.3.0...v0.4.0

0.3.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!

Features

Llama-2 ([#485](https://github.com/mosaicml/llm-foundry/pull/485), [#520](https://github.com/mosaicml/llm-foundry/pull/520), [#533](https://github.com/mosaicml/llm-foundry/pull/533))

Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the `attention_patch_type` in your yaml like so:


    model:
      ...
      attention_patch_type: triton
      ...


See the [example yaml](https://github.com/mosaicml/llm-foundry/blob/main/mcli/mcli-llama2-finetune.yaml) for a full example of how to finetune Llama-2 on the MosaicML platform.

8-bit Lion ([#514](https://github.com/mosaicml/llm-foundry/pull/514))

We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from `decoupled_lionw` to `decoupled_lionw_8b`!
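One plausible way to arrive at the 12 and 9 figures, counted in bytes per parameter, is sketched below. The breakdown (fp32 weights, fp32 gradients, and a single Lion momentum buffer stored in fp32 or int8) is an assumption for illustration and ignores any small overhead for quantization scaling factors; the 7B parameter count is hypothetical.

```python
# Assumed storage sizes in bytes (illustrative, not taken from the release notes).
FP32_BYTES = 4
INT8_BYTES = 1


def bytes_per_param(momentum_bytes: int) -> int:
    """fp32 weight + fp32 gradient + Lion's single momentum value."""
    return FP32_BYTES + FP32_BYTES + momentum_bytes


lion = bytes_per_param(FP32_BYTES)     # momentum kept in fp32
lion_8b = bytes_per_param(INT8_BYTES)  # momentum kept in int8

n_params = 7_000_000_000  # hypothetical 7B-parameter model
saved_gb = (lion - lion_8b) * n_params / 1e9

print(f"Lion: {lion} bytes/param, 8-bit Lion: {lion_8b} bytes/param")
print(f"Approximate savings for {n_params:,} params: {saved_gb:.0f} GB")
```

Under these assumptions the optimizer state shrinks from 4 bytes to 1 byte per parameter, which is where the 25% reduction in per-parameter memory comes from.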

Checkpoint conversion ([#526](https://github.com/mosaicml/llm-foundry/pull/526), [#519](https://github.com/mosaicml/llm-foundry/pull/519), [#594](https://github.com/mosaicml/llm-foundry/pull/594))

We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and support for Faster Transformer conversion from a Composer MPT checkpoint.

To enable the new callback, add the `hf_checkpointer` callback to your yaml like so:


    callbacks:
      ...
      hf_checkpointer:
        # Save a Hugging Face formatted checkpoint at the end of each epoch
        save_interval: 1ep
        # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
        # so this folder will likely be the same as your overall save_folder
        save_folder: ./{run_name}/checkpoints
        # Set the precision you want the checkpoint saved in
        precision: bfloat16


Code evaluation ([#587](https://github.com/mosaicml/llm-foundry/pull/587))

We have added support for running HumanEval (code evaluation) using LLM Foundry! See the [evaluation readme](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval#incontextlearningcodeevaldataset) for a more detailed description and the [tasks yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/coding_tasks.yaml) for an ICL yaml that can be used to run the HumanEval evaluation task.

Transformer Engine support ([#432](https://github.com/mosaicml/llm-foundry/pull/432))

Adds support for using NVIDIA's [Transformer Engine](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html) to enable FP8 training. To enable, set `fc_type='te'` and/or `ffn_config['ffn_type']='te_ln_mlp'` and `precision='amp_fp8'`.

MLFlow ([#475](https://github.com/mosaicml/llm-foundry/pull/475))

Adds support for using MLFlow as an experiment tracker. To enable, simply add `mlflow` to the `loggers` section of your yaml. See the [Composer docs](https://docs.mosaicml.com/projects/composer/en/stable/api_reference/generated/composer.loggers.MLFlowLogger.html) for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.

Updated streaming version/defaults ([#503](https://github.com/mosaicml/llm-foundry/pull/503), [#573](https://github.com/mosaicml/llm-foundry/pull/573), [#580](https://github.com/mosaicml/llm-foundry/pull/580), [#602](https://github.com/mosaicml/llm-foundry/pull/602))

Updates to the latest release of MosaicML [Streaming](https://github.com/mosaicml/streaming/releases/tag/v0.6.0) and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!

Grouped Query Attention ([#492](https://github.com/mosaicml/llm-foundry/pull/492))

Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set `attn_config['attn_type']='grouped_query_attention'` and `attn_config['kv_n_heads']` to the desired number of kv heads.
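The relationship between the three attention variants can be sketched as a mapping from query heads to key/value heads. The helper name `kv_head_for_query_head` and the contiguous grouping are assumptions for illustration (a common convention, not necessarily how LLM Foundry lays out heads internally).

```python
from typing import List


def kv_head_for_query_head(n_heads: int, kv_n_heads: int) -> List[int]:
    """Map each query head index to the key/value head it shares."""
    if n_heads % kv_n_heads != 0:
        raise ValueError("n_heads must be divisible by kv_n_heads")
    group_size = n_heads // kv_n_heads
    return [q // group_size for q in range(n_heads)]


mha = kv_head_for_query_head(n_heads=8, kv_n_heads=8)  # one KV head per query head
gqa = kv_head_for_query_head(n_heads=8, kv_n_heads=2)  # four query heads share each KV head
mqa = kv_head_for_query_head(n_heads=8, kv_n_heads=1)  # all query heads share one KV head

print("MHA:", mha)
print("GQA:", gqa)
print("MQA:", mqa)
```

Fewer KV heads means a smaller key/value cache at inference time, which is the speed side of the trade-off; keeping more than one KV head preserves some of the capacity of full multi-head attention.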

MPT quality of life improvements ([#559](https://github.com/mosaicml/llm-foundry/pull/559), [#599](https://github.com/mosaicml/llm-foundry/pull/599))

Thanks to tdoublep and lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!

Eval gauntlet during training, inference API eval wrapper ([#501](https://github.com/mosaicml/llm-foundry/pull/501), [#494](https://github.com/mosaicml/llm-foundry/pull/494))

Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown [here](https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47).

tiktoken support ([#610](https://github.com/mosaicml/llm-foundry/pull/610))

We have enabled training with tiktoken tokenizers with a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:


    tokenizer:
      name: tiktoken
      kwargs:
        model_name: gpt-4


LoRA eval ([#515](https://github.com/mosaicml/llm-foundry/pull/515))

Allows the use of our evaluation script with a model trained using LoRA. See [this yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/hf_lora_eval.yml) for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!

Finetuning API

Lastly, we are building a [finetuning API](https://docs.mosaicml.com/projects/mcli/en/latest/training/finetuning.html#finetuning-private-preview) on top of LLM Foundry, Composer, and Streaming. Please [reach out](https://www.mosaicml.com/get-started) if you might be interested in using this API as a customer!


0.2.0

LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs). It serves as the training codebase for the [MPT-7B](https://www.mosaicml.com/blog/mpt-7b) and [MPT-30B](https://www.mosaicml.com/blog/mpt-30b) models. Our emphasis is on efficiency, scalability, and ease of use, to enable fast iteration and prototyping.

We are excited to share the release of `v0.2.0`, packed with support for new hardware, features, and tutorials.

📖 Tutorials

We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!

To start off, a basic walkthrough and answers to FAQs can be found in our [Basic Tutorial](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/TUTORIAL.md).

Next, detailed guides for different workflows are linked below:

Training


1. [Part 1: LLM Pretraining](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#llmpretraining)
    1. [Installation](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#installation)
    2. [Dataset Preparation](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#datasetpreparation)
    3. [How to start single and multi-node pretraining](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#howtostartpretraining)
2. [Part 2: LLM Finetuning](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#llmfinetuning)
    1. [Using a dataset on the HuggingFace Hub](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#hfdataset)
    2. [Using a local dataset](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#2-using-a-local-dataset-)
    3. [Using a StreamingDataset (MDS) formatted dataset locally or in an object store](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train#mdsdataset)

In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see [Finetune Example](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/train/finetune_example).

Inference

The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:

- [Converting a Composer checkpoint to an HF checkpoint folder](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-a-composer-checkpoint-to-an-hf-checkpoint-folder)
- [Interactive Generation with HF models](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#interactive-generation-with-hf-models)
- [Interactive Chat with HF models](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#interactive-chat-with-hf-models)
- [Converting an HF model to ONNX](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-an-hf-model-to-onnx)
- [Converting an HF MPT to FasterTransformer](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#converting-an-hf-mpt-to-fastertransformer)
- [Running MPT with FasterTransformer](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference#running-mpt-with-fastertransformer)


Major Features

LLM Foundry now uses Composer `v0.15.0` and Streaming `v0.5.1` as minimum requirements. For more details, see their release notes for [Composer](https://github.com/mosaicml/composer/releases) and [Streaming](https://github.com/mosaicml/streaming/releases) for all the improvements.

⚠️ The new Streaming release includes a few API changes; see the [Streaming v0.5](https://github.com/mosaicml/streaming/releases/tag/v0.5.0) release notes for more details. Our APIs have also been changed to reflect these modifications.


1. 🆕 **Torch 2.0 support**

LLM Foundry is now Torch 2.0 compatible!

Note: we have not tested `torch.compile`, but do not expect significant performance improvements.

2. ⚡ **H100 Support**

We now support NVIDIA H100 systems! See our blog post on [Benchmarking LLMs on H100 GPUs](https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1) for initial performance and convergence details.

To run LLM Foundry on NVIDIA H100 systems, be sure to use a Docker image that has CUDA 11.8+ and PyTorch 2.0+.

For example, `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` from our dockerhub has been tested with NVIDIA H100 systems.

No code changes should be required.

3. 📈 **AMD MI250 GPU Support**

With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on [Training LLMs with AMD MI250 GPUs](https://www.mosaicml.com/blog/amd-mi250) for more details.

Running with our stack was straightforward: use the ROCm 5.4 docker image `rocm/dev-ubuntu-20.04:5.4.3-complete`, install PyTorch for ROCm 5.4, and [install Flash Attention](https://github.com/ROCmSoftwarePlatform/flash-attention/tree/flash_attention_for_rocm2#amd-gpurocm-support).

Modify your configuration settings:
* `attn_impl=flash` instead of the default `triton`
    * Note: ALiBi is currently not supported with `attn_impl=flash`.
* `loss_fn=torch_crossentropy` instead of the default `fused_crossentropy`.


4. 🚧 **LoRA finetuning** (*Preview*)

We have included a *preview* release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs ([Hu et al., 2021](https://arxiv.org/abs/2106.09685)).

To use LoRA, follow the instructions found [here](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/TUTORIAL.md#can-i-finetune-using-peft--lora).

Note: This is a preview feature, so please share any feedback! The API and support are subject to change.


5. 🔎 **Evaluation Refactor** (#308)

Our evaluation suite has been significantly refactored into our [Model Gauntlet](https://github.com/mosaicml/llm-foundry/blob/v0.2.0/scripts/eval/local_data/MODEL_GAUNTLET.md) approach. This includes a number of breaking API changes to support multiple models:
* Instead of `model`, use the `models` keyword and provide a list of models.
* `tokenizer` is now model-specific.

For example, to run the gauntlet of various eval tasks with `mosaicml/mpt-7b`:


    cd llm-foundry/scripts
    composer eval/eval.py eval/yamls/hf_eval.yaml \
        model_name_or_path=mosaicml/mpt-7b


This release also makes evaluation deterministic, even across different numbers of GPUs.

For more details on all these changes, see #308.

6. ⏱️ **Benchmarking Inference**

To better support the deployment of LLMs, we have included an inference [benchmarking](https://github.com/mosaicml/llm-foundry/tree/v0.2.0/scripts/inference/benchmarking) suite, along with results across different hardware and other LLMs.


PR List
* hf dict cfg overrides by vchiley in https://github.com/mosaicml/llm-foundry/pull/90
* Add slack and license buttons to readme by growlix in https://github.com/mosaicml/llm-foundry/pull/98
* Add minimum `mosaicml-streaming` version by hanlint in https://github.com/mosaicml/llm-foundry/pull/110
* Update dataloader.py by nelsontkq in https://github.com/mosaicml/llm-foundry/pull/102
* Add features to hf_generate by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/116
* Make mpt7b finetuning more obvious by samhavens in https://github.com/mosaicml/llm-foundry/pull/101
* Fix(finetune yaml): fix parameters in mpt-7b_dolly_sft.yaml by alanxmay in https://github.com/mosaicml/llm-foundry/pull/131
* Fix HF conversion script to upload to S3 after editing the files to be HF compatible by dakinggg in https://github.com/mosaicml/llm-foundry/pull/136
* Set pad_token_id to tokenizer.pad_token_id if not set on command line by patrickhwood in https://github.com/mosaicml/llm-foundry/pull/118
* Changed the keep_zip default to False to comply with StreamingDataset by karan6181 in https://github.com/mosaicml/llm-foundry/pull/150
* Add cloud upload to checkpoint conversion script by dakinggg in https://github.com/mosaicml/llm-foundry/pull/151
* Adds precision to eval by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/148
* Update StreamingDataset defaults by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/157
* Explain `composer` command by hanlint in https://github.com/mosaicml/llm-foundry/pull/164
* Remove `pynvml` by hanlint in https://github.com/mosaicml/llm-foundry/pull/165
* Adds a concrete finetuning example from a custom dataset by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/156
* Remove health checker by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/167
* Rename datasets to avoid hf conflict by hanlint in https://github.com/mosaicml/llm-foundry/pull/175
* Torch2 (177) by vchiley in https://github.com/mosaicml/llm-foundry/pull/178
* Revert "Torch2 (177) (178)" by dakinggg in https://github.com/mosaicml/llm-foundry/pull/181
* clean up dataset conversion readme by codestar12 in https://github.com/mosaicml/llm-foundry/pull/168
* Convert MPT checkpoints to FT format by dskhudia in https://github.com/mosaicml/llm-foundry/pull/169
* Update README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/198
* Removed unused `tokenizer_name` config field by dakinggg in https://github.com/mosaicml/llm-foundry/pull/206
* Add community links to README by hanlint in https://github.com/mosaicml/llm-foundry/pull/182
* Add Tensorboard logger to yaml config by hanlint in https://github.com/mosaicml/llm-foundry/pull/166
* Update inference README by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/204
* torch2 updt with hf fixes by vchiley in https://github.com/mosaicml/llm-foundry/pull/193
* Removing deprecated vocabulary size parameter from composer CE metrics by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/222
* Add `composer[libcloud]` dependency by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/218
* Use $RUN_NAME rather than $COMPOSER_RUN_NAME by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/209
* Fixing benchmark mcli example with proper path and image by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/219
* Update README.md - Slack Link by ejyuen in https://github.com/mosaicml/llm-foundry/pull/207
* Kv cache speed by vchiley in https://github.com/mosaicml/llm-foundry/pull/210
* Fix a race condition in ICL eval by dakinggg in https://github.com/mosaicml/llm-foundry/pull/235
* Add basic issue templates by dakinggg in https://github.com/mosaicml/llm-foundry/pull/252
* Add a script to run mpt with FasterTransformer by dskhudia in https://github.com/mosaicml/llm-foundry/pull/229
* Change mcli eval YAMLs to use `mixed_precision: FULL` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/255
* Bump Xentropy Version by nik-mosaic in https://github.com/mosaicml/llm-foundry/pull/261
* updt tritonpremlir to sm90 version by vchiley in https://github.com/mosaicml/llm-foundry/pull/260
* Add `mosaicml/llm-foundry` Docker workflow by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/254
* Patch README for better visibility by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/267
* Add support for `device_map` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/225
* Fix model init when using 1 GPU by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/269
* Update README.md by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/268
* Update mcp_pytest.py by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/274
* Fastertransformer: replace config['mpt'] with config['gpt'] by dwyatte in https://github.com/mosaicml/llm-foundry/pull/272
* Add `device_map` support for `hf_generate.py` and `hf_chat.py` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/276
* Add shift_labels arg to HF wrappers by dakinggg in https://github.com/mosaicml/llm-foundry/pull/288
* Update README.md by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/294
* Small formatting fix in eval README by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/285
* Default to debug level debug by samhavens in https://github.com/mosaicml/llm-foundry/pull/299
* Sam/chat v2 by samhavens in https://github.com/mosaicml/llm-foundry/pull/296
* Add `save_weights_only` as an option by dakinggg in https://github.com/mosaicml/llm-foundry/pull/301
* Adding Custom Embedding, Enabling us to initialize on Heterogeneous Devices by bcui19 in https://github.com/mosaicml/llm-foundry/pull/298
* Fix convert_dataset_hf.py hanging with excessive num_workers by casperbh96 in https://github.com/mosaicml/llm-foundry/pull/270
* Update README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/300
* Fix autocast dtype by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/302
* Set eval shuffle to False by eldarkurtic in https://github.com/mosaicml/llm-foundry/pull/297
* Huggingface Mixed Initialization by bcui19 in https://github.com/mosaicml/llm-foundry/pull/303
* Added new community tutorial on MPT-7B-Instruct Fine Tuning by VRSEN in https://github.com/mosaicml/llm-foundry/pull/311
* Fix generate callback to work with precision context by dakinggg in https://github.com/mosaicml/llm-foundry/pull/322
* Allow MPT past the tied word embeddings error by dakinggg in https://github.com/mosaicml/llm-foundry/pull/323
* Refresh Mosaicml platform yamls by aspfohl in https://github.com/mosaicml/llm-foundry/pull/208
* hard set bias(alibi) precision by vchiley in https://github.com/mosaicml/llm-foundry/pull/329
* Create tasks_light.yaml by jfrankle in https://github.com/mosaicml/llm-foundry/pull/335
* Attn amp by vchiley in https://github.com/mosaicml/llm-foundry/pull/337
* Load on rank 0 only flag by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/334
* Add mixed device by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/342
* Better error messages for ckpt conversion script by dskhudia in https://github.com/mosaicml/llm-foundry/pull/320
* Add script to update hub code from foundry by dakinggg in https://github.com/mosaicml/llm-foundry/pull/338
* Upgrade to `mosaicml-streaming==0.5.x` by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/292
* updt composer to 0.15.0 by vchiley in https://github.com/mosaicml/llm-foundry/pull/347
* updt yml by vchiley in https://github.com/mosaicml/llm-foundry/pull/349
* Fix bug with saving optimizer states with MonolithicCheckpointSaver Callback by eracah in https://github.com/mosaicml/llm-foundry/pull/310
* Add step to free up some disk space on the worker by bandish-shah in https://github.com/mosaicml/llm-foundry/pull/350
* Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later by dakinggg in https://github.com/mosaicml/llm-foundry/pull/348
* Revert "Filter out sequences where prompt is longer than max length, rather than dropping them on the fly later" by codestar12 in https://github.com/mosaicml/llm-foundry/pull/354
* Remote JSONL IFT data by samhavens in https://github.com/mosaicml/llm-foundry/pull/275
* Add MPT-30B to README by abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/356
* Codeql on PRs by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/352
* Add secrets check as part of pre-commit by karan6181 in https://github.com/mosaicml/llm-foundry/pull/360
* Onboarding tutorial and related improvements by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/205
* fixed rmsnorm bug. Changed division to multiply since using torch.rsqrt by vancoykendall in https://github.com/mosaicml/llm-foundry/pull/372
* Adds max seq len filter before finetuning ds by vchiley in https://github.com/mosaicml/llm-foundry/pull/359
* Feature/peft compatible models by danbider in https://github.com/mosaicml/llm-foundry/pull/346
* Fix Typing (part 1) by hanlint in https://github.com/mosaicml/llm-foundry/pull/240
* improve hf_chat UI and readme by samhavens in https://github.com/mosaicml/llm-foundry/pull/351
* Update onnx by vchiley in https://github.com/mosaicml/llm-foundry/pull/385
* Model gauntlet by bmosaicml in https://github.com/mosaicml/llm-foundry/pull/308
* Add 30b IFT example yaml by samhavens in https://github.com/mosaicml/llm-foundry/pull/388
* Add benchmarks to inference README by sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/393
* updt install instructions by vchiley in https://github.com/mosaicml/llm-foundry/pull/396
* update quickstart eval task by vchiley in https://github.com/mosaicml/llm-foundry/pull/395
* Correct small typo in README.md by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/391
* make peft installs a extra_dep by vchiley in https://github.com/mosaicml/llm-foundry/pull/397
* add fn to clear tests after every test by vchiley in https://github.com/mosaicml/llm-foundry/pull/400
* propagate cache_limit in streaming ds by vchiley in https://github.com/mosaicml/llm-foundry/pull/402
* Fixing hf_generate bug to account for pre-tokenization by ksreenivasan in https://github.com/mosaicml/llm-foundry/pull/387
* Eval Quickstart by samhavens in https://github.com/mosaicml/llm-foundry/pull/398
* Clean up train README by jacobfulano in https://github.com/mosaicml/llm-foundry/pull/392
* Fix/bugbash002 by danbider in https://github.com/mosaicml/llm-foundry/pull/405
* add install for AMD beta support by vchiley in https://github.com/mosaicml/llm-foundry/pull/407
* updt dtype of causal mask by vchiley in https://github.com/mosaicml/llm-foundry/pull/408
* YAMLS for MPT runs inherit global max_seq_len in model config by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/409
* Update mcli-hf-eval.yaml by samhavens in https://github.com/mosaicml/llm-foundry/pull/411
* Edit tutorial comments on PEFT / LoRA by vchiley in https://github.com/mosaicml/llm-foundry/pull/416
* rm peft from pypi package by vchiley in https://github.com/mosaicml/llm-foundry/pull/420
* Update tasks_light.yaml by jfrankle in https://github.com/mosaicml/llm-foundry/pull/422

New Contributors
* nelsontkq made their first contribution in https://github.com/mosaicml/llm-foundry/pull/102
* samhavens made their first contribution in https://github.com/mosaicml/llm-foundry/pull/101
* alanxmay made their first contribution in https://github.com/mosaicml/llm-foundry/pull/131
* patrickhwood made their first contribution in https://github.com/mosaicml/llm-foundry/pull/118
* karan6181 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/150
* dskhudia made their first contribution in https://github.com/mosaicml/llm-foundry/pull/169
* nik-mosaic made their first contribution in https://github.com/mosaicml/llm-foundry/pull/261
* dwyatte made their first contribution in https://github.com/mosaicml/llm-foundry/pull/272
* casperbh96 made their first contribution in https://github.com/mosaicml/llm-foundry/pull/270
* eldarkurtic made their first contribution in https://github.com/mosaicml/llm-foundry/pull/297
* VRSEN made their first contribution in https://github.com/mosaicml/llm-foundry/pull/311
* aspfohl made their first contribution in https://github.com/mosaicml/llm-foundry/pull/208
* jfrankle made their first contribution in https://github.com/mosaicml/llm-foundry/pull/335
* eracah made their first contribution in https://github.com/mosaicml/llm-foundry/pull/310
* bandish-shah made their first contribution in https://github.com/mosaicml/llm-foundry/pull/350
* vancoykendall made their first contribution in https://github.com/mosaicml/llm-foundry/pull/372
* danbider made their first contribution in https://github.com/mosaicml/llm-foundry/pull/346
* ksreenivasan made their first contribution in https://github.com/mosaicml/llm-foundry/pull/387

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.1.1...v0.2.0

0.1.1

What's New

LLM Foundry is now on PyPI!

What's Changed
* Update README.md by ejyuen in https://github.com/mosaicml/llm-foundry/pull/72
* Update version by dakinggg in https://github.com/mosaicml/llm-foundry/pull/73
* Remove todo in workflow by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/74
* Bump composer version by vchiley in https://github.com/mosaicml/llm-foundry/pull/84
* Fix pypi by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/80
* Remove xentropy from pypi by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/86
* Fix sed command for xentropy by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/87
* Updates to prefixlm and t5 by alextrott16 in https://github.com/mosaicml/llm-foundry/pull/85
* Disable image for pypi by mvpatel2000 in https://github.com/mosaicml/llm-foundry/pull/97

New Contributors
* ejyuen made their first contribution in https://github.com/mosaicml/llm-foundry/pull/72

**Full Changelog**: https://github.com/mosaicml/llm-foundry/compare/v0.1.0...v0.1.1

0.1.0

This is the first release of MosaicML's LLM Foundry!

Our efficient code for training, evaluating, and deploying LLMs outgrew our [examples repository](https://github.com/mosaicml/examples), so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the [top-level README](https://github.com/mosaicml/llm-foundry) and our [blog post](https://www.mosaicml.com/blog/mpt-7b) for more details on this announcement!

Model releases

In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the [MosaicML platform](https://www.mosaicml.com/training), using [Composer](https://github.com/mosaicml/composer) and [Streaming](https://github.com/mosaicml/streaming). If you're interested in training your own models, or using these models with our [optimized inference stack](https://www.mosaicml.com/inference), please [reach out](https://forms.mosaicml.com/demo)!

- `mpt-7b`: This is our base **7-billion parameter** model, trained for **1 trillion tokens**. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-storywriter`: All of the models use ALiBi to allow them to extrapolate to longer sequence lengths than they saw during training, but storywriter is our **long context** model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-instruct`: This model is **instruction finetuned** on a dataset we also release, derived from Databricks' [Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and Anthropic’s [Helpful and Harmless](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
- `mpt-7b-chat`: This model is trained to be able to **chat** by further training on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [Helpful and Harmless](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
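
As a rough illustration of why ALiBi helps with longer sequences, here is a minimal sketch of the bias it adds to attention logits. It follows the formulation in the ALiBi paper (a fixed per-head slope multiplied by the query-key distance) and is not taken from the MPT source, which may differ in details. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two head count.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes for a power-of-two head count (paper formulation)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch assumes n_heads is a power of two")
    # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(n_heads: int, seq_len: int) -> List[List[List[float]]]:
    """Return bias[h][i][j], added to the logit of query i attending to key j.

    Visible keys (j <= i) get a penalty proportional to their distance from
    the query; future keys (j > i) are masked out with -inf.
    """
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(n_heads)
    ]


bias = alibi_bias(n_heads=8, seq_len=4)
print(bias[0][3])  # penalties for the last query in the first head
```

The penalty depends only on the distance between query and key, not on absolute position, so no position-specific parameters are learned and the same rule applies unchanged to sequences longer than those seen in training.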

Features

Training

We release fully featured code for efficiently training any HuggingFace LLM (including our optimized [MPT](https://github.com/mosaicml/llm-foundry/tree/main/llmfoundry/models/mpt)) using FSDP, Composer, and Streaming. Seamlessly scale to multi-GPU and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights&Biases, and much more. See the [README](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train) for more detailed instructions on getting started pretraining and finetuning!

Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!

Evaluation

Our [evaluation framework](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval) makes it easy to fully re-evaluate any HuggingFace model. We also include [copies of the processed data for many popular benchmarks](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval/local_data), to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In previous benchmarks, our setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling with multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.

Inference

MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models are subclassed from the HuggingFace `PreTrainedModel` base class, which means that they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard pipelines like `model.generate(...)`, build HuggingFace Spaces (see some of ours [here](https://huggingface.co/mosaicml#spaces)!), and more.
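
To make the HuggingFace compatibility concrete, here is a minimal sketch of loading an MPT checkpoint from the Hub and calling `model.generate(...)`. It assumes the `transformers` and `torch` packages are installed, that the checkpoint is published as `mosaicml/mpt-7b` with its tokenizer files in the same repository, and that `trust_remote_code=True` is needed because the model code is hosted alongside the weights. The wrapper `generate_text` is a hypothetical helper, not part of LLM Foundry.

```python
def generate_text(prompt, model_name="mosaicml/mpt-7b", max_new_tokens=32):
    """Load a causal LM from the HuggingFace Hub and continue `prompt`."""
    # Imported lazily so that defining this helper does not require the
    # (large) dependencies to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer is stored in the same Hub repository.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: the checkpoint ships custom modeling code on the Hub.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Here is a recipe for vegan banana bread:")` would download the weights on first use, so expect a multi-gigabyte download and a GPU or a large amount of RAM for a 7B model.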

What about performance? With MPT’s optimized layers (including FlashAttention and low precision layernorm), the out-of-the-box performance of MPT-7B on GPUs when using `model.generate(...)` is 1.5x-2x faster than other 7B models like LLaMa-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.

Finally, for the best hosting experience, deploy your MPT models directly on MosaicML’s [Inference service](https://www.mosaicml.com/inference). Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the [Inference blog post](https://www.mosaicml.com/blog/inference-launch) for more details!

