This is the first release of MosaicML's LLM Foundry!
Our efficient code for training, evaluating, and deploying LLMs outgrew our [examples repository](https://github.com/mosaicml/examples), so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the [top-level README](https://github.com/mosaicml/llm-foundry) and our [blog post](https://www.mosaicml.com/blog/mpt-7b) for more details on this announcement!
# Model releases
In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the [MosaicML platform](https://www.mosaicml.com/training), using [Composer](https://github.com/mosaicml/composer) and [Streaming](https://github.com/mosaicml/streaming). If you're interested in training your own models, or using these models with our [optimized inference stack](https://www.mosaicml.com/inference), please [reach out](https://forms.mosaicml.com/demo)!
- `mpt-7b`: This is our base **7-billion parameter** model, trained for **1 trillion tokens**. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-storywriter`: All of our models use ALiBi, which lets them extrapolate to longer sequence lengths than they saw during training, but storywriter is our **long context** model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
- `mpt-7b-instruct`: This model is **instruction finetuned** on a dataset we also release, derived from Databricks' [Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and Anthropic’s [Helpful and Harmless](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
- `mpt-7b-chat`: This is a **chat** model, produced by further finetuning on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [Helpful and Harmless](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
# Features
## Training
We release fully featured code for efficiently training any HuggingFace LLM (including our optimized [MPT](https://github.com/mosaicml/llm-foundry/tree/main/llmfoundry/models/mpt)) using FSDP, Composer, and Streaming. Seamlessly scale to multi-GPU and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights & Biases, and much more. See the [README](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train) for more detailed instructions on getting started with pretraining and finetuning!
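To make the moving parts concrete, here is a minimal, hedged sketch of the pattern the training scripts build on: wrap a HuggingFace causal LM in Composer's `HuggingFaceModel` and hand Composer an `fsdp_config` to shard it. The tiny GPT-2 model, toy dataset, and hyperparameters are placeholders for illustration only; the real entry point is `scripts/train/train.py` driven by a YAML config.

```python
# Minimal sketch: a HuggingFace causal LM wrapped for Composer and trained with FSDP.
# Model, data, and hyperparameters are illustrative placeholders, not the real recipe.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from composer import Trainer
from composer.models import HuggingFaceModel

tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token
hf_model = AutoModelForCausalLM.from_pretrained('gpt2')
model = HuggingFaceModel(hf_model, tokenizer=tokenizer)

# Toy in-memory dataset; in practice this is a StreamingDataset reading shards from object storage.
enc = tokenizer(['LLM Foundry says hello.'] * 64, padding='max_length',
                max_length=32, truncation=True, return_tensors='pt')
samples = [{'input_ids': ids, 'attention_mask': mask, 'labels': ids.clone()}
           for ids, mask in zip(enc['input_ids'], enc['attention_mask'])]
train_dataloader = DataLoader(samples, batch_size=8)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=torch.optim.AdamW(hf_model.parameters(), lr=1e-4),
    max_duration='1ep',
    # Passing an fsdp_config asks Composer to shard the model; launch with the
    # `composer` CLI (e.g. `composer -n 8 train_sketch.py`) for multi-GPU runs.
    fsdp_config={'sharding_strategy': 'FULL_SHARD', 'mixed_precision': 'PURE'},
)
trainer.fit()
```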
Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!
## Evaluation
Our [evaluation framework](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval) makes it easy to fully re-evaluate any HuggingFace model. We also include [copies of the processed data for many popular benchmarks](https://github.com/mosaicml/llm-foundry/tree/main/scripts/eval/local_data), to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In our benchmarks, this setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling across multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.
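Under the hood, benchmarks are turned into Composer in-context-learning (ICL) dataloaders that plug into `Trainer.eval(...)`, which is where the FSDP support and multi-GPU scaling come from. The sketch below shows the general shape using Composer's `get_icl_task_dataloader`; the dataset path, delimiters, and other argument values are assumptions for illustration, and the exact signature may differ across Composer versions, so treat the eval YAMLs and README as the source of truth.

```python
# Hedged sketch of building an ICL eval dataloader with Composer; argument values
# (dataset path, delimiters, sequence length) are assumptions for illustration.
from transformers import AutoTokenizer
from composer.datasets.in_context_learning_evaluation import get_icl_task_dataloader

tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

dataloader = get_icl_task_dataloader(
    'language_modeling',                             # task type, e.g. LAMBADA-style LM
    'scripts/eval/local_data/lambada_openai.jsonl',  # assumed path to a local benchmark copy
    tokenizer,
    batch_size=8,
    max_seq_len=2048,
    pad_tok_id=tokenizer.eos_token_id,
    num_fewshot=0,
    prompt_string='',
    example_delimiter='\n',
    continuation_delimiter=' ',
    destination_path='/tmp/lambada_openai.jsonl',
)
# The resulting dataloader is handed to Composer's Trainer.eval(...), so the same
# FSDP config used for training also shards the model for evaluation.
```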
## Inference
MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models subclass the HuggingFace `PreTrainedModel` base class, which means they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard APIs like `model.generate(...)`, build HuggingFace Spaces (see some of ours [here](https://huggingface.co/mosaicml#spaces)!), and more.
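As a quick illustration (following the usage shown on the MPT model cards), loading MPT-7B and generating with the standard `transformers` APIs looks like this. Note that `trust_remote_code=True` is needed because MPT ships its modeling code alongside the checkpoint, and MPT uses the EleutherAI GPT-NeoX-20B tokenizer.

```python
# Load MPT-7B from the HuggingFace Hub and generate text with standard transformers APIs.
import torch
import transformers

name = 'mosaicml/mpt-7b'
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT's modeling code is bundled with the checkpoint
)
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```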
What about performance? With MPT’s optimized layers (including FlashAttention and low-precision LayerNorm), the out-of-the-box performance of MPT-7B on GPUs when using `model.generate(...)` is 1.5x-2x faster than other 7B models like LLaMA-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.
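To get those optimized layers at inference time, the model card shows overriding the attention implementation in the config before loading; the snippet below is a sketch along those lines (the `triton` kernel assumes a GPU environment with this repo's GPU extras installed).

```python
# Sketch: enable MPT's triton FlashAttention-style kernel via a config override before loading.
import torch
import transformers

name = 'mosaicml/mpt-7b'
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # optimized attention path; requires a GPU setup

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = model.to('cuda:0')
```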
Finally, for the best hosting experience, deploy your MPT models directly on MosaicML’s [Inference service](https://www.mosaicml.com/inference). Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the [Inference blog post](https://www.mosaicml.com/blog/inference-launch) for more details!