Phil 5.6.2023

MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML and is open-sourced for commercial use (Apache-2.0).

MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.

These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA’s FasterTransformer.

This model uses the MosaicML LLM codebase, which can be found in the llm-foundry repository. It was trained by MosaicML’s NLP team on the MosaicML platform for LLM pretraining, finetuning, and inference.