Designing Large Foundation Models for Efficient Training and Inference: A Survey
Dong Liu, Yanxuan Yu, Yite Wang, Jing Wu, Zhongwei Wan, Sina Alinejad, Benjamin Lengerich, Ying Nian Wu
TL;DR
This survey assesses how large foundation models can be trained and deployed more efficiently along three intertwined paths: model design, system design, and model-system co-design. It covers quantization (bit-width-based and method-based), knowledge distillation (soft and hard), and pruning (unstructured and structured) as core model-level strategies, and KV cache design, token reduction, sparsity-aware optimization, and multimodal fusion as system-level techniques. It then examines how MoE, mixed precision training, efficient pretraining, and efficient fine-tuning enable scalable, cost-effective development, culminating in a unified framework for model-system co-design. The work emphasizes practical improvements to the accessibility and affordability of foundation models across NLP, vision, and multimodal applications, with broad implications for real-world deployment and research efficiency.
Abstract
This paper surveys modern techniques for efficient training and inference of foundation models, presenting them from two perspectives: model design and system design. Both perspectives optimize LLM training and inference from different angles to save computational resources, making LLMs more efficient, affordable, and accessible. The paper list repository is available at https://github.com/NoakLiu/Efficient-Foundation-Models-Survey.
