The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang

TL;DR

LLMs face significant compute and memory demands that limit both research and deployment. This paper surveys algorithmic approaches to efficiency across data utilization, architecture, training and tuning, and inference, anchored by scaling laws and by practical techniques such as parameter-efficient fine-tuning (PEFT), mixture of experts (MoE), efficient attention, and quantization. It connects methods ranging from data filtering and curriculum learning to memory- and compute-optimized training regimes, 3D parallelism, and attention-free architectures, and highlights how end-to-end optimization can preserve performance while reducing resource use. It emphasizes that practical efficiency requires coordinated strategies across the entire model lifecycle, which in turn enables broader, more sustainable deployment of LLMs in diverse environments.

Abstract

The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, including both algorithmic and hardware solutions, have been developed to enhance the efficiency of LLMs. This survey delivers a comprehensive review of algorithmic advancements aimed at improving LLM efficiency. Unlike other surveys that typically focus on specific areas such as training or model compression, this paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs. Specifically, it covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. This paper aims to serve as a valuable resource for researchers and practitioners, laying the groundwork for future innovations in this critical research area. Our repository of relevant references is maintained at https://github.com/tding1/Efficient-LLM-Survey.

Paper Structure (35 sections, 2 figures, 2 tables)

This paper contains 35 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Capability and performance tree [palm-blog2022] of PaLM [chowdhery2022palm] across different scales (8 billion, 62 billion, and 540 billion parameters). Each circle node represents a specific capability, and its size indicates the corresponding performance level: the larger the circle, the greater the capability. As the model scale increases, the model not only improves on existing tasks but also exhibits new capabilities.
  • Figure 2: Schematic overview of the multi-faceted dimensions of LLM efficiency. The diagram shows the key areas covered in this survey (data utilization, architectural designs, training and tuning strategies, and inference techniques), giving a holistic view of the factors that contribute to LLM efficiency.