Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Xuyang Liu, Zichen Wen, Shaobo Wang, Junjie Chen, Zhishan Tao, Yubo Wang, Tailai Chen, Xiangqi Jin, Chang Zou, Yiyu Wang, Chenfei Liao, Xu Zheng, Honggang Chen, Weijia Li, Xuming Hu, Conghui He, Linfeng Zhang

TL;DR

The paper argues that AI efficiency research should shift from model-centric to data-centric compression as the dominant bottleneck moves from parameter count to long-context processing. It formalizes a two-stage data-centric paradigm with a scoring function $\mathcal{E}$ and a strategy $\mathcal{P}$ to produce a compressed sequence $\mathbf{X}'$, via token pruning or merging, and analyzes benefits for both training and inference. The authors provide a unified framework, review existing data-centric techniques, and discuss challenges such as attention biases and evaluation gaps, while proposing future work on co-development with model-centric methods and dedicated benchmarks. The work highlights potential for universal applicability and substantial speedups, emphasizing practical impacts for long-context LLMs, MLLMs, and DiTs in resource-constrained settings.

Abstract

The advancement of large language models (LLMs) and multi-modal LLMs (MLLMs) has historically relied on scaling model parameters. However, as hardware limits constrain further model growth, the primary computational bottleneck has shifted to the quadratic cost of self-attention over increasingly long sequences, driven by ultra-long text contexts, high-resolution images, and extended videos. In this position paper, we argue that the focus of research for efficient artificial intelligence (AI) is shifting from model-centric compression to data-centric compression. We position data-centric compression as the emerging paradigm, which improves AI efficiency by directly compressing the volume of data processed during model training or inference. To formalize this shift, we establish a unified framework for existing efficiency strategies and demonstrate why it constitutes a crucial paradigm change for long-context AI. We then systematically review the landscape of data-centric compression methods, analyzing their benefits across diverse scenarios. Finally, we outline key challenges and promising future research directions. Our work aims to provide a novel perspective on AI efficiency, synthesize existing efforts, and catalyze innovation to address the challenges posed by ever-increasing context lengths.

Paper Structure

This paper contains 30 sections, 8 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: The evolution of AI efficiency: from model-centric to data-centric compression. From 2022 to 2024, AI model performance gains were primarily driven by scaling model size, directing efficiency research toward model-centric compression. By 2024, with model sizes approaching 1T parameters, their growth has slowed. Consequently, the focus has shifted to expanding context length to enhance model capabilities, necessitating a transition to data-centric compression that reduces context length for efficiency.
  • Figure 2: Overview of the data-centric compression paradigm. Given an input token sequence $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_T]$, data-centric compression first computes importance scores via a scoring function $\mathcal{E}: \mathbf{X} \to \{s_t\}_{t=1}^T$, then generates a compressed sequence $\mathbf{X}'$ through a compression strategy $\mathcal{P}: (\mathbf{X}, \{s_t\}_{t=1}^T) \to \mathbf{X}'$, where $|\mathbf{X}'| < |\mathbf{X}|$.
  • Figure 3: Empirical comparison of carefully designed data-centric compression methods and random token dropping. Results demonstrate that in multiple scenarios (e.g., LLMs, MLLMs, and DiTs), some carefully designed methods surprisingly underperform compared to random token selection.
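
The two-stage paradigm in the Figure 2 caption (score tokens with $\mathcal{E}$, then compress with $\mathcal{P}$) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `l2_norm_scores` is a hypothetical scoring function, `prune_top_k` is a hypothetical pruning strategy, and `random_scores` stands in for the random token dropping baseline mentioned in the Figure 3 caption. Token merging is not shown.

```python
import random
from typing import Callable, List, Sequence

Token = List[float]  # one token embedding, kept as a plain list for simplicity


def l2_norm_scores(tokens: Sequence[Token]) -> List[float]:
    """Hypothetical scoring function E: one score s_t per token (its L2 norm)."""
    return [sum(v * v for v in tok) ** 0.5 for tok in tokens]


def random_scores(tokens: Sequence[Token], seed: int = 0) -> List[float]:
    """Baseline E: random scores, so the strategy below drops tokens at random."""
    rng = random.Random(seed)
    return [rng.random() for _ in tokens]


def prune_top_k(tokens: Sequence[Token], scores: Sequence[float], keep: int) -> List[Token]:
    """Hypothetical strategy P: keep the `keep` highest-scoring tokens in original order."""
    if not 0 < keep < len(tokens):
        raise ValueError("keep must satisfy 0 < keep < len(tokens) so that |X'| < |X|")
    ranked = sorted(range(len(tokens)), key=lambda t: scores[t], reverse=True)
    kept_positions = sorted(ranked[:keep])  # restore sequence order
    return [tokens[t] for t in kept_positions]


def compress(
    tokens: Sequence[Token],
    score_fn: Callable[[Sequence[Token]], List[float]],
    keep: int,
) -> List[Token]:
    """Two-stage pipeline: X -> {s_t} via score_fn, then (X, {s_t}) -> X'."""
    return prune_top_k(tokens, score_fn(tokens), keep)


# Toy input sequence X with T = 4 tokens of dimension 2.
X = [[0.1, 0.0], [3.0, 4.0], [0.5, 0.5], [1.0, 2.0]]
X_pruned = compress(X, l2_norm_scores, keep=2)  # score-based pruning
X_random = compress(X, random_scores, keep=2)   # random-dropping baseline
```

Because the attention score computation grows quadratically with sequence length, keeping 2 of the 4 tokens in this toy input would cut that pairwise term to roughly a quarter. Swapping `score_fn` is the only change needed to compare a designed scorer against the random baseline.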