Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Xuyang Liu, Zichen Wen, Shaobo Wang, Junjie Chen, Zhishan Tao, Yubo Wang, Tailai Chen, Xiangqi Jin, Chang Zou, Yiyu Wang, Chenfei Liao, Xu Zheng, Honggang Chen, Weijia Li, Xuming Hu, Conghui He, Linfeng Zhang
TL;DR
The paper argues that AI efficiency research should shift from model-centric to data-centric compression as the dominant bottleneck moves from parameter count to long-context processing. It formalizes data-centric compression as a two-stage paradigm in which a scoring function $\mathcal{E}$ and a compression strategy $\mathcal{P}$ together produce a compressed sequence $\mathbf{X}'$ through token pruning or merging, and it analyzes the benefits for both training and inference. The authors provide a unified framework, review existing data-centric techniques, and discuss open challenges such as attention biases and evaluation gaps. They propose future work on co-developing data-centric and model-centric methods and on building dedicated benchmarks. The work highlights the potential for universal applicability and substantial speedups, with practical impact for long-context LLMs, multi-modal LLMs (MLLMs), and diffusion transformers (DiTs) in resource-constrained settings.
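The two-stage paradigm can be illustrated with a minimal sketch, under stated assumptions. The function names `l2_norm_score`, `top_k_prune`, and `compress`, the use of the L2 norm as the score, and the top-k pruning rule are all hypothetical choices made for illustration. They stand in for $\mathcal{E}$ and $\mathcal{P}$ and are not the paper's definitions.

```python
import math
from typing import Callable, List, Sequence

Token = Sequence[float]


def l2_norm_score(tokens: Sequence[Token]) -> List[float]:
    """Stage 1, a stand-in for the scoring function E (assumption):
    rate each token by the L2 norm of its embedding."""
    return [math.sqrt(sum(v * v for v in tok)) for tok in tokens]


def top_k_prune(
    tokens: Sequence[Token], scores: Sequence[float], keep: int
) -> List[List[float]]:
    """Stage 2, a stand-in for the strategy P (assumption):
    keep the `keep` highest-scoring tokens, preserving their original order."""
    if not 0 <= keep <= len(tokens):
        raise ValueError("keep must be between 0 and the number of tokens")
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore sequence order
    return [list(tokens[i]) for i in kept]


def compress(
    tokens: Sequence[Token],
    score_fn: Callable[[Sequence[Token]], List[float]],
    strategy: Callable[[Sequence[Token], Sequence[float], int], List[List[float]]],
    keep: int,
) -> List[List[float]]:
    """Score the tokens, then apply the compression strategy to obtain X'."""
    scores = score_fn(tokens)
    return strategy(tokens, scores, keep)


# Toy input sequence X with four 2-dimensional token embeddings.
X = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
X_prime = compress(X, l2_norm_score, top_k_prune, keep=2)
print(X_prime)  # [[3.0, 4.0], [1.0, 1.0]]
```

Because self-attention cost grows quadratically with sequence length, halving the number of tokens as in this toy example reduces the number of pairwise attention interactions to roughly a quarter. A merging strategy could be swapped in for `top_k_prune` without changing `compress`, since the two stages are decoupled.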
Abstract
The advancement of large language models (LLMs) and multi-modal LLMs (MLLMs) has historically relied on scaling model parameters. However, as hardware limits constrain further model growth, the primary computational bottleneck has shifted to the quadratic cost of self-attention over increasingly long token sequences, driven by ultra-long text contexts, high-resolution images, and extended videos. In this position paper, \textbf{we argue that the focus of research for efficient artificial intelligence (AI) is shifting from model-centric compression to data-centric compression}. We position data-centric compression as the emerging paradigm, which improves AI efficiency by directly compressing the volume of data processed during model training or inference. To formalize this shift, we establish a unified framework for existing efficiency strategies and demonstrate why it constitutes a crucial paradigm change for long-context AI. We then systematically review the landscape of data-centric compression methods, analyzing their benefits across diverse scenarios. Finally, we outline key challenges and promising future research directions. Our work aims to provide a novel perspective on AI efficiency, synthesize existing efforts, and catalyze innovation to address the challenges posed by ever-increasing context lengths.
