The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li
TL;DR
The paper challenges the notion that LLM compression need only preserve perplexity and benchmark accuracy, arguing that capabilities such as long-context reasoning, retrieval, external tool use, and external memory must also be preserved or emulated. It introduces the Lottery LLM Hypothesis: for a given task, a smaller LLM can match the performance of a larger model when aided by multi-step reasoning, retrieval, external tools, and memory. The smaller model is driven by a dynamic divide-and-conquer algorithm $\mathcal{A}$ that may access external resources $\mathcal{D}, \mathcal{R}, \mathcal{C}, \mathcal{M}$. The authors review progress in retrieval-augmented generation, in adaptive allocation of resources between model parameters and knowledge bases, and in the role of external tools, memory, and planning in extending computational expressivity beyond that of a basic transformer. They propose a concrete set of abilities essential for lottery LLMs and argue that aligning compression with these abilities can yield substantial energy savings while maintaining or improving real-world reasoning performance. The work motivates future research on efficient LLM and KV-cache compression that preserves cognition-like processes, with the aim of practical deployment at scale.
Abstract
Motivated by the need to reduce the computational and storage costs of LLMs, model compression and KV cache compression have attracted considerable research attention. However, current methods predominantly emphasize maintaining the performance of compressed LLMs as measured by perplexity or by simple accuracy on common-sense knowledge QA and basic arithmetic reasoning tasks. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose a lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of achieving the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that a lottery LLM must possess and that KV cache compression must preserve, which existing methods currently overlook.
