The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li

TL;DR

The paper challenges the notion that LLM compression need only preserve perplexity and benchmark accuracy, arguing that capabilities such as long-context reasoning, retrieval, external tool use, and external memory must also be preserved or emulated. It introduces the Lottery LLM Hypothesis: for a given task, a smaller LLM can match the performance of a larger model when aided by multi-step reasoning, retrieval, external tools, and memory, using a dynamic divide-and-conquer algorithm $\mathcal{A}$ that may access external resources $\mathcal{D}, \mathcal{R}, \mathcal{C}, \mathcal{M}$. The authors review progress in retrieval-augmented generation, the adaptive distribution of knowledge between model parameters and external knowledge bases, and the role of external tools, memory, and planning in extending computational expressivity beyond that of basic transformers. They propose a concrete set of abilities that lottery LLMs need and argue that aligning compression with these abilities can yield substantial energy savings while maintaining or improving real-world reasoning performance. The work motivates future research on LLM and KV-cache compression that preserves cognition-like processes, with practical deployment at scale as the goal.

Abstract

Motivated by the goal of reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs as measured by perplexity or by simple accuracy on commonsense knowledge QA and basic arithmetic reasoning tasks. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. Then, we propose a lottery LLM hypothesis suggesting that for a given LLM and task, there exists a smaller lottery LLM capable of achieving the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on the review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.
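
The idea behind the hypothesis can be illustrated with a toy sketch. This is not the authors' reasoning algorithm; it is a minimal example under stated assumptions: a limited solver (`small_model_add`, a hypothetical stand-in for a small LLM that can only combine two values per call) handles a larger problem once a divide-and-conquer loop splits the work and an external memory (a plain dictionary) stores intermediate results. Retrieval and external tools are left out, and all names are illustrative.

```python
from typing import Dict, List, Tuple

# External memory: maps a subproblem (as a tuple) to its stored result.
Memory = Dict[Tuple[int, ...], int]


def small_model_add(a: int, b: int) -> int:
    """Hypothetical stand-in for a small model: combines only two values per call."""
    return a + b


def solve(numbers: List[int], memory: Memory, calls: List[int]) -> int:
    """Sum `numbers` by recursive splitting, calling the limited solver only on pairs.

    `calls` is a one-element list used as a mutable counter of solver calls.
    """
    key = tuple(numbers)
    if key in memory:  # reuse a result already written to external memory
        return memory[key]

    if len(numbers) == 0:
        result = 0
    elif len(numbers) == 1:
        result = numbers[0]
    else:
        mid = len(numbers) // 2
        left = solve(numbers[:mid], memory, calls)   # divide
        right = solve(numbers[mid:], memory, calls)
        calls[0] += 1
        result = small_model_add(left, right)        # conquer with the small solver

    memory[key] = result  # write the intermediate result to external memory
    return result


if __name__ == "__main__":
    memory: Memory = {}
    calls = [0]
    print(solve([1, 2, 3, 4, 5], memory, calls), "solver calls:", calls[0])
```

In this sketch the solver never sees more than two values at a time, yet the surrounding loop and memory let it complete a task of arbitrary length. Repeating a query is answered from memory without further solver calls.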

Paper Structure

This paper contains 7 sections, 1 equation, 8 figures, 3 tables.

Figures (8)

  • Figure 1: General pseudocode of the reasoning algorithm $\mathcal{A}$.
  • Figure 2: The problem solving process of the multi-step reasoning with external tools (the interaction with the external memory and the verification are not shown in the figure).
  • Figure 3: Simulating the Turing machine with LLMs and the external memory Memory-Augmented-Turing.
  • Figure 4: Vanilla NIAH results of LLaMA3-8B-Instruct.
  • Figure 5: NIAH results of LLaMA3-8B-Instruct with preprocessing prompts.
  • ...and 3 more figures