Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory

Zhou Yang, Zhengyu Qi, Zhaochun Ren, Zhikai Jia, Haizhou Sun, Xiaofei Zhu, Xiangwen Liao

TL;DR

The paper analyzes how large language models process information through the lens of Information Bottleneck (IB) theory, revealing a two-stage flow: inputs are compressed into task spaces during understanding, and the relevant information is then extracted from those spaces during prediction. It introduces a training-free strategy for constructing task spaces, which is used to trace information flow, and two IB-based methods, IC-ICL and TS-FT, to improve compression and extraction. Empirical results on Empathetic Dialogues show that IC-ICL improves reasoning accuracy and reduces inference time by over 40%, while TS-FT delivers robust improvements with minimal adjustments. The work offers a practical framework for enhancing LLM efficiency and interpretability by shaping internal information representations.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: Information Compression-based Context Learning (IC-ICL) and Task-Space-guided Fine-Tuning (TS-FT). IC-ICL enhances reasoning performance and inference efficiency by compressing retrieved example information into the task space. TS-FT employs a space-guided loss to fine-tune LLMs, encouraging the learning of more effective compression and selection mechanisms. Experiments across multiple datasets validate the effectiveness of task space construction. Additionally, IC-ICL not only improves performance but also accelerates inference by over 40%, while TS-FT achieves superior results with a minimal strategy adjustment.
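
For orientation, the classical Information Bottleneck objective that this perspective builds on can be written as below. This is the standard textbook form, not necessarily the formulation used in the paper, whose own equations are not reproduced here. Reading $X$ as the input, $Y$ as the target output, and $T$ as the intermediate representation (here, loosely, the task-space representation) is an assumption made for illustration, as is the trade-off weight $\beta$.

$$
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y), \qquad \beta > 0
$$

Under this reading, the first term rewards compressing the input into $T$, which loosely matches finding (1), while the second term rewards keeping the information in $T$ that is needed to predict $Y$, which loosely matches finding (2).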

Paper Structure

This paper contains 16 sections, 17 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: 2D visualization of the emotion space.
  • Figure 2: 3D visualization of the emotion space.
  • Figure 3: Visualization of emotion similarity.
  • Figure 4: Information variation of LLMs in the ground-truth emotion space.
  • Figure 5: Information variation of LLMs in the emotion space with $d_k=2$.
  • ...and 4 more figures