Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng, Ruidi Chang, Hanjie Chen

TL;DR

InfoSteer tackles the underutilization of pretrained knowledge during LM post-training by steering the FFN's memory-like key–value structure. It treats the first FFN projection as a content-dependent key and the second as a memory vector, guiding key distributions through forward interventions and entropy-based regularization to engage a broader set of memory vectors. Across Qwen, LLaMA, and Gemma models on 15 ID and OOD tasks, InfoSteer yields consistent accuracy gains and enables adaptive information allocation that favors semantically rich tokens while deprioritizing trivial ones. The approach is lightweight, architecture-agnostic, and enhances interpretability through memory-surrogate analysis, suggesting a practical path to improve both capability and transparency in post-trained language models.

Abstract

Recent advancements in language models (LMs) have marked a shift toward the growing importance of post-training. Yet, post-training approaches such as supervised fine-tuning (SFT) do not guarantee the effective use of knowledge acquired during pretraining. We therefore introduce InfoSteer, a lightweight method that encourages parametric information utilization in LMs during post-training. Specifically, InfoSteer treats the feed-forward network (FFN) layer as an associative key-value memory and promotes the use of stored memory vectors via forward-pass interventions or regularization during backpropagation. This simple guidance during the post-training phase yields consistent performance improvements across diverse model families, including Qwen, Gemma, and Llama, spanning 15 downstream tasks in both in-distribution (ID) and out-of-distribution (OOD) evaluations. Beyond performance gains, we also find that steered LMs can adaptively allocate information by placing more emphasis on generating semantically meaningful tokens, while using fewer resources on simple transition ones (e.g., "," or "and"). Our work underscores that vanilla post-training does not fully exploit the potential gained during pretraining, and that steering LMs in latent representation space offers a promising approach to enhance both performance and interpretability. The code is available at: https://github.com/chili-lab/InfoSteer.
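
To make the key-value reading of the FFN concrete, here is a minimal sketch in plain Python. It is not taken from the InfoSteer repository. The function names (`ffn_as_memory`, `key_entropy`, `regularized_loss`), the ReLU activation, the normalization of coefficients into a distribution, and the weight `lam` are all assumptions made for illustration. The sketch shows an FFN output computed as a coefficient-weighted sum of memory vectors, and one plausible way an entropy term over the key coefficients could reward engaging more of them.

```python
import math

def ffn_as_memory(x, keys, values):
    """Compute sum_i act(x . k_i) * v_i, reading the FFN as a key-value memory.

    keys[i] is the i-th row of the first projection, values[i] the matching
    row of the second projection. ReLU is an assumed activation.
    """
    # One key coefficient per memory slot: how strongly x matches each key.
    coeffs = [max(0.0, sum(xj * kj for xj, kj in zip(x, k))) for k in keys]
    d_out = len(values[0])
    # Output is the coefficient-weighted sum of the memory (value) vectors.
    out = [sum(c * v[j] for c, v in zip(coeffs, values)) for j in range(d_out)]
    return out, coeffs

def key_entropy(coeffs, eps=1e-12):
    """Shannon entropy of the normalized coefficient magnitudes.

    Higher entropy means the coefficient mass is spread over more memory slots.
    """
    mags = [abs(c) for c in coeffs]
    total = sum(mags) + eps
    probs = [m / total for m in mags]
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_loss(task_loss, coeffs, lam=0.1):
    """Hypothetical objective: subtract an entropy bonus from the task loss."""
    return task_loss - lam * key_entropy(coeffs)

# Toy example: a 2-d input, four memory slots.
x = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
out, coeffs = ffn_as_memory(x, keys, values)
print(out, coeffs, key_entropy(coeffs))
```

In this toy case only two of the four slots are active, so the entropy is well below the maximum reached when all four coefficients are equal. An entropy bonus of this kind would therefore lower the loss for inputs that draw on a broader set of memory vectors, which is the behavior the abstract describes in words.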

Paper Structure

This paper contains 58 sections, 12 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Overview of our proposed InfoSteer framework. The interpretation of Transformer FFNs as key-value memory was introduced by Geva et al. (2021); further details are provided in the preliminary section.
  • Figure 2: Key Design of InfoSteer. (a) An illustration of viewing the Transformer FFN as a key-value memory. The key acts as a control mechanism that determines the extent to which each memory vector is engaged. (b) and (c) Two general methods used to modulate the distribution of key coefficients, thereby encouraging the engagement of memory vectors during post-training.
  • Figure 3: OOD model performance comparison across different mathematical reasoning datasets. Results are reported as the average score over three separate runs.
  • Figure 4: Key coefficient distribution shift from the base model. The gray line serves as the baseline for the number of keys in the corresponding region.