Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

Mansi Sakarvadia; Aswathy Ajith; Arham Khan; Daniel Grzenda; Nathaniel Hudson; André Bauer; Kyle Chard; Ian Foster

TL;DR

This paper addresses unreliable multi-hop reasoning in transformer-based LLMs by identifying the attention-head mechanisms that retrieve memories during inference. It proposes a lightweight memory-injection method that adds prompt-relevant memories directly to the hidden activations at selected attention layers, so that the model can recall intermediate information and produce better multi-hop completions. In experiments on GPT-2 variants with programmatically generated and human-generated datasets, curated injections raise the probability of the correct next token for multi-hop prompts (by up to 424%, per the abstract), while random injections generally harm performance. The work contributes to interpretability and knowledge editing in LLMs and points toward scalable online memory augmentation, while noting limitations and ethical considerations around bias and potential misuse.

Abstract

Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.
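To make the injection step concrete, here is a minimal NumPy sketch of the idea described above: embed the memory tokens, scale the result by a magnitude $\tau$, and add it to the output of one attention layer. It is an illustration under stated assumptions, not the authors' implementation. The function name `inject_memory`, the toy dimensions, the random embedding matrix, and the choice to add the memory at every sequence position are all hypothetical.

```python
import numpy as np


def inject_memory(attn_output, embed_matrix, memory_token_ids, tau):
    """Add a tau-scaled memory vector to an attention layer's output.

    attn_output:      (seq_len, d_model) output of one attention layer
    embed_matrix:     (vocab_size, d_model) token embedding matrix
    memory_token_ids: token ids that spell out the memory
    tau:              injection magnitude

    Assumption: the memory is the sum of its token embeddings and is
    added at every position; the paper may differ on both points.
    """
    memory = embed_matrix[memory_token_ids].sum(axis=0)  # (d_model,)
    return attn_output + tau * memory  # broadcasts over positions


# Toy stand-ins for a real model's tensors (hypothetical sizes).
rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 50, 8, 5
W_E = rng.normal(size=(vocab_size, d_model))
attn_out = rng.normal(size=(seq_len, d_model))
memory_ids = [3, 17, 42]  # hypothetical ids for a memory phrase

injected = inject_memory(attn_out, W_E, memory_ids, tau=4.0)
unchanged = inject_memory(attn_out, W_E, memory_ids, tau=0.0)
```

In a real model, the same addition would be applied inside the forward pass at the chosen layer (for example through a forward hook), so that later layers read the modified residual stream. With `tau=0.0` the layer output is left untouched, which gives a convenient no-injection baseline.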

Paper Structure (27 sections, 8 equations, 9 figures, 5 tables)

Introduction
Background & Notation
Multi-hop vs. single-hop prompts
Transformer Architecture
Embedding Inputs
Residual Stream
Multi-Headed Self Attention (MHSA)
Multi-Layer Perceptron (MLP)
Unembedding Predictions into Logits
Experimental Overview
Dataset Descriptions
Programmatically Generated Dataset
Human-Generated Dataset
Part of Speech Dataset
Model Description
...and 12 more sections

Figures (9)

Figure 1: A multi-hop prompt vs. two analogous single-hop prompts. The outputs are from GPT2-Small.
Figure 2: Diagram of language model reasoning. Highest ranked attention outputs of GPT2-Small at layer $\ell=9$, head $h=8$ when projected into vocabulary space (via the GPT2-Small embedding matrix) for a single-hop prompt (green) and its multi-hop counterpart (red).
Figure 3: Memory injection. Injecting memory "The Great Barrier Reef" into GPT2-Small hidden activations at layer $\ell=9$, head $8$, $\tau=4$.
Figure 4: Curated memory injections. From left to right: GPT2-Small + Hand, GPT2-Large + Hand, GPT2-Small + 2WMH, GPT2-Large + 2WMH. Each cell in each heatmap is the average percent difference between the pre- and post-injection next-token predictions for multi-hop prompts. Green cells denote a positive percent difference (i.e., the correct prediction is more likely), while red cells denote a negative percent difference (i.e., the correct prediction is less likely). When computing the average for each ($\ell$, $\tau$) pair, we exclude outliers that fall outside $\pm2$ standard deviations of the mean.
Figure 5: Part of speech memory injections. This figure shows the average effect of memory injections from various parts of speech as a function of layer $\ell$ (top row) and magnitude $\tau$ (bottom row). The standard deviation scaled by 10% is pictured across magnitudes (top row) and layers (bottom row).
...and 4 more figures
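The heatmap cells in Figure 4 rest on two small computations: a percent difference between pre- and post-injection probabilities, and an average that drops values more than two standard deviations from the mean. The sketch below illustrates both. The function names are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the caption does not say which estimator is used.

```python
import numpy as np


def percent_difference(p_before, p_after):
    """Percent change in the correct token's probability after injection."""
    return 100.0 * (p_after - p_before) / p_before


def trimmed_mean(values, k=2.0):
    """Mean of the values within k standard deviations of the mean.

    Assumption: population standard deviation (ddof=0).
    """
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    keep = np.abs(values - mean) <= k * std
    return values[keep].mean()


# A probability rising from 0.01 to 0.0524 is a 424% increase.
gain = percent_difference(0.01, 0.0524)

# Nine typical cells and one extreme value; the extreme value is excluded.
cell_average = trimmed_mean([10.0] * 9 + [500.0])
```

Read this way, a 424% gain means the post-injection probability is about 5.24 times the original, which can still be a small absolute probability if the starting value was low.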
