LLMs Process Lists With General Filter Heads

Arnab Sen Sharma, Giordano Rogers, Natalie Shapira, David Bau

TL;DR

The paper reveals that transformer LLMs implement list-filtering as a modular, transferable computation via specialized filter heads that encode predicates in their query states, enabling a lazy, portable filtering primitive. It shows a parallel, coexisting eager strategy where is_match flags can be stored in item latents, illustrating dual pathways akin to lazy vs. eager evaluation. Through causal mediation analysis and activation patching across six filter-reduce tasks, the authors demonstrate generalization of predicate representations across formats, languages, and tasks, identify the essential role of filter heads, and provide a lightweight, training-free probe mechanism for concept detection. These findings illuminate how neural transformers can internalize and generalize symbolic-like operations, offering insight into the emergence of reusable computational primitives in AI systems and guiding future interpretability and refinement efforts.

Abstract

We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find that LLMs have learned to encode a compact, causal representation of a general filtering operation that mirrors the generic "filter" function of functional programming. Using causal mediation analysis on a diverse set of list-processing tasks, we find that a small number of attention heads, which we dub filter heads, encode a compact representation of the filtering predicate in their query states at certain tokens. We demonstrate that this predicate representation is general and portable: it can be extracted and reapplied to execute the same filtering operation on different collections, presented in different formats, in different languages, or even in different tasks. However, we also identify situations where transformer LMs can exploit a different strategy for filtering: eagerly evaluating whether an item satisfies the predicate and storing this intermediate result as a flag directly in the item representations. Our results reveal that transformer LMs can develop human-interpretable implementations of abstract computational operations that generalize in ways that are surprisingly similar to strategies used in traditional functional programming patterns.
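
The functional-programming analogy in the abstract can be made concrete with a small sketch. The code below is only an illustration of the two strategies as ordinary Python, not the model's mechanism: the item schema, the list contents, and the names `make_predicate`, `lazy_filter`, `eager_annotate`, and `select_flagged` are all hypothetical. The lazy path applies a reusable predicate at answer time; the eager path stores an `is_match` flag with each item as it is read.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (name, category); a hypothetical toy schema


def make_predicate(category: str) -> Callable[[Item], bool]:
    """Build a predicate once; it can be reused on any collection."""
    return lambda item: item[1] == category


def lazy_filter(predicate: Callable[[Item], bool], items: List[Item]) -> List[str]:
    """Lazy strategy: evaluate the predicate only when the answer is requested."""
    return [name for name, category in items if predicate((name, category))]


def eager_annotate(
    predicate: Callable[[Item], bool], items: List[Item]
) -> List[Tuple[str, str, bool]]:
    """Eager strategy: store an is_match flag next to each item as it is read."""
    return [(name, category, predicate((name, category))) for name, category in items]


def select_flagged(annotated: List[Tuple[str, str, bool]]) -> List[str]:
    """Later steps read the stored flags instead of re-evaluating the predicate."""
    return [name for name, _, is_match in annotated if is_match]


list_a = [("apple", "fruit"), ("bus", "vehicle"), ("chair", "furniture")]
list_b = [("train", "vehicle"), ("pear", "fruit"), ("truck", "vehicle")]

is_vehicle = make_predicate("vehicle")

print(lazy_filter(is_vehicle, list_a))                     # ['bus']
print(lazy_filter(is_vehicle, list_b))                     # ['train', 'truck']
print(select_flagged(eager_annotate(is_vehicle, list_b)))  # ['train', 'truck']
print(len(lazy_filter(is_vehicle, list_b)))                # 2 (a "count" reduce step)
```

The same `is_vehicle` predicate is applied unchanged to both lists, which is the sense of "portable" used above; the final line shows a reduce step (counting) layered on the filtered result.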

Paper Structure

This paper contains 72 sections, 10 equations, 25 figures, 11 tables.

Figures (25)

  • Figure 1: A filter head $[35, 19]$ in Llama-70B encodes a compact representation of the predicate "is this fruit?". (a) Within a prompt $p_{\mathrm{src}}$ to find a fruit in a list, we examine the attention head's behavior at the last token ":". (b) The head focuses its attention on the one fruit in the list. (c) We examine the same attention head's behavior in a second prompt $p_{\mathrm{dest}}$ that searches a different list for a vehicle, (d) and we also examine the behavior of the head when its query state is patched to use the $q_{\mathrm{src}}$ vector from the source context. (e) The head attends to the vehicle, but (f) redirects its attention to the fruit in the new list after the query vector is patched. (g) A sparse set of attention heads work together to conduct filtering over a wide range of predicates. These filter heads are concentrated in the middle layers (out of 80 layers in Llama-70B).
  • Figure 2: Filter heads retain a causality close to 0.8 even with 7 distractors in the collections of the destination prompt.
  • Figure 2: Portability of predicate representations across linguistic variations. The predicate vector $q_\mathrm{src}$ is extracted from a source prompt and patched to destination prompts in (a) different languages, (b) different presentation formats for the items, and (c) placing the question before or after presenting the collection.
  • Figure 3: Generalization across different tasks. (a) shows whether the heads identified with one task (rows) maintain causal influence in another task (columns). (b) shows how portable the predicate representation is across tasks. The predicate representation $q_{\text{src}}$ is cached from one source task example (e.g., find the fruit in the SelectOne task) and patched into an example from another destination task (e.g., count the vehicles in the Counting task). The heatmap shows causality scores, i.e., whether the LM correctly performs the destination task with the transferred predicate (e.g., count the fruits). For both (a) and (b), the values on the diagonal show within-task scores.
  • Figure 3: LM performance on filtering tasks drops significantly when filter heads are ablated. These heads constitute $<2\%$ of the heads in the LM. Evaluated on 512 samples that the LM predicts correctly without any ablation (baseline $100\%$).
  • ...and 20 more figures
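
The query-patching experiment described in the Figure 1 caption can be sketched with a toy attention head. This is a minimal illustration under strong assumptions, not the paper's code or a real model: the 2-D key space (axis 0 standing for "fruit-ness", axis 1 for "vehicle-ness"), the item list, and the vectors `Q_SRC` and `Q_DEST` are all invented for the example. It shows only the core idea that swapping the query vector, while leaving the keys of the destination list untouched, redirects the head's attention.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], keys: List[List[float]]) -> List[float]:
    """Attention weights of one head: softmax of query-key dot products."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Hypothetical destination list and toy keys (axis 0: fruit, axis 1: vehicle).
DEST_ITEMS = ["hammer", "truck", "peach", "lamp"]
DEST_KEYS = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

Q_SRC = [8.0, 0.0]   # assumed query cached from the source prompt ("find the fruit")
Q_DEST = [0.0, 8.0]  # assumed native query of the destination prompt ("find the vehicle")


def top_item(query: List[float]) -> str:
    """Return the destination item that receives the most attention."""
    weights = attend(query, DEST_KEYS)
    return DEST_ITEMS[weights.index(max(weights))]


print(top_item(Q_DEST))  # truck  (unpatched head attends to the vehicle)
print(top_item(Q_SRC))   # peach  (patched query redirects attention to the fruit)
```

In a real model the patch would replace a specific head's query state at a specific token during the forward pass; here the swap is reduced to passing a different vector to `attend`.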