The Anxiety of Influence: Bloom Filters in Transformer Attention Heads

Peter Balogh

TL;DR

Three genuine membership-testing heads form a multi-resolution system concentrated in early layers (0-1), taxonomically distinct from induction and previous-token heads, with false positive rates that decay monotonically with embedding distance -- consistent with distance-sensitive Bloom filters.

Abstract

Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spectrum of membership-testing strategies. Two heads (L0H1 and L0H5 in GPT-2 small) function as high-precision membership filters with false positive rates of 0-4\% even at 180 unique context tokens -- well above the $d_\text{head} = 64$ bit capacity of a classical Bloom filter. A third head (L1H11) shows the classic Bloom filter capacity curve: its false positive rate follows the theoretical formula $p \approx (1 - e^{-kn/m})^k$ with $R^2 = 1.0$ and fitted capacity $m \approx 5$ bits, saturating by $n \approx 20$ unique tokens. A fourth head initially identified as a Bloom filter (L3H0) was reclassified as a general prefix-attention head after confound controls revealed its apparent capacity curve was a sequence-length artifact. Together, the three genuine membership-testing heads form a multi-resolution system concentrated in early layers (0-1), taxonomically distinct from induction and previous-token heads, with false positive rates that decay monotonically with embedding distance -- consistent with distance-sensitive Bloom filters. These heads generalize broadly: they respond to any repeated token type, not just repeated names, with 43\% higher generalization than duplicate-token-only heads. Ablation reveals these heads contribute to both repeated and novel token processing, indicating that membership testing coexists with broader computational roles. The reclassification of L3H0 through confound controls strengthens rather than weakens the case: the surviving heads withstand the scrutiny that eliminated a false positive in our own analysis.
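
The capacity formula quoted in the abstract can be evaluated directly to see why a filter with $m \approx 5$ bits saturates by roughly 20 unique tokens. The following is a minimal sketch, not the paper's fitting code: the function name `bloom_fp_rate` is hypothetical, and the number of hash functions `k = 1` is an assumption made only for illustration, since the abstract does not report a fitted $k$.

```python
import math


def bloom_fp_rate(n: float, m: float, k: float = 1.0) -> float:
    """Approximate Bloom filter false positive rate, (1 - exp(-k*n/m))**k.

    n: number of unique items inserted (unique context tokens)
    m: filter capacity in bits
    k: number of hash functions (k = 1 is an illustrative assumption)
    """
    if m <= 0 or k <= 0 or n < 0:
        raise ValueError("require m > 0, k > 0, n >= 0")
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    # m = 5 is the fitted capacity reported for L1H11; k = 1 is assumed.
    for n in (1, 5, 10, 20, 50):
        print(f"n={n:3d}  m=5   p={bloom_fp_rate(n, m=5):.3f}")

    # For comparison: a classical filter with m = 64 bits (d_head) at
    # n = 180 unique tokens, again under the assumed k = 1.
    print(f"n=180  m=64  p={bloom_fp_rate(180, m=64):.3f}")
```

Under these assumptions the predicted rate exceeds 0.98 at $n = 20$ for $m = 5$, which matches the saturation described for L1H11. The same formula with $m = 64$ and $n = 180$ predicts a rate above 0.9, which is why false positive rates of 0-4\% for L0H1 and L0H5 at that load are described as well above classical capacity.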

Paper Structure (46 sections, 2 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Bloom filter selectivity (hit attention / baseline attention) across all 144 heads in GPT-2 small. Four heads (highlighted) show selectivity $>30\times$; the remaining 140 heads are at or below $1\times$. Note that the classification threshold is $>3\times$ (Section \ref{sec:methods}); the gap between the top four (51$\times$--146$\times$) and the fifth-highest head (2.7$\times$) renders the threshold choice immaterial.
  • Figure 2: Hit, baseline, and synonym attention for Bloom filter heads vs. control heads. Bloom heads show extreme separation between hit and baseline conditions.
  • Figure 3: Three non-overlapping functional categories of attention heads in GPT-2 small. Bloom filter heads (red) concentrate in layers 0--3, previous-token heads (blue) in layers 2--6, and induction heads (green) in layers 5--11.
  • Figure 4: Capacity analysis under controlled conditions (fixed sequence length = 200). Left: L1H11 shows the classic Bloom filter saturation curve ($R^2 = 1.0$, $m \approx 5$ bits), with theoretical Bloom filter overlay. L0H1 and L0H5 maintain near-zero FP rates across all loads. Right: L3H0$^\dagger$ (reclassified as prefix-attention head) shows FP = 100% at all load levels, indistinguishable from control head L5H5.
  • Figure 5: (a) Pairwise phi coefficients between Bloom heads; low values indicate largely independent FP decisions. (b) Distribution of how many Bloom heads fire false positives per probe token. The 0-head and 4-head categories are exact counts; the 1--3 head breakdown is estimated from the aggregate "mixed" category (see text).
  • ...and 2 more figures
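
The Figure 5 caption refers to pairwise phi coefficients between heads without defining them. As a reminder of the standard definition, the sketch below computes the phi coefficient for two binary vectors, where each entry records whether a head fired a false positive on a given probe token. The function name `phi_coefficient` and the example vectors are hypothetical and made up for illustration; they are not data from the paper, and the paper's own computation may differ in detail.

```python
import math
from typing import Sequence


def phi_coefficient(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Phi coefficient between two equal-length binary vectors.

    phi = (n11*n00 - n10*n01) / sqrt(n1. * n0. * n.1 * n.0),
    where nXY counts positions with a == X and b == Y.
    Returns 0.0 when a marginal is zero (phi is undefined there;
    returning 0.0 is a convention chosen for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    # Made-up false positive indicators for two heads over four probes.
    head_a = [True, True, False, False]
    head_b = [True, False, True, False]
    print(phi_coefficient(head_a, head_b))
```

A value near 0 indicates that the two heads make false positive decisions largely independently, which is the reading the caption gives to low values; values near 1 or -1 would indicate that the heads err on the same probes or on complementary probes.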