The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models

Jonathan Pan

TL;DR

The paper tackles the challenge of disentangling polysemantic signals in LLMs to improve mechanistic interpretability. It introduces the Quantum Sieve Tracer, a hybrid Locate-then-Analyze pipeline that couples classical causal localization with quantum kernel analysis to map activation geometry in a high-dimensional Hilbert space. Key findings reveal architecture-specific recall mechanisms—Qwen exhibits Early Retrieval at Layer 7, while Llama shows Layer 9 Interference Suppression—demonstrated by head-level fidelity and ablation results, with classical and quantum traces showing near-orthogonality ($\rho \approx -0.04$). This approach provides a high-resolution, architecture-aware tool for probing the fine-grained topology of attention and paves the way for quantum-assisted interventions and more nuanced model design.

Abstract

Mechanistic interpretability aims to reverse-engineer the internal computations of Large Language Models (LLMs), yet separating sparse semantic signals from high-dimensional polysemantic noise remains a significant challenge. This paper introduces the Quantum Sieve Tracer, a hybrid quantum-classical framework designed to characterize factual recall circuits. We implement a modular pipeline that first localizes critical layers using classical causal tracing, then maps specific attention head activations into an exponentially large quantum Hilbert space. Using open-weight models (Meta Llama-3.2-1B and Alibaba Qwen2.5-1.5B-Instruct), we perform a two-stage analysis that reveals a fundamental architectural divergence. While Qwen's layer 7 circuit functions as a classic Recall Hub, we discover that Llama's layer 9 acts as an Interference Suppression circuit, where ablating the identified heads paradoxically improves factual recall. Our results demonstrate that quantum kernels can distinguish between these constructive (recall) and reductive (suppression) mechanisms, offering a high-resolution tool for analyzing the fine-grained topology of attention.
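The localization stage mentioned in the abstract can be illustrated with a minimal sketch. It assumes a normalized recovery score of the kind commonly used in causal tracing: the fraction of the clean-versus-corrupted probability gap that is regained when one layer's clean hidden state is restored. The exact score used in the paper is not given here, so the function name `recovery_scores`, the normalization, and all numbers below are hypothetical and purely illustrative.

```python
import numpy as np

def recovery_scores(p_clean, p_corrupted, p_restored):
    """Normalized recovery per layer (assumed definition, not the paper's).

    p_clean:     probability of the correct answer on the clean prompt
    p_corrupted: probability after the subject tokens are corrupted
    p_restored:  per-layer probability when that layer's clean hidden
                 state is patched back into the corrupted run
    """
    p_restored = np.asarray(p_restored, dtype=float)
    gap = p_clean - p_corrupted
    if gap <= 0:
        raise ValueError("corruption must lower the clean probability")
    return (p_restored - p_corrupted) / gap

# Made-up probabilities for a five-layer toy model (not results from the paper).
p_restored = [0.06, 0.08, 0.15, 0.40, 0.22]
scores = recovery_scores(p_clean=0.90, p_corrupted=0.05, p_restored=p_restored)
peak_layer = int(np.argmax(scores))
```

Taking the `argmax` selects the layer with the highest score; a steepest-rise criterion could instead use `np.argmax(np.diff(scores))`.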

Paper Structure

This paper contains 19 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Causal Trace for Llama-3.2-1B. The Recovery Score (y-axis) peaks sharply at Layer 9, indicating this layer is the primary mediator for integrating the factual subject.
  • Figure 2: Causal Trace for Qwen2.5-1.5B-Instruct. The steepest rise in causal influence occurs earlier at Layer 7.
  • Figure 3: Head-by-Head Interaction Matrix ($K$) generated by the Quantum Sieve at Layer 9 for Llama-3.2-1B. The heatmap visualizes the quantum fidelity between different attention heads.
  • Figure 4: Head-by-Head Interaction Matrix ($K$) for Qwen2.5-1.5B-Instruct at Layer 7. The distinct patterns indicate the varying degrees of orthogonality between attention heads at this critical depth.
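A head-by-head fidelity matrix like the $K$ shown in Figures 3 and 4 can be sketched with a classically simulated quantum kernel. The paper's actual feature map is not described here, so this example assumes a simple angle embedding: each head's activation is first reduced to a few features in $[0, \pi]$ (one per qubit), and feature $x_i$ is encoded as $R_Y(x_i)|0\rangle$. The function names and the random input data are hypothetical.

```python
import numpy as np

def angle_embed(x):
    """Statevector of the product state with RY(x_i)|0> on qubit i."""
    state = np.array([1.0])
    for angle in x:
        qubit = np.array([np.cos(angle / 2.0), np.sin(angle / 2.0)])
        state = np.kron(state, qubit)
    return state

def fidelity_kernel(features):
    """K[a, b] = |<psi(x_a)|psi(x_b)>|^2 for every pair of rows."""
    states = np.stack([angle_embed(row) for row in features])
    overlaps = states @ states.T  # amplitudes are real, so no conjugation is needed
    return overlaps ** 2

# Placeholder input: 8 heads, each reduced to 4 features (4 qubits).
rng = np.random.default_rng(0)
head_features = rng.uniform(0.0, np.pi, size=(8, 4))
K = fidelity_kernel(head_features)
```

For this particular embedding the fidelity has the closed form $K_{ab} = \prod_i \cos^2\big((x_{a,i} - x_{b,i})/2\big)$, so entries near 1 mark heads with nearly identical encoded features and entries near 0 mark heads that are close to orthogonal in the embedded space.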