Emergence and Localisation of Semantic Role Circuits in LLMs

Nura Aljaafari, Danilo S. Carvalho, André Freitas

TL;DR

This work investigates whether large language models ground abstract semantic relations by locating and tracking semantic-role circuits. Using COMPASS, a causal–temporal pipeline that combines role-cross minimal pairs with edge-attribution (EAP-IG) and temporal emergence analysis, the authors show that semantic-role processing concentrates in compact, causally necessary subgraphs whose structure stabilises before functional engagement. They observe gradual circuit refinement across training and model scales, with partial transfer of components across Pythia and LLaMA-1B families. The findings imply that LLMs develop partially modular semantic representations that can be edited or steered through targeted interventions, offering a principled view on how abstract semantics manifests in deep language models and aiding future interpretability and alignment efforts.

Abstract

Despite displaying semantic competence, large language models' internal mechanisms that ground abstract semantic structure remain insufficiently characterised. We propose a method integrating role-cross minimal pairs, temporal emergence analysis, and cross-model comparison to study how LLMs implement semantic roles. Our analysis uncovers: (i) highly concentrated circuits (89-94% attribution within 28 nodes); (ii) gradual structural refinement rather than phase transitions, with larger models sometimes bypassing localised circuits; and (iii) moderate cross-scale conservation (24-59% component overlap) alongside high spectral similarity. These findings suggest that LLMs form compact, causally isolated mechanisms for abstract semantic structure, and these mechanisms exhibit partial transfer across scales and architectures.
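To make the concentration figure in (i) concrete, here is a minimal sketch of one way a "share of attribution within k nodes" statistic could be computed. The function name `topk_attribution_share`, the use of absolute scores, and the toy values are assumptions for illustration; the abstract does not specify the exact normalisation the authors use.

```python
def topk_attribution_share(scores, k):
    """Fraction of total absolute attribution carried by the k largest nodes.

    `scores` is a hypothetical list of per-node attribution scores;
    taking absolute values is an assumption, not the paper's stated method.
    """
    magnitudes = sorted((abs(s) for s in scores), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # no attribution mass at all
    return sum(magnitudes[:k]) / total


# Toy scores (made up): three dominant nodes and a small tail.
toy_scores = [4.0, -3.0, 2.0, 0.5, 0.25, 0.25]
share = topk_attribution_share(toy_scores, k=3)
print(f"top-3 share: {share:.2f}")
```

Under this reading, a value of 0.89 to 0.94 at k = 28 would mean that 28 nodes carry roughly nine tenths of the total attribution mass.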

Paper Structure

This paper contains 108 sections, 24 equations, 12 figures, 14 tables, 1 algorithm.

Figures (12)

  • Figure 1: COMPASS methodology. It extracts and tracks the circuits that mediate semantic-role behaviour in LLMs, revealing where role-specific computation occurs and how it develops over training. (1) Role-cross minimal pairs isolate predicate–argument binding. (2) EAP-IG identifies edges whose interventions affect role predictions, producing sparse, causally functional subgraphs. (3) Temporal analysis follows these subgraphs across checkpoints to determine when their structure stabilises and when they become computationally indispensable.
  • Figure 2: Evolution of the Beneficiary circuit across training. The circuit undergoes complex reorganisation from early exploration (step 32) through intensive mid-training feature extraction (step 71000) to its final architecture (step 143000). Beneficiary's delayed structural consolidation ($t_{\mathrm{cons}}=2{,}000$) and persistent complexity reflect the computational demands of distinguishing benefactive from alternative role readings. Edge colour encodes operation type (blue: residual flow; green: value-composition; dashed: negative); edge width is proportional to attribution magnitude.
  • Figure 3: Structural and functional dynamics of role circuits across training. Faithfulness (left) shows pronounced role-dependent volatility, with early peaks and mid-training drops, indicating that circuit usefulness does not increase monotonically. In contrast, structural metrics evolve more smoothly: circuit size (middle) contracts gradually, and edge density (right) rises or stabilises over time. These trends show that circuit structure consolidates early and steadily, while functional engagement remains variable across roles.
  • Figure 4: Cross-role overlap of high-importance components over training (Pythia–1B). Each heatmap shows the Jaccard similarity between the node sets of different roles at one of three sampled training stages. Overlap remains consistently low, indicating that roles recruit largely distinct component sets. This specialisation strengthens over training, supporting the view that role circuits differentiate rather than collapse into a shared mechanism.
  • Figure 5: Cross-family correspondence for Location (top) and Instrument (bottom). Average node-level (left) and edge-level (right) overlaps between Pythia–1B and LLaMA–1B. Node sets align substantially more than edges, suggesting shared component selection but model-specific routing.
  • ...and 7 more figures
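
The Jaccard similarity behind the Figure 4 heatmaps can be illustrated with a small sketch. The role names come from the captions above, but the node identifiers (such as `a5.h2`), the helper names, and the toy sets are hypothetical; this is only the standard set-overlap definition, not the authors' code.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two node sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def pairwise_overlap(role_nodes):
    """Jaccard similarity for every unordered pair of roles."""
    return {
        (r1, r2): jaccard(role_nodes[r1], role_nodes[r2])
        for r1, r2 in combinations(sorted(role_nodes), 2)
    }


# Hypothetical high-importance node sets per role (made-up identifiers).
toy_role_nodes = {
    "Beneficiary": {"a5.h2", "m7", "a9.h1"},
    "Instrument": {"a5.h2", "m3", "a11.h4"},
    "Location": {"m12", "a2.h0"},
}

overlaps = pairwise_overlap(toy_role_nodes)
for pair, value in overlaps.items():
    print(pair, round(value, 2))
```

Low values across all pairs, as in this toy example, correspond to the caption's reading that roles recruit largely distinct component sets.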