Emergence and Localisation of Semantic Role Circuits in LLMs
Nura Aljaafari, Danilo S. Carvalho, André Freitas
TL;DR
This work investigates how large language models ground abstract semantic relations, locating semantic-role circuits and tracking how they emerge. Using COMPASS, a causal–temporal pipeline that combines role-cross minimal pairs with edge attribution patching with integrated gradients (EAP-IG) and temporal emergence analysis, the authors show that semantic-role processing concentrates in compact, causally necessary subgraphs whose structure stabilises before they become functionally engaged. They observe gradual circuit refinement across training and model scales, with partial transfer of components across Pythia models of different scales and LLaMA-1B. The findings imply that LLMs develop partially modular semantic representations that can be edited or steered through targeted interventions. This offers a principled view of how abstract semantics manifests in deep language models and supports future interpretability and alignment efforts.
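The attribution step can be illustrated with a minimal sketch of an EAP-IG-style edge score on a toy computation graph rather than a transformer. Everything in it is hypothetical: the node names `u1`, `u2`, `u3`, the edge weights, the `tanh` metric, and the 2-d input embeddings that stand in for the two sentences of a role-cross minimal pair. Each edge u->v is scored as the change in its contribution to v between the clean and corrupted inputs, multiplied by the gradient of the metric with respect to v's input, averaged over inputs interpolated from corrupted to clean.

```python
import math

# Hypothetical edge weights for the edges u -> v in a toy graph.
EDGE_WEIGHTS = {"u1": 1.0, "u2": 1.5, "u3": 0.1}


def upstream(x):
    """Activations of the upstream nodes for a 2-d input embedding x."""
    return {
        "u1": math.tanh(x[0] + x[1]),  # unchanged when the coordinates are swapped
        "u2": math.tanh(x[0] - x[1]),  # changes sign when the coordinates are swapped
        "u3": math.tanh(0.5 * x[1]),
    }


def v_input(x):
    """Input to node v: the sum of the contributions carried by each edge."""
    z = upstream(x)
    return sum(EDGE_WEIGHTS[u] * z[u] for u in EDGE_WEIGHTS)


def metric_grad(s):
    """Derivative of the toy metric tanh(s) with respect to v's input s."""
    return 1.0 - math.tanh(s) ** 2


def eap_ig_scores(x_clean, x_corrupt, steps=20):
    """EAP-IG-style score for every edge u -> v."""
    # Integrated-gradients part: average d(metric)/d(v input) over inputs
    # interpolated from the corrupted embedding to the clean one.
    total = 0.0
    for k in range(1, steps + 1):
        alpha = k / steps
        x_alpha = [c + alpha * (cl - c) for cl, c in zip(x_clean, x_corrupt)]
        total += metric_grad(v_input(x_alpha))
    avg_grad = total / steps

    # Edge part: how much each edge's contribution to v changes between runs.
    z_clean, z_corrupt = upstream(x_clean), upstream(x_corrupt)
    return {
        f"{u}->v": EDGE_WEIGHTS[u] * (z_clean[u] - z_corrupt[u]) * avg_grad
        for u in EDGE_WEIGHTS
    }


# Hypothetical embeddings for a role-cross pair: the corrupted input swaps
# the two coordinates, standing in for swapping which argument fills a role.
scores = eap_ig_scores([1.0, 0.5], [0.5, 1.0])
for edge, s in sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(edge, round(s, 4))
```

Edges are ranked by score magnitude, so the sign convention (clean minus corrupted here) does not affect the ranking. In this toy, `u1` depends only on the sum of the two coordinates, which the swap leaves unchanged, so its edge scores exactly zero. A real pipeline would obtain the gradients by automatic differentiation through the model instead of the closed-form derivative used here.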
Abstract
Although large language models display semantic competence, the internal mechanisms through which they ground abstract semantic structure remain insufficiently characterised. We propose a method that integrates role-cross minimal pairs, temporal emergence analysis, and cross-model comparison to study how LLMs implement semantic roles. Our analysis uncovers: (i) highly concentrated circuits (89–94% of attribution within 28 nodes); (ii) gradual structural refinement rather than phase transitions, with larger models sometimes bypassing localised circuits; and (iii) moderate cross-scale conservation (24–59% component overlap) alongside high spectral similarity. These findings suggest that LLMs form compact, causally isolated mechanisms for abstract semantic structure, and that these mechanisms partially transfer across scales and architectures.
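The concentration and overlap statistics in the abstract can be made concrete with a minimal sketch. It assumes that node attributions are already available as a dictionary, that concentration means the share of total absolute attribution captured by the top-k nodes, and that component overlap is the Jaccard index between circuits whose components have already been mapped to a shared naming scheme. The abstract does not specify these details, and all component names and scores below are hypothetical.

```python
def concentration(node_scores, k):
    """Share of total absolute attribution captured by the k top-scoring nodes."""
    mags = sorted((abs(s) for s in node_scores.values()), reverse=True)
    total = sum(mags)
    return sum(mags[:k]) / total if total else 0.0


def jaccard_overlap(circuit_a, circuit_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two circuits' component sets."""
    a, b = set(circuit_a), set(circuit_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical attribution scores (attention heads "L.H", MLP blocks "MLPn").
node_scores = {"L1.H2": 4.0, "L3.H5": -3.0, "MLP4": 2.0, "MLP7": 0.5, "L5.H0": 0.5}
print(concentration(node_scores, k=3))  # 0.9

# Hypothetical circuits from a smaller and a larger model, shared names assumed.
small = {"L1.H2", "L3.H5", "MLP4", "MLP7"}
large = {"L1.H2", "L3.H5", "MLP9", "L6.H1"}
print(round(jaccard_overlap(small, large), 3))  # 0.333
```

Taking absolute values lets nodes that push the metric in opposite directions both count toward concentration. Jaccard is only one possible overlap measure; an overlap coefficient normalised by the smaller circuit would give higher values when the two circuits differ in size.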
