An introduction to graphical tensor notation for mechanistic interpretability
Jordan K. Taylor
TL;DR
The paper argues that graphical tensor notation provides a compact, intuition-friendly language for representing linear tensor operations, with the goal of aiding mechanistic interpretability of neural networks, including transformers. It develops the notation from basic tensors to contractions, delta and isometric tensors, and common decompositions such as the singular value decomposition $M = U D V^\dagger$, an expansion of $M$ in terms of $\lambda_i$, and the higher-order CP and Tucker generalizations, before applying it to tensor networks and neural models. The second half adapts the framework to interpretability work on language models, loosely following transformer-circuit ideas and culminating in a handcrafted toy induction-head circuit that illustrates how composition and fixed attention patterns can encode predictive structure. The approach highlights dualities and path-wise interpretations that can simplify the analysis, compression, and communication of mechanistic insights, potentially aiding debugging and transparency in large models. These techniques provide a structured lens for tracing information flow and identifying dominant terms in multi-layer attention networks, which may help explain how learned behavior emerges in transformers and other models.
Abstract
Graphical tensor notation is a simple way of denoting linear operations on tensors, originating from physics. Modern deep learning consists almost entirely of operations on or between tensors, so being able to understand tensor operations easily is important for understanding these systems. This is especially true when attempting to reverse-engineer the algorithms learned by a neural network in order to understand its behavior, a field known as mechanistic interpretability. It's easy to get confused about which operations are happening between tensors and to lose sight of the overall structure, but graphical tensor notation makes it easier to parse things at a glance and to see interesting equivalences. The first half of this document introduces the notation and applies it to some decompositions (SVD, CP, Tucker, and tensor network decompositions), while the second half applies it to some existing foundational approaches for mechanistically understanding language models, loosely following "A Mathematical Framework for Transformer Circuits", and then constructs an example "induction head" circuit in graphical tensor notation.
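As a rough illustration (not taken from the document itself), the NumPy sketch below mirrors two of the operations mentioned above: a contraction between two tensors, and the SVD $M = U D V^\dagger$ drawn as a chain of three tensors. It assumes the reading that each index in an `np.einsum` subscript string corresponds to a leg, and that an index repeated across operands corresponds to a connected leg that is summed over. The shapes and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Assumed correspondence: each einsum index is a "leg"; an index shared
# between two operands is a connected leg, i.e. it is summed over.
rng = np.random.default_rng(0)

M = rng.standard_normal((4, 3))  # node with two legs: i (dim 4), j (dim 3)
v = rng.standard_normal(3)       # node with one leg: j (dim 3)

# Joining M's j-leg to v's j-leg leaves a node with one open leg, i.
Mv = np.einsum("ij,j->i", M, v)

# SVD: M = U D V^dagger. NumPy returns Vh, which is already V^dagger.
U, s, Vh = np.linalg.svd(M, full_matrices=False)
D = np.diag(s)

# Three nodes chained along the shared legs k and l reproduce M.
M_rebuilt = np.einsum("ik,kl,lj->ij", U, D, Vh)

# U is an isometry: contracting U^dagger with U along the i-leg
# gives the identity (a delta tensor) on the remaining legs.
UdagU = np.einsum("ik,il->kl", U.conj(), U)
```

In this framing, the matrix-vector product and the SVD reconstruction are both just patterns of connected legs, which is the kind of structure the graphical notation is meant to make visible at a glance.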
