Grokking Group Multiplication with Cosets
Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman
TL;DR
This work tackles mechanistic interpretability by reverse-engineering a one-hidden-layer network trained to multiply permutations in the symmetric groups $S_5$ and $S_6$. It uncovers coset-based circuits that decompose group arithmetic using conjugate subgroups and validates the mechanism through ablations and targeted causal interventions. The authors critically compare their coset-circuit explanation with the Group Composition via Representations (GCR) hypothesis, arguing that coset-based decoding more faithfully explains the observed behavior while GCR alone cannot account for all findings. The study highlights the challenges of drawing mechanistic conclusions in neural networks and advocates rigorous causal testing to establish robust, interpretable mechanisms for structured tasks.
Abstract
The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. Many techniques have been developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful testing ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden-layer networks that have ``grokked'' the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group's subgroups. We describe how we reverse engineered the model's mechanisms and confirmed that our theory is a faithful description of the circuit's functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. [4], which claims to find a different algorithm for this same problem.
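As a small illustration of why cosets can carry information about a product, the sketch below checks a standard group-theory fact in $S_5$: if $\sigma$ lies in the left coset $aH$ and $\tau$ lies in the right coset $Hb$, then $\sigma\tau$ lies in $aHb$, which is a right coset of the conjugate subgroup $aHa^{-1}$. This is not the trained network's circuit, only the algebraic identity such a circuit could exploit. The choice of $H$ (the stabilizer of one point, a copy of $S_4$), the elements `a` and `b`, and all function names are assumptions made for this example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p) on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

n = 5
S5 = list(permutations(range(n)))

# Assumed subgroup for illustration: permutations fixing the last point,
# which form a copy of S_4 inside S_5 (24 elements).
H = [p for p in S5 if p[n - 1] == n - 1]

def left_coset(a, subgroup):
    """The left coset a * subgroup."""
    return frozenset(compose(a, h) for h in subgroup)

def right_coset(subgroup, b):
    """The right coset subgroup * b."""
    return frozenset(compose(h, b) for h in subgroup)

# Arbitrary illustrative representatives (hypothetical choices).
a = (1, 2, 3, 4, 0)
b = (4, 0, 1, 2, 3)

aH = left_coset(a, H)
Hb = right_coset(H, b)

# Every product sigma * tau with sigma in aH and tau in Hb.
products = frozenset(compose(s, t) for s in aH for t in Hb)

# The set a * H * b, and the conjugate subgroup a * H * a^{-1}.
aHb = frozenset(compose(compose(a, h), b) for h in H)
conjugate = [compose(compose(a, h), inverse(a)) for h in H]

# Coset membership of the two inputs narrows the product from 120
# candidates down to |H| = 24, all inside one coset of a conjugate subgroup.
assert products == aHb
assert aHb == right_coset(conjugate, compose(a, b))

num_left_cosets = len({left_coset(g, H) for g in S5})
print(len(S5), len(H), num_left_cosets, len(products))
```

With this choice of $H$ there are $120 / 24 = 5$ left cosets, so a single coset membership test is a coarse signal; combining tests over several conjugate subgroups is what would be needed to narrow the product further.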
