A Diagrammatic Approach to Improve Computational Efficiency in Group Equivariant Neural Networks
Edward Pearce-Crump, William J. Knottenbelt
TL;DR
The paper tackles the high computational cost of applying equivariant weight matrices between tensor-power layers in group-equivariant neural networks. It develops a diagrammatic, category-theoretic framework that realises equivariant maps as images of monoidal functors from partition/Brauer categories to group-representation categories, enabling a fast forward pass. By introducing algorithmically planar set partition diagrams and a three-stage multiplication procedure (Factor, Permute, PlanarMult), the authors obtain substantial Big-$O$ improvements for $S_n$, $O(n)$, $Sp(n)$, and $SO(n)$, with group-specific implementations and complexities. Because the weight matrices are represented by diagrams, exact equivariance is preserved, and the speed-up could make high-order tensor-power equivariant networks more practical in domains with known symmetries.
Abstract
Group equivariant neural networks are growing in importance owing to their ability to generalise well in applications where the data has known underlying symmetries. Recent characterisations of a class of these networks that use high-order tensor power spaces as their layers suggest that they have significant potential; however, their implementation remains challenging because the computations involved are prohibitively expensive. In this work, we present a fast matrix multiplication algorithm for any equivariant weight matrix that maps between tensor power layer spaces in these networks for four groups: the symmetric, orthogonal, special orthogonal, and symplectic groups. We obtain this algorithm by developing a diagrammatic framework based on category theory that not only lets us express each weight matrix as a linear combination of diagrams but also lets us use these diagrams to factor the original computation into a series of steps that are optimal. We show that this algorithm improves the Big-$O$ time complexity exponentially in comparison to a naïve matrix multiplication.
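To make the idea of replacing a dense matrix product with a diagram-guided computation concrete, here is a minimal toy sketch for $S_n$. It is not the paper's Factor, Permute, PlanarMult procedure. It assumes a single set partition diagram with blocks {output 1, output 2} and {input 1, input 2}, whose matrix has entries $\delta_{i_1 i_2}\delta_{j_1 j_2}$ and maps $(\mathbb{R}^n)^{\otimes 2}$ to itself. The function names `naive_apply` and `diagram_apply` are hypothetical and chosen only for illustration.

```python
def naive_apply(n, v):
    """Apply the n^2 x n^2 matrix entry by entry: O(n^4) operations.

    v is an n x n nested list holding the coefficients of a vector
    in (R^n) tensor (R^n). The matrix entry for output index (i1, i2)
    and input index (j1, j2) is delta(i1, i2) * delta(j1, j2).
    """
    out = [[0] * n for _ in range(n)]
    for i1 in range(n):
        for i2 in range(n):
            for j1 in range(n):
                for j2 in range(n):
                    w = 1 if (i1 == i2 and j1 == j2) else 0
                    out[i1][i2] += w * v[j1][j2]
    return out


def diagram_apply(n, v):
    """Read the same map off the diagram instead of building the matrix.

    The input block {j1, j2} says: sum the diagonal entries of v.
    The output block {i1, i2} says: copy that sum onto the diagonal.
    This costs O(n) for the sum plus O(n^2) to write the output.
    """
    trace = sum(v[j][j] for j in range(n))
    return [[trace if i1 == i2 else 0 for i2 in range(n)] for i1 in range(n)]


v = [[1, 2], [3, 4]]
print(naive_apply(2, v))    # [[5, 0], [0, 5]]
print(diagram_apply(2, v))  # [[5, 0], [0, 5]]
```

Both functions compute the same output, but the second never materialises the matrix. For a map from the $k$-th to the $l$-th tensor power, a dense matrix-vector product costs on the order of $n^{l+k}$ operations, which is why exploiting diagram structure pays off as the tensor order grows.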
