
Discovering Abstract Symbolic Relations by Learning Unitary Group Representations

Dongsung Huh

TL;DR

This work reframes symbolic operation completion (SOC) as a minimal yet informative testbed for symbolic reasoning and introduces HyperCube, a bilinear model parameterized by three order-3 tensor factors that encode symbols as operators via matrix embeddings. A novel regularizer promotes unitary, group-like representations, yielding exact recovery of group operation tables in many cases, faster learning, and interpretable internal structure that aligns with the regular representation and its irreducible components. The approach demonstrates strong generalization across diverse SOC datasets, including non-group operations, and suggests a universal inductive bias toward discovering underlying group structures, with implications for automatic symmetry discovery and the construction of symmetry-aware architectures. While offering practical speedups and interpretability, the method faces memory challenges due to tensor factors and raises open questions about generalization guarantees and scaling to more complex symbolic domains.

Abstract

We investigate a principled approach for symbolic operation completion (SOC), a minimal task for studying symbolic reasoning. While conceptually similar to matrix completion, SOC poses a unique challenge in modeling abstract relationships between discrete symbols. We demonstrate that SOC can be efficiently solved by a minimal model, a bilinear map, with a novel factorized architecture. Inspired by group representation theory, this architecture leverages matrix embeddings of symbols, modeling each symbol as an operator that dynamically influences others. Our model achieves perfect test accuracy on SOC with comparable or superior sample efficiency to Transformer baselines across most datasets, while learning significantly faster (100-1000$\times$). Crucially, the model exhibits an implicit bias towards learning general group structures, precisely discovering the unitary representations of underlying groups. This remarkable property not only confers interpretability but also has significant implications for automatic symmetry discovery in geometric deep learning. Overall, our work establishes group theory as a powerful guiding principle for discovering abstract algebraic structures in deep learning, and showcases matrix representations as a compelling alternative to traditional vector embeddings for modeling symbolic relationships.
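To make the idea of matrix embeddings concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the model output is the normalized trace of a product of three matrix embeddings, $T_{abc} = \mathrm{Tr}(A_a B_b C_c)/n$, which is one reading of "three order-3 tensor factors"; the normalization and index order are assumptions. The embeddings are hand-built from cyclic shift matrices (the regular representation of the cyclic group) rather than learned, and they reproduce the modular-addition table exactly.

```python
import numpy as np

n = 5  # number of symbols; the operation is a + b = c (mod n)

# Ground-truth operation table as a one-hot order-3 tensor D[a, b, c].
D = np.zeros((n, n, n))
for a in range(n):
    for b in range(n):
        D[a, b, (a + b) % n] = 1.0

# Regular representation of the cyclic group: symbol k -> cyclic shift P^k.
P = np.roll(np.eye(n), 1, axis=0)
shift = [np.linalg.matrix_power(P, k) for k in range(n)]

# Hand-built (not learned) factors, each of shape (n, n, n):
A = np.stack(shift)                                 # A[a] = P^a
B = np.stack(shift)                                 # B[b] = P^b
C = np.stack([shift[(-c) % n] for c in range(n)])   # C[c] = P^(-c)

# Assumed model form: T[a, b, c] = Tr(A[a] @ B[b] @ C[c]) / n.
T = np.einsum("aki,bij,cjk->abc", A, B, C) / n

# Tr(P^(a+b-c)) is n when a + b = c (mod n) and 0 otherwise,
# so T coincides with the one-hot table D.
table_recovered = np.allclose(T, D)
```

In the SOC setting only a fraction of the entries of `D` would be observed during training, and the factors would be fit to those entries; the sketch only shows that a unitary group representation is an exact solution of this form.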

Paper Structure (51 sections, 5 theorems, 30 equations, 14 figures)

This paper contains 51 sections, 5 theorems, 30 equations, 14 figures.

Key Result

Proposition 4.1

If $A, B, C$ form the optimal solution of the regularized loss (Eq. \eqref{eq:regularized_loss}), then any unitary change of basis leaves the solution optimal, whereas a non-unitary change of basis generally increases the loss.
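The intuition behind Proposition 4.1 can be checked numerically with a small sketch. Two assumptions are made here and should not be read as the paper's definitions: the model output is taken to be $\mathrm{Tr}(A_a B_b C_c)/n$, and, since the paper's regularizer is not reproduced in this section, a plain sum of squared Frobenius norms stands in for it. For simplicity the same matrix `M` is applied to all three index spaces. Any invertible `M` leaves the model output unchanged, but only the unitary one preserves the stand-in penalty.

```python
import numpy as np

n = 4
P = np.roll(np.eye(n), 1, axis=0)  # cyclic shift matrix
A = np.stack([np.linalg.matrix_power(P, k) for k in range(n)])
B = A.copy()
C = np.stack([np.linalg.matrix_power(P, (-k) % n) for k in range(n)])

def model(A, B, C):
    # Assumed model form: T[a, b, c] = Tr(A[a] @ B[b] @ C[c]) / n.
    return np.einsum("aki,bij,cjk->abc", A, B, C) / A.shape[-1]

def penalty(A, B, C):
    # Stand-in regularizer (an assumption): sum of squared Frobenius norms.
    return float(sum(np.sum(np.abs(F) ** 2) for F in (A, B, C)))

def change_basis(F, M):
    # Apply X -> M^{-1} X M to every matrix slice of the factor F.
    M_inv = np.linalg.inv(M)
    return np.stack([M_inv @ X @ M for X in F])

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(n, n)))  # orthogonal, i.e. real unitary
S = np.diag([1.0, 2.0, 3.0, 4.0])             # invertible but not unitary

T0, pen0 = model(A, B, C), penalty(A, B, C)
T_u = model(*(change_basis(F, U) for F in (A, B, C)))
pen_u = penalty(*(change_basis(F, U) for F in (A, B, C)))
T_s = model(*(change_basis(F, S) for F in (A, B, C)))
pen_s = penalty(*(change_basis(F, S) for F in (A, B, C)))

print("output unchanged (unitary):    ", np.allclose(T_u, T0))
print("output unchanged (non-unitary):", np.allclose(T_s, T0))
print("penalty original / unitary / non-unitary:", pen0, pen_u, pen_s)
```

In this example the starting factors are themselves unitary, so the non-unitary change of basis can only raise the Frobenius penalty while the fit to the data stays the same.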

Figures (14)

  • Figure 1: Small symbolic operation tables (Cayley tables): Symmetric (permutation) group $S_3$, modular addition, subtraction, and squared addition. Elements of $S_3$ are illustrated in Figure \ref{fig:S3_elements_illustrated}.
  • Figure 2: Visual illustration of matrix and tensor products. Nodes are factors and edges are indices. (Left) Matrix product. (Middle) Matrix product with trace operation. (Right) HyperCube product.
  • Figure 3: Model slices $T_{\cdot\cdot c}$ after training on the $S_3$ dataset. Training data are marked by stars (1s) and circles (0s).
  • Figure 4: Optimization trajectories on the $S_3$ dataset with 60% training data fraction. (Top) Unregularized, (Middle) L2-regularized, and (Bottom) $\mathcal{H}$-regularized training. Column 3 shows the average imbalance $(\Vert \xi_I\Vert^2_F +\Vert\xi_J\Vert^2_F + \Vert\xi_K\Vert^2_F)^{1/2}$, and column 4 shows deviation from C-unitarity $\Vert \sum_a A_{a} A_{a}^\dagger/n - \alpha^2 I \Vert^2_F$ and S-unitarity $\Vert A_{a} A_{a}^\dagger - \alpha_{A_a}^2 I \Vert^2_F$, averaged over all factors and slices. Column 5 shows normalized singular values of unfolded factors $A,B,C$.
  • Figure 5: Generalization performance (test accuracy) shown as a function of training data fraction across a diverse set of symbolic operation tasks. Trial-to-trial variation due to randomized model initialization and data split is shown as dots for Transformer and as shaded area for HyperCubes.
  • ...and 9 more figures
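The two unitarity deviations quoted in the Figure 4 caption can be computed for a single factor as in the sketch below. The caption does not define $\alpha$ in this section, so the code assumes $\alpha^2$ is the mean diagonal entry of the matrix being compared to the identity; the function names are hypothetical. The caption's averaging over all three factors is left out.

```python
import numpy as np

def c_unitarity_deviation(A):
    """|| sum_a A_a A_a^dagger / n - alpha^2 I ||_F^2 for one factor A of shape (n, d, d)."""
    n, d = A.shape[0], A.shape[-1]
    G = np.einsum("aij,akj->ik", A, A.conj()) / n  # sum_a A_a A_a^dagger / n
    alpha2 = np.trace(G).real / d                  # assumed definition of alpha^2
    return float(np.linalg.norm(G - alpha2 * np.eye(d)) ** 2)

def s_unitarity_deviation(A):
    """Mean over slices a of || A_a A_a^dagger - alpha_a^2 I ||_F^2."""
    d = A.shape[-1]
    devs = []
    for A_a in A:
        G = A_a @ A_a.conj().T
        alpha2 = np.trace(G).real / d              # assumed per-slice alpha^2
        devs.append(np.linalg.norm(G - alpha2 * np.eye(d)) ** 2)
    return float(np.mean(devs))

# A factor built from permutation matrices is unitary slice by slice.
n = 4
P = np.roll(np.eye(n), 1, axis=0)
A_unitary = np.stack([np.linalg.matrix_power(P, k) for k in range(n)])

# A random factor is generally not.
A_random = np.random.default_rng(0).normal(size=(n, n, n))

print(c_unitarity_deviation(A_unitary), s_unitarity_deviation(A_unitary))
print(c_unitarity_deviation(A_random), s_unitarity_deviation(A_random))
```

Both deviations vanish for the permutation-matrix factor and are strictly positive for the random one, which matches the role these quantities play as convergence diagnostics in the figure.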

Theorems & Definitions (11)

  • Proposition 4.1
  • Lemma 5.1: Balanced Condition
  • Definition 5.2: Contracted Unitarity
  • Proposition 5.3
  • Lemma 5.4
  • Definition 5.5: Slice Unitarity
  • Conjecture 6.1
  • proof
  • Lemma C.1
  • proof
  • ...and 1 more