Interpretable Machine Learning for Kronecker Coefficients

Giorgi Butbaia, Kyu-Hwan Lee, Fabian Ruehle

TL;DR

The paper studies the problem of predicting whether Kronecker coefficients $g_{\lambda,\mu}^\nu$ vanish, a decision problem known to be NP-hard. It develops and compares several interpretable ML approaches (gradient saliency, Kolmogorov-Arnold Networks, small neural networks, and symbolic regression) using both $3n$-dimensional partition triples and one-dimensional $b$-loadings as inputs. The interpretable models reach about 83% accuracy, and transformer architectures exceed 99%. A key finding is that simple, explicit decision functions can be written in terms of $b$-loadings, while transformers give the strongest predictive performance. The work shows how saliency analysis and interpretable models can yield quick insight into a hard combinatorial problem, and it points to future work on mechanistic interpretability and richer feature representations.

Abstract

We analyze the saliency of neural networks and employ interpretable machine learning models to predict whether the Kronecker coefficients of the symmetric group are zero or not. Our models use triples of partitions as input features, as well as b-loadings derived from the principal component of an embedding that captures the differences between partitions. Across all approaches, we achieve an accuracy of approximately 83% and derive explicit formulas for a decision function in terms of b-loadings. Additionally, we develop transformer-based models for prediction, achieving the highest reported accuracy of over 99%.
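
A decision function on a one-dimensional feature such as the $b$-loading can be as simple as a single boundary $b_\star$. The sketch below is only an illustration of that idea, not the authors' procedure: it sweeps candidate boundaries over a one-dimensional feature and keeps the most accurate one. The data are synthetic stand-ins (the actual definition of the $b$-loading is not reproduced here), the helper names `threshold_accuracy` and `best_threshold` are hypothetical, and which side of the boundary corresponds to non-vanishing coefficients is treated as unknown, so both orientations are tried.

```python
import random


def threshold_accuracy(values, labels, b_star, nonzero_below=True):
    """Accuracy of the rule 'predict nonzero iff value < b_star' (or its mirror)."""
    correct = 0
    for b, y in zip(values, labels):
        pred = (b < b_star) if nonzero_below else (b >= b_star)
        correct += int(pred == y)
    return correct / len(values)


def best_threshold(values, labels):
    """Try every observed value as a boundary, in both orientations."""
    candidates = sorted(set(values))
    # One extra boundary above the maximum, so "everything below" is covered.
    candidates.append(candidates[-1] + 1.0)
    best = (0.0, None, None)  # (accuracy, boundary, nonzero_below)
    for b_star in candidates:
        for below in (True, False):
            acc = threshold_accuracy(values, labels, b_star, below)
            if acc > best[0]:
                best = (acc, b_star, below)
    return best


# Synthetic stand-in data (an assumption, unrelated to the paper's dataset):
# label True plays the role of "coefficient is nonzero".
rng = random.Random(0)
labels = [rng.random() < 0.5 for _ in range(400)]
values = [rng.gauss(1.0 if y else 3.0, 1.0) for y in labels]

acc, b_star, below = best_threshold(values, labels)
print(f"best boundary {b_star:.3f}, nonzero_below={below}, accuracy {acc:.3f}")
```

With real data, `values` would hold the $b$-loadings of the triples and `labels` would record whether each coefficient is nonzero; the accuracy printed here says nothing about the figures reported in the paper.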

Paper Structure

This paper contains 20 sections, 1 theorem, 19 equations, 9 figures, 1 table.

Key Result

Lemma 2.1

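[FH] Let $\lambda, \mu, \nu \vdash n$. Then the Kronecker coefficient $g_{\lambda, \mu}^\nu$ is invariant under permutations of $\lambda, \mu, \nu$. That is, we have

$$g_{\lambda,\mu}^{\nu} = g_{\mu,\lambda}^{\nu} = g_{\lambda,\nu}^{\mu} = g_{\nu,\lambda}^{\mu} = g_{\mu,\nu}^{\lambda} = g_{\nu,\mu}^{\lambda}.$$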
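
The symmetry in Lemma 2.1 can be checked by hand for $n = 3$. The sketch below relies on the standard character-theoretic expression $g_{\lambda,\mu}^{\nu} = \frac{1}{n!} \sum_{\sigma \in S_n} \chi^\lambda(\sigma)\chi^\mu(\sigma)\chi^\nu(\sigma)$, which is not stated in this excerpt and is assumed here. The formula is symmetric in the three characters, which makes the invariance visible. The character table of $S_3$ is hard-coded, and the function name `kronecker` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import permutations, product

# Conjugacy classes of S_3 in the order: identity, transpositions, 3-cycles.
CLASS_SIZES = [1, 3, 2]

# Irreducible characters of S_3, indexed by partitions of 3.
CHARACTERS = {
    (3,): [1, 1, 1],        # trivial representation
    (2, 1): [2, 0, -1],     # standard representation
    (1, 1, 1): [1, -1, 1],  # sign representation
}


def kronecker(lam, mu, nu):
    """Kronecker coefficient for partitions of 3 via the character formula."""
    total = sum(
        size * a * b * c
        for size, a, b, c in zip(
            CLASS_SIZES, CHARACTERS[lam], CHARACTERS[mu], CHARACTERS[nu]
        )
    )
    g = Fraction(total, sum(CLASS_SIZES))  # divide by |S_3| = 6
    assert g.denominator == 1, "a Kronecker coefficient must be an integer"
    return int(g)


# Check invariance under all six permutations of every triple.
for triple in product(CHARACTERS, repeat=3):
    for perm in permutations(triple):
        assert kronecker(*perm) == kronecker(*triple)

# Print each triple together with its coefficient.
for triple in product(CHARACTERS, repeat=3):
    print(triple, kronecker(*triple))
```

For larger $n$ the character table would have to be computed rather than typed in, so this is only meant to make the statement concrete on the smallest nontrivial case.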

Figures (9)

  • Figure 1: Histograms of $b$-loadings of $\mathbf t \in \mathcal{P}(n)^3$ for $n=15$ (left) and $16$ (right) along with curves (red) of gamma distributions.
  • Figure 2: Histograms of $b$-loadings for $n=12$ (left) and $n=13$ (right). The red (resp. blue) region represents the numbers of $\mathbf t$ such that $g(\mathbf t) \neq 0$ (resp. $g(\mathbf t) =0$).
  • Figure 3: The ratio of $\mathbf t$ satisfying $b(\mathbf t) < b_\star$.
  • Figure 4: Left: Accuracies for classification based on $b$-loadings for different decision boundaries. Right: Percentage of non-zero Kronecker coefficients.
  • Figure 5: Explained variance ratio of Principal Components, a heatmap of their loadings, and the Spearman correlation matrix of the inputs for $n=12$.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Lemma 2.1
  • Example 2.2
  • Example 2.3
  • Definition 2.4
  • Example 2.5
  • Remark 2.6
  • Definition 2.7
  • Example 2.8