Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks

Elad Shoham, Hadar Cohen, Khalil Wattad, Havana Rika, Dan Vilenchik

TL;DR

The paper demonstrates that a graph neural network trained to solve SAT (NeuroSAT) learns high-level, algorithmic concepts such as assignment, support, backbone, majority vote, and appearance count, which are embedded in the top principal components of its latent representations. By applying unsupervised (and sparse) PCA to the embedding covariance, the authors uncover minimal and teachable concepts and show these can be transferred to simpler architectures and even used to rewrite the solver as a white-box textbook algorithm. They further leverage these concepts to improve a classical solver (WalkSAT) and to create concept-guided versions like Textbook NeuroSAT and SupportSAT-01, illustrating practical gains in convergence and interpretability. The work argues for a framework of concept learning in the wild for algorithmic neural networks, offering insights for explainability and principled improvements in combinatorial optimization tasks.

Abstract

Explainable AI (XAI) methods typically focus on identifying essential input features or more abstract concepts for tasks like image or text classification. However, for algorithmic tasks like combinatorial optimization, these concepts may depend not only on the input but also on the current state of the network, as is the case for graph neural networks (GNNs). This work studies concept learning for an existing GNN model trained to solve Boolean satisfiability (SAT). Our analysis reveals that the model learns key concepts matching those guiding human-designed SAT heuristics, particularly the notion of 'support.' We demonstrate that these concepts are encoded in the top principal components (PCs) of the embedding's covariance matrix, allowing for unsupervised discovery. Using sparse PCA, we establish the minimality of these concepts and show their teachability through a simplified GNN. Two direct applications of our framework are (a) improving the convergence time of the classical WalkSAT algorithm and (b) using the discovered concepts to "reverse-engineer" the black-box GNN and rewrite it as a white-box textbook algorithm. Our results highlight the potential of concept learning in understanding and enhancing algorithmic neural networks for combinatorial optimization tasks.
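
To make the phrase "top principal components of the embedding's covariance matrix" concrete, the following is a minimal sketch of plain PCA by eigendecomposition of a covariance matrix. It is not the authors' code: the function name `top_principal_components`, the synthetic `fake_embeddings`, and the planted binary concept `hidden` are all hypothetical stand-ins for real NeuroSAT literal embeddings, which are not reproduced here.

```python
import numpy as np

def top_principal_components(embeddings, k=2):
    """Project row-vector embeddings onto the top-k eigenvectors of their covariance."""
    X = np.asarray(embeddings, dtype=float)
    centered = X - X.mean(axis=0)
    # Covariance over columns (features); rows are samples.
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return centered @ components, components, explained

# Synthetic stand-in (assumption): one binary "concept" planted along a random
# direction of a 16-dimensional embedding, plus small isotropic noise.
rng = np.random.default_rng(0)
hidden = rng.choice([-1.0, 1.0], size=200)
direction = rng.normal(size=16)
direction /= np.linalg.norm(direction)
fake_embeddings = 5.0 * np.outer(hidden, direction) + 0.1 * rng.normal(size=(200, 16))

projected, components, explained = top_principal_components(fake_embeddings, k=2)
# The sign of PC1 is arbitrary, so compare via absolute correlation.
pc1_corr = abs(np.corrcoef(projected[:, 0], hidden)[0, 1])
```

In this toy setting the planted concept dominates the variance, so PC1 recovers it up to sign without any labels, which is the sense in which the discovery is unsupervised.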

Paper Structure

This paper contains 19 sections, 5 equations, 11 figures, 13 tables, 2 algorithms.

Figures (11)

  • Figure 1: Red dashed arrows show the standard concept learning pipeline: the input is embedded and concepts are extracted from the embedding, as when the concept of "Beak" is learned for image processing. The black arrows show our setting, where the input participates in concept learning together with the embedding, through a dynamic process of repeatedly applying the NN to solve the algorithmic task (for instance, the concept of bipartiteness). The "Concept: Beak" image was taken from cao21.
  • Figure 2: PCA projection of the literal embedding; the embeddings of a literal and its negation are symmetric. Three random pairs of literals $(x_i,\bar{x_i})$ are colored, each pair having similar absolute PC2 values. Variables with a contradiction are colored red and all lie near $PC1 = 0$.
  • Figure 3: The clause embedding represents support. PCA projection of the clause embedding for $c=3.75$ and $n=1500$, color-coded by the number of literals that satisfy the clause. Support clauses tend to have positive PC1 values.
  • Figure 4: PCA projection of the literals' embedding for $c=4.1$ and $n=1500$. The embedding is color-coded by the support count; a larger absolute $PC1$ value means larger support. Figures $(b),(c)$ are executions where NeuroSAT fails to find a satisfying assignment; there, support is encoded with noise.
  • Figure 5: In the panel labeled backbone_sat, a clause $C$ containing three random non-backbone variables $x,y,z$ is added to a planted formula with $n=1000$ and $c=15$. Because $x,y,z$ are non-backbone variables, the formula remains satisfiable, and no detection occurs. The panel labeled backbone_unsat shows the same experiment, but this time $x,y,z$ are part of the backbone, which makes the instance UNSAT because $x,y,z$ are all unsatisfied in $C$; NeuroSAT identifies the three. The panel labeled bb_satlib_valid shows NeuroSAT's embedding of the backbone dataset ($b=90\%$, from the SATLIB dataset). The black backbone variables are placed "correctly" outside the $[-2,2]$ interval of zero support, where the majority of non-backbone variables lie. The panel labeled bb_satlib__invalid shows the embedding when NeuroSAT fails to reach a satisfying assignment; in this case, some backbone variables are misplaced. Backbone variables are also color-coded by support count: the darker the greyscale, the higher the support.
  • ...and 6 more figures
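
The two quantities used for color-coding in Figures 3 and 4 (the number of satisfying literals per clause, and the support count per variable) can be sketched as follows. This assumes the usual definition from the SAT literature, which this listing does not itself state: a variable supports a clause when its literal is the only true literal in that clause. The DIMACS-style signed-integer encoding, the function names, and the example formula are hypothetical choices for illustration.

```python
def satisfying_literal_counts(clauses, assignment):
    """Number of true literals in each clause.

    clauses: list of lists of nonzero ints (positive = variable, negative = negation).
    assignment: dict mapping variable index -> bool.
    """
    def is_true(lit):
        return assignment[abs(lit)] == (lit > 0)
    return [sum(is_true(lit) for lit in clause) for clause in clauses]

def support_counts(clauses, assignment):
    """For each variable, count the clauses in which it is the sole true literal."""
    counts = {var: 0 for var in assignment}
    for clause in clauses:
        true_lits = [lit for lit in clause if assignment[abs(lit)] == (lit > 0)]
        if len(true_lits) == 1:
            counts[abs(true_lits[0])] += 1
    return counts

# Hypothetical 3-CNF example with a satisfying assignment.
clauses = [[1, -2, 3], [-1, -2, 3], [1, 2, -3], [1, 2, 3]]
assignment = {1: True, 2: False, 3: False}

per_clause = satisfying_literal_counts(clauses, assignment)  # [2, 1, 2, 1]
per_variable = support_counts(clauses, assignment)           # {1: 1, 2: 1, 3: 0}
```

Flipping a variable with a nonzero support count breaks every clause it supports, which is why support is a natural signal for local-search heuristics.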

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4