Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks
Elad Shoham, Hadar Cohen, Khalil Wattad, Havana Rika, Dan Vilenchik
TL;DR
The paper demonstrates that a graph neural network trained to solve SAT (NeuroSAT) learns high-level, algorithmic concepts such as assignment, support, backbone, majority vote, and appearance count, which are embedded in the top principal components of its latent representations. By applying unsupervised (and sparse) PCA to the embedding covariance, the authors uncover minimal and teachable concepts and show these can be transferred to simpler architectures and even used to rewrite the solver as a white-box textbook algorithm. They further leverage these concepts to improve a classical solver (WalkSAT) and to create concept-guided versions like Textbook NeuroSAT and SupportSAT-01, illustrating practical gains in convergence and interpretability. The work argues for a framework of concept learning in the wild for algorithmic neural networks, offering insights for explainability and principled improvements in combinatorial optimization tasks.
Abstract
Explainable AI (XAI) methods typically focus on identifying essential input features or more abstract concepts for tasks such as image or text classification. However, for algorithmic tasks such as combinatorial optimization, these concepts may depend not only on the input but also on the current state of the network, as in the case of graph neural networks (GNNs). This work studies concept learning for an existing GNN model trained to solve Boolean satisfiability (SAT). Our analysis reveals that the model learns key concepts matching those that guide human-designed SAT heuristics, particularly the notion of "support." We demonstrate that these concepts are encoded in the top principal components (PCs) of the embedding's covariance matrix, which allows them to be discovered without supervision. Using sparse PCA, we establish the minimality of these concepts and show their teachability through a simplified GNN. Two direct applications of our framework are: (a) we improve the convergence time of the classical WalkSAT algorithm, and (b) we use the discovered concepts to "reverse-engineer" the black-box GNN and rewrite it as a white-box textbook algorithm. Our results highlight the potential of concept learning for understanding and enhancing algorithmic neural networks for combinatorial optimization tasks.
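The claim that concepts sit in the top PCs of the embedding covariance can be illustrated on synthetic data. The NumPy sketch below does not use NeuroSAT. It plants a hypothetical binary concept (think of a literal's truth value under the current assignment) along one random direction of 64-dimensional embeddings with isotropic noise, eigendecomposes the covariance, and measures how strongly each leading PC's projection correlates with the concept. The sizes, noise scale, and function names are assumptions chosen for illustration.

```python
import numpy as np


def top_principal_components(emb: np.ndarray, k: int) -> np.ndarray:
    """Return the k leading eigenvectors (as columns) of the embedding covariance."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (emb.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    return eigvecs[:, order]


def concept_alignment(emb: np.ndarray, concept: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation between each PC projection and the concept values."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    proj = centered @ pcs  # shape (n, k)
    return np.array([abs(np.corrcoef(proj[:, j], concept)[0, 1]) for j in range(pcs.shape[1])])


rng = np.random.default_rng(0)
n, d = 2000, 64
concept = rng.integers(0, 2, size=n).astype(float)  # hypothetical 0/1 concept per node
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
# Planted signal along `direction` plus isotropic Gaussian noise (synthetic, not real embeddings).
emb = 3.0 * np.outer(concept, direction) + rng.normal(scale=0.5, size=(n, d))

pcs = top_principal_components(emb, k=3)
scores = concept_alignment(emb, concept, pcs)
print(np.round(scores, 3))
```

With a planted signal this strong, the first PC should align with the planted direction and correlate strongly with the concept while the next PCs should not, which is the unsupervised signature the abstract describes. Swapping the dense eigendecomposition for a sparse PCA solver (scikit-learn provides `sklearn.decomposition.SparsePCA`) would restrict each component to a few embedding coordinates.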
