Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems

Geri Skenderi, Lorenzo Buffoni, Francesco D'Amico, David Machado, Raffaele Marino, Matteo Negri, Federico Ricci-Tersenghi, Carlo Lucibello, Maria Chiara Angelini

TL;DR

The paper proposes new hard benchmarks based on random problems, and a fair comparison on them shows that classical algorithms still outperform GNNs. Future claims of superiority can be made more robust using the benchmarks provided.

Abstract

Graph neural networks (GNNs) are increasingly applied to hard optimization problems, often claiming superiority over classical heuristics. However, such claims risk being unfounded due to a lack of standard benchmarks on truly hard instances. From a statistical physics perspective, we propose new hard benchmarks based on random problems. We provide these benchmarks, along with performance results from both classical heuristics and GNNs. Our fair comparison shows that classical algorithms still outperform GNNs. We discuss the challenges for neural networks in this domain. Future claims of superiority can be made more robust using our benchmarks, available at https://github.com/ArtLabBocconi/RandCSPBench.

Paper Structure

This paper contains 33 sections, 18 equations, 9 figures, 5 tables, 2 algorithms.

Figures (9)

  • Figure 1: Probability of finding a satisfying assignment to 3-SAT problems as a function of $\alpha=M/N$ with the supervised (top row) and unsupervised (bottom row) NeuroSAT GNN model. To validate claims regarding the importance of scaling the number of inference-time message-passing iterations of GNNs with the problem size ($N$), we test the model in three scenarios: a small fixed number of iterations (left), a large fixed number of iterations (center), and a number of iterations that scales linearly with the problem size (right). The figure shows that the linear scaling procedure produces excellent results. With linear scaling, inference time is also handled more consistently (by definition): for instance, using 512 iterations takes significantly longer at smaller sizes than linear scaling does. Finally, training an unsupervised model clearly produces much better results than the supervised case.
  • Figure 2: Probability of finding a satisfying assignment using different algorithms at fixed size ($N=256$ on the left and $N=1024$ on the right) for the $K$-SAT ($K=3,4$) and $q$-coloring ($q=3,5$) problems. We focus on two sizes: $N=256$, the largest size in the training set, and $N=1024$, which is out-of-distribution and is used to check the generalization power.
  • Figure 3: Estimation of the algorithmic threshold in the large size limit for different algorithms and problems, indicated in the label under each panel. The left panel shows the easy situation, where the crossings of curves for different sizes take place at the same point (apart from small finite-size corrections), and the crossing point corresponds to the algorithmic threshold in the large $N$ limit. In the central and right panels, we instead report more difficult situations, where the crossings are not present or happen very close to the boundaries ($P=0$ or $P=1$). In these cases, the curves measured at finite $N$ provide a bound on the algorithmic threshold in the large $N$ limit. Data in the central panel provide a lower bound, while those in the right panel provide an upper bound.
  • Figure 4: Average extensive energy of the best proposed configuration using different algorithms at fixed size ($N=256$ on the left and $N=1024$ on the right) for the $K$-SAT ($K=3,4$) and $q$-coloring ($q=3,5$) problems. We focus on two sizes: $N=256$, the largest size in the training set, and $N=1024$, which is out-of-distribution and is used to check the generalization power of the GNNs.
  • Figure 5: Solving probability and corresponding extensive energy for the FMS and SP algorithms on problems with large values of $N$.
  • ...and 4 more figures
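
The quantities in the captions above, the clause-to-variable ratio $\alpha=M/N$ and the extensive energy, can be made concrete with a minimal sketch. It assumes the standard random $K$-SAT ensemble (each clause picks $K$ distinct variables uniformly at random and negates each with probability 1/2) and takes the extensive energy to be the number of violated clauses. The function names `random_ksat` and `energy` are hypothetical and are not taken from the RandCSPBench repository, whose generator may differ.

```python
import random


def random_ksat(n, alpha, k=3, seed=None):
    """Sample a random K-SAT instance (assumed standard ensemble).

    n variables, M = round(alpha * n) clauses. Literals are signed
    1-based integers: +i means x_i, -i means NOT x_i.
    """
    rng = random.Random(seed)
    m = round(alpha * n)
    clauses = []
    for _ in range(m):
        # k distinct variables, each negated with probability 1/2
        variables = rng.sample(range(1, n + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def energy(clauses, assignment):
    """Count violated clauses; assignment maps variable index -> bool."""
    violated = 0
    for clause in clauses:
        # A clause is satisfied if at least one literal evaluates to True.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            violated += 1
    return violated


# Illustrative values only: N = 256 variables at alpha = 4.0.
clauses = random_ksat(n=256, alpha=4.0, k=3, seed=0)
rng = random.Random(1)
assignment = {i: rng.random() < 0.5 for i in range(1, 257)}
print(len(clauses), energy(clauses, assignment))
```

An assignment with energy 0 is a satisfying assignment, so the solving probability shown in the figures corresponds to the fraction of instances on which an algorithm reaches energy 0.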