Table of Contents
Fetching ...

Structure based SAT dataset for analysing GNN generalisation

Yi Fu, Anthony Tompkins, Yang Song, Maurice Pagnucco

TL;DR

StructureSAT addresses how SAT problem structure influences GNN generalisation by assembling 11 diverse problem domains and 9 structural metrics (CNF-based and graph-based) into a structure-aware dataset. It introduces a novel splitting strategy across structure values and evaluates NeuroSAT, GCN, and GIN on in-domain and out-of-domain generalisation tasks, including augmentation with learned clauses. Key findings show CNF-based properties, such as phase transition and backbone size, more consistently affect generalisation, while graph-based properties yield model-dependent effects; results also indicate that matching training and test structural characteristics enhances transfer, and augmentation may not improve raw-data generalisation. Taken together, StructureSAT provides both a robust dataset and a principled methodology to study and improve the structure-aware generalisation of GNN-based SAT solvers, guiding future solver development.

Abstract

Satisfiability (SAT) solvers based on techniques such as conflict driven clause learning (CDCL) have produced excellent performance on both synthetic and real world industrial problems. While these CDCL solvers only operate on a per-problem basis, graph neural network (GNN) based solvers bring new benefits to the field by allowing practitioners to exploit knowledge gained from solved problems to expedite solving of new SAT problems. However, one specific area that is often studied in the context of CDCL solvers, but largely overlooked in GNN solvers, is the relationship between graph theoretic measure of structure in SAT problems and the generalisation ability of GNN solvers. To bridge the gap between structural graph properties (e.g., modularity, self-similarity) and the generalisability (or lack thereof) of GNN based SAT solvers, we present StructureSAT: a curated dataset, along with code to further generate novel examples, containing a diverse set of SAT problems from well known problem domains. Furthermore, we utilise a novel splitting method that focuses on deconstructing the families into more detailed hierarchies based on their structural properties. With the new dataset, we aim to help explain problematic generalisation in existing GNN SAT solvers by exploiting knowledge of structural graph properties. We conclude with multiple future directions that can help researchers in GNN based SAT solving develop more effective and generalisable SAT solvers.

Structure based SAT dataset for analysing GNN generalisation

TL;DR

StructureSAT addresses how SAT problem structure influences GNN generalisation by assembling 11 diverse problem domains and 9 structural metrics (CNF-based and graph-based) into a structure-aware dataset. It introduces a novel splitting strategy across structure values and evaluates NeuroSAT, GCN, and GIN on in-domain and out-of-domain generalisation tasks, including augmentation with learned clauses. Key findings show CNF-based properties, such as phase transition and backbone size, more consistently affect generalisation, while graph-based properties yield model-dependent effects; results also indicate that matching training and test structural characteristics enhances transfer, and augmentation may not improve raw-data generalisation. Taken together, StructureSAT provides both a robust dataset and a principled methodology to study and improve the structure-aware generalisation of GNN-based SAT solvers, guiding future solver development.

Abstract

Satisfiability (SAT) solvers based on techniques such as conflict driven clause learning (CDCL) have produced excellent performance on both synthetic and real world industrial problems. While these CDCL solvers only operate on a per-problem basis, graph neural network (GNN) based solvers bring new benefits to the field by allowing practitioners to exploit knowledge gained from solved problems to expedite solving of new SAT problems. However, one specific area that is often studied in the context of CDCL solvers, but largely overlooked in GNN solvers, is the relationship between graph theoretic measure of structure in SAT problems and the generalisation ability of GNN solvers. To bridge the gap between structural graph properties (e.g., modularity, self-similarity) and the generalisability (or lack thereof) of GNN based SAT solvers, we present StructureSAT: a curated dataset, along with code to further generate novel examples, containing a diverse set of SAT problems from well known problem domains. Furthermore, we utilise a novel splitting method that focuses on deconstructing the families into more detailed hierarchies based on their structural properties. With the new dataset, we aim to help explain problematic generalisation in existing GNN SAT solvers by exploiting knowledge of structural graph properties. We conclude with multiple future directions that can help researchers in GNN based SAT solving develop more effective and generalisable SAT solvers.

Paper Structure

This paper contains 31 sections, 11 figures, 13 tables.

Figures (11)

  • Figure 1: Distributions of various structural graph properties from different problem domains. Left: scale-free measure by variable $\alpha_v$, Middle: fractal dimension $D_f$, Right: modularity $Q$.
  • Figure 2: Distribution of various structural graph properties from different problem domains. Left: scale-free measure by clause $\alpha_c$. Middle: centrality measure$BE$. Right: entropy measure $H$.
  • Figure 3: Distribution of various stuctural graph properties from different problem domains. Left: Proximity:$P_r$. Middle: ratio $c/v$. Right: Treediwth $T_w$.
  • Figure 4: SR correlogram.
  • Figure 5: Random 3-SAT correlogram.
  • ...and 6 more figures