Table of Contents
Fetching ...

HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline

Yang Li, Xinyan Chen, Wenxuan Guo, Xijun Li, Wanqian Luo, Junhua Huang, Hui-Ling Zhen, Mingxuan Yuan, Junchi Yan

TL;DR

HardSATGEN addresses the gap in industrial SAT formula generation by explicitly incorporating structure and hardness signals into a multi-stage, community/core-aware generation pipeline. By embedding prior knowledge of communities and cores into a bipartite graph representation and applying a two-stage merge strategy guided by dedicated GNNs, it achieves superior alignment with both the global graph structure and solver-hardness characteristics of real-world instances. Empirical results across public benchmarks and private industrial data show substantial improvements over hand-crafted and prior learning-based generators in structural fidelity (up to 38.5% relative error reduction) and hardness reproduction (up to 88.4% relative error reduction), with additional benefits for hyperparameter tuning of SAT solvers. The approach enables controllable generation via a modularity-α parameter and demonstrates practical value for solver optimization in domain-specific scenarios, with source code publicly available.

Abstract

Industrial SAT formula generation is a critical yet challenging task. Existing SAT generation approaches can hardly simultaneously capture the global structural properties and maintain plausible computational hardness. We first present an in-depth analysis for the limitation of previous learning methods in reproducing the computational hardness of original instances, which may stem from the inherent homogeneity in their adopted split-merge procedure. On top of the observations that industrial formulae exhibit clear community structure and oversplit substructures lead to the difficulty in semantic formation of logical structures, we propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation to better recover the structural and computational properties of the industrial benchmarks. Experiments including evaluations on private and practical corporate testbed show the superiority of HardSATGEN being the only method to successfully augment formulae maintaining similar computational hardness and capturing the global structural properties simultaneously. Compared to the best previous methods, the average performance gains achieve 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver tuning by our generated instances. Source code is available at http://github.com/Thinklab-SJTU/HardSATGEN

HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline

TL;DR

HardSATGEN addresses the gap in industrial SAT formula generation by explicitly incorporating structure and hardness signals into a multi-stage, community/core-aware generation pipeline. By embedding prior knowledge of communities and cores into a bipartite graph representation and applying a two-stage merge strategy guided by dedicated GNNs, it achieves superior alignment with both the global graph structure and solver-hardness characteristics of real-world instances. Empirical results across public benchmarks and private industrial data show substantial improvements over hand-crafted and prior learning-based generators in structural fidelity (up to 38.5% relative error reduction) and hardness reproduction (up to 88.4% relative error reduction), with additional benefits for hyperparameter tuning of SAT solvers. The approach enables controllable generation via a modularity-α parameter and demonstrates practical value for solver optimization in domain-specific scenarios, with source code publicly available.

Abstract

Industrial SAT formula generation is a critical yet challenging task. Existing SAT generation approaches can hardly simultaneously capture the global structural properties and maintain plausible computational hardness. We first present an in-depth analysis for the limitation of previous learning methods in reproducing the computational hardness of original instances, which may stem from the inherent homogeneity in their adopted split-merge procedure. On top of the observations that industrial formulae exhibit clear community structure and oversplit substructures lead to the difficulty in semantic formation of logical structures, we propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation to better recover the structural and computational properties of the industrial benchmarks. Experiments including evaluations on private and practical corporate testbed show the superiority of HardSATGEN being the only method to successfully augment formulae maintaining similar computational hardness and capturing the global structural properties simultaneously. Compared to the best previous methods, the average performance gains achieve 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver tuning by our generated instances. Source code is available at http://github.com/Thinklab-SJTU/HardSATGEN
Paper Structure (44 sections, 7 equations, 11 figures, 8 tables)

This paper contains 44 sections, 7 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Node splitting and merging operations on graphs.
  • Figure 2: Visualization of clause feature distributions during node split-merge process. The features are extracted by the learned GNN and the dimension reduction is performed by PCA pearson1901liii. The clause features tend to collapse to certain patterns since the clauses presumably exhibit similar structures.
  • Figure 3: Visualization of the graph building process.
  • Figure 4: Visualization of node splitting process.
  • Figure 5: We devise a novel post-processing step that iteratively transforms unexpected small cores to satisfiable substructures by introducing a new variable to one of its clauses.
  • ...and 6 more figures