Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error

Panagiotis Giannoulis, Yorgos Pantis, Christos Tzamos

TL;DR

This work trains a Transformer to solve combinatorial NP problems by integrating imitation learning with trial-and-error search guided by a verifier, using Sudoku as the principal testbed. It achieves near-perfect Sudoku performance (≈99%) and 99% on 1-in-3 SAT by combining a compact action-level transcript encoding, multi-target next-token losses, and a DFS-style backtracking framework. Beyond imitation, it formalizes the guessing step as a contextual Min-Sum Set Cover problem, introducing a novel loss that minimizes expected solution length and yielding faster inference (median ≈1.5 guesses). The approach is validated on diverse puzzle distributions with a fast generator (SudokuPy) and demonstrates strong generalization, outperforming prior NN-based and autoregressive methods. The work lays groundwork for applying structured reasoning with LLMs to broader NP problems while highlighting current limits to depth and adaptivity in search strategies.

Abstract

Despite their proficiency in various language tasks, Large Language Models (LLMs) struggle with combinatorial problems such as Satisfiability and the Traveling Salesman Problem, and even with basic arithmetic. We address this gap through a novel trial & error approach for solving problems in the class NP, where candidate solutions are iteratively generated and efficiently validated using verifiers. We focus on the paradigmatic task of Sudoku and achieve state-of-the-art accuracy (99%) compared to prior neuro-symbolic approaches. Unlike prior work that used custom architectures, our method employs a vanilla decoder-only Transformer (GPT-2) without external tools or function calling. Our method integrates imitation learning of simple Sudoku rules with an explicit Depth-First Search (DFS) exploration strategy involving informed guessing and backtracking. Moving beyond imitation learning, we seek to minimize the number of guesses until reaching a solution. This is achieved using depth-1 guessing, showing empirically that almost all Sudoku puzzles can be solved using the puzzle's rules with at most one guess. We provide a rigorous analysis of this setup, formalizing its connection to a contextual variant of Min-Sum Set Cover, a well-studied problem in algorithms and stochastic optimization.
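
The trial & error loop described in the abstract can be illustrated with a small symbolic sketch. This is not the paper's Transformer; it only mirrors the control flow of depth-1 guessing with restarts, under several assumptions: the only rule applied is the "naked single" rule, guesses are ordered by a hypothetical fewest-candidates heuristic (in the paper the model chooses the guess), and the function names `candidates`, `propagate`, `verify`, and `solve_depth1` are illustrative.

```python
from itertools import product

def candidates(grid, r, c):
    """Digits not yet used in the row, column, or 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def propagate(grid):
    """Apply the naked-single rule until nothing changes; None on a dead end."""
    grid = [row[:] for row in grid]
    changed = True
    while changed:
        changed = False
        for r, c in product(range(9), repeat=2):
            if grid[r][c] == 0:
                cand = candidates(grid, r, c)
                if not cand:
                    return None          # empty cell with no legal digit
                if len(cand) == 1:
                    grid[r][c] = cand[0]
                    changed = True
    return grid

def verify(grid):
    """Verifier: every row, column, and box must contain exactly 1..9."""
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [{grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(s == full for s in rows + cols + boxes)

def solve_depth1(puzzle):
    """Return (solution, number_of_guesses), or None if depth 1 is not enough."""
    base = propagate(puzzle)
    if base is None:
        return None
    if verify(base):
        return base, 0                   # rules alone were sufficient
    empties = [(r, c) for r, c in product(range(9), repeat=2) if base[r][c] == 0]
    # Assumption: try the most constrained cells first (not the learned policy).
    empties.sort(key=lambda rc: len(candidates(base, *rc)))
    guesses = 0
    for r, c in empties:
        for d in candidates(base, r, c):
            guesses += 1
            trial = [row[:] for row in base]
            trial[r][c] = d              # a single guess, then rules only
            result = propagate(trial)
            if result is not None and verify(result):
                return result, guesses
            # otherwise: restart from `base` with the next guess
    return None
```

A guess that lets the rules finish the board plays the role of the "backdoor" mentioned in the Figure 5 caption; any other guess, even a correct one, triggers a restart in this sketch.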

Paper Structure

This paper contains 36 sections, 2 theorems, 8 equations, 8 figures, 8 tables.

Key Result

Theorem E.1

It is NP-hard to compute a policy that approximates the optimal policy within a factor better than 4. The greedy policy, which always selects the most likely choice conditional on the set not being covered by the choices explored so far, is 4-approximate.
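
The greedy policy in the theorem can be sketched for the plain (non-contextual) Min-Sum Set Cover setting. The representation below is an assumption made for illustration: each scenario is a pair of a probability and the set of choices that cover it, every scenario has at least one covering choice, and the names `greedy_order` and `expected_cost` are hypothetical. Ties are broken by the smallest label so the output is deterministic.

```python
def greedy_order(scenarios):
    """Greedy Min-Sum Set Cover ordering.

    scenarios: list of (probability, set_of_covering_choices).
    At each step, pick the choice with the largest probability mass among
    scenarios not covered by the choices selected so far.
    """
    remaining = list(scenarios)
    order = []
    while remaining:
        mass = {}
        for p, good in remaining:
            for choice in good:
                mass[choice] = mass.get(choice, 0.0) + p
        if not mass:                     # leftover scenarios cannot be covered
            break
        best = max(sorted(mass), key=mass.get)
        order.append(best)
        remaining = [(p, good) for p, good in remaining if best not in good]
    return order

def expected_cost(order, scenarios):
    """Expected number of choices tried until the first one that covers.

    Assumes every scenario is covered by some choice in `order`.
    """
    total = 0.0
    for p, good in scenarios:
        position = next(i for i, ch in enumerate(order, 1) if ch in good)
        total += p * position
    return total

if __name__ == "__main__":
    demo = [(0.5, {"a", "b"}), (0.3, {"b"}), (0.2, {"c"})]
    order = greedy_order(demo)
    print(order, expected_cost(order, demo))
```

In the demo, choice `b` covers scenarios with total mass 0.8, so it is tried first; only the third scenario remains, which forces `c` second. In the paper's setting, a "choice" corresponds to a guess and "covering" to the guess leading to a solution.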

Figures (8)

  • Figure 1: Comparison of board accuracy with previous state-of-the-art models on 100K randomly generated Sudoku. As some models are trained on a different dataset, we retrain them using our random Sudoku generator and report their increased accuracy with shaded bars.
  • Figure 2: Example of a training transcript. Values in blue brackets indicate the multiple valid labels for the output of each token during the rule-application step. Yellow brackets show the set of valid candidates when the model reaches a guess token at a given level. The selected guess is shown as a gray circle with yellow outline. If it leads to a dead end, a follow-up guess is made at the same cell.
  • Figure 3: Rule-logic accuracy during training for three model variants differing in token encoding and loss function.
  • Figure 4: Board accuracy during training on samples from our generator, evaluated on Random, Kaggle unfiltered, and RRN datasets.
  • Figure 5: One-level guessing with restarts. The figure illustrates both training and decoding. During decoding, a guess is sampled from the output of the guess node. If the guess is not a backdoor, even if correct, subsequent application of rules fails to produce a solution and a restart is triggered.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Theorem E.1: From feige2002approximating
  • Example E.1
  • Definition E.1
  • Theorem E.2
  • proof
  • Remark E.1