Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

Tom Marty, Tristan François, Pierre Tessier, Louis Gauthier, Louis-Martin Rousseau, Quentin Cappart

TL;DR

This work addresses the lack of generic value-selection heuristics in constraint programming by introducing a learning framework that trains inside a CP solver. It combines restart-based reinforcement learning, a propagation-aware reward signal, and a heterogeneous tripartite graph neural network, implemented in SeaPearl.jl, to derive a value-selection heuristic applicable across CP models. Experiments on graph coloring, maximum independent set, and maximum cut show the method yields high-quality solutions near optimality with few backtracks and competitive performance relative to established heuristics like impact-based and activity-based search. The approach offers a practical, solver-integrated pathway to automatically learn effective value selection for a broad class of CP problems, reducing reliance on problem-specific expertise.

Abstract

Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are scarcer. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This is achieved through the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework, while remaining generic, is able to find better solutions close to optimality without requiring a large number of backtracks.
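To make the ingredients named above more concrete, here is a minimal Python sketch of how a CP model might be encoded as a tripartite graph (variable, value, and constraint nodes) and how a learned score could drive value selection. It is an illustration under stated assumptions, not the authors' implementation and not the SeaPearl.jl API: the edge rules (a variable is linked to each value in its domain and to each constraint whose scope contains it), the names `TripartiteGraph`, `build_tripartite`, and `select_value`, and the stand-in scoring function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TripartiteGraph:
    """Hypothetical container for a variable/value/constraint graph."""
    variables: list
    values: list
    constraints: list
    var_val_edges: set = field(default_factory=set)
    var_con_edges: set = field(default_factory=set)


def build_tripartite(domains, constraints):
    """Encode a CP model as a tripartite graph.

    domains: dict mapping variable name -> set of values still in its domain.
    constraints: dict mapping constraint name -> tuple of variables in its scope.
    Assumed edge rules: variable-value if the value is in the domain,
    variable-constraint if the variable is in the constraint's scope.
    """
    values = sorted({val for dom in domains.values() for val in dom})
    graph = TripartiteGraph(sorted(domains), values, sorted(constraints))
    for var, dom in domains.items():
        for val in dom:
            graph.var_val_edges.add((var, val))
    for name, scope in constraints.items():
        for var in scope:
            graph.var_con_edges.add((var, name))
    return graph


def select_value(var, domains, q_value):
    """Greedy value selection: highest-scoring value in the current domain."""
    return max(sorted(domains[var]), key=lambda val: q_value(var, val))


# Toy instance: coloring a triangle with three colors (0, 1, 2).
domains = {"x1": {0, 1, 2}, "x2": {0, 1, 2}, "x3": {0, 1, 2}}
constraints = {"c12": ("x1", "x2"), "c13": ("x1", "x3"), "c23": ("x2", "x3")}
graph = build_tripartite(domains, constraints)

# Stand-in for a trained Q-network: simply prefers smaller color indices.
chosen = select_value("x1", domains, lambda var, val: -val)
```

In a learned setting, the stand-in lambda would be replaced by a graph neural network evaluated on the graph of the current search node, and the graph would be rebuilt or updated after each propagation step as domains shrink.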

Paper Structure (7 sections, 1 equation, 1 figure)

Figures (1)

  • Figure 1: The two training procedures (left: depth-first search, Chalumeau et al. 2021; right: restart-based, ours)