
UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization

Zhanhong Fang, Debing Wang, Jinbiao Chen, Jiahai Wang, Zizhen Zhang

TL;DR

UCPO presents a universal, plug-and-play approach to hard-constrained combinatorial optimization by weaving constraint satisfaction into a preference-based learning objective. Through a partial-order formulation, a Bradley–Terry-based preference model, and a three-component loss (Dual Exploration, Feasibility Margin, Primal Refinement), UCPO guides neural solvers toward feasible, high-quality solutions without architectural changes or extensive hyperparameter tuning. A lightweight warm-start fine-tuning protocol starts from pre-trained checkpoints and reaches near-feasibility and near-optimality with only 1%–5% of the original training budget, as demonstrated on constrained benchmarks such as TSPTW, CVRPTW, TSPDL, and CVRPTWLV. The empirical results show consistent improvements over baselines, robustness to the sole hyperparameter λ, and strong generalization across backbone models, making UCPO a versatile, low-cost solution for complex constrained neural optimization tasks.
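The TL;DR names a partial-order formulation and a Bradley–Terry preference model but gives no formulas. The sketch below shows one generic way those two ideas can fit together, under stated assumptions: candidates sampled from the solver are ranked feasibility-first (lower total constraint violation wins, then lower cost), and each strictly ordered pair contributes a Bradley–Terry negative log-likelihood computed from the policy's log-probabilities. The names `prefers` and `bt_preference_loss`, the `violation`/`cost`/`logp` fields, and the scale `beta` are hypothetical illustrations. This is not UCPO's three-component loss; the exact form of its Dual Exploration, Feasibility Margin, and Primal Refinement terms is not given here.

```python
import math
from itertools import combinations


def prefers(a, b):
    """Return True if candidate a is strictly preferred over b.

    Assumed feasibility-first order (hypothetical, for illustration):
    lower total constraint violation wins; among candidates with equal
    violation, lower objective cost wins. Exact ties are incomparable.
    """
    if a["violation"] != b["violation"]:
        return a["violation"] < b["violation"]
    return a["cost"] < b["cost"]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bt_preference_loss(candidates, beta=1.0):
    """Average Bradley-Terry negative log-likelihood over ordered pairs.

    Each candidate dict carries 'violation', 'cost', and 'logp' (the
    policy's log-probability of generating that solution). Under the
    Bradley-Terry model, P(win > lose) = sigmoid(beta * (logp_win - logp_lose)).
    Incomparable (tied) pairs are skipped.
    """
    losses = []
    for a, b in combinations(candidates, 2):
        if prefers(a, b):
            win, lose = a, b
        elif prefers(b, a):
            win, lose = b, a
        else:
            continue  # tie under the order: no preference signal
        margin = beta * (win["logp"] - lose["logp"])
        losses.append(-log_sigmoid(margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Three sampled tours: two feasible (violation 0), one infeasible.
    cands = [
        {"violation": 0.0, "cost": 10.0, "logp": -1.0},
        {"violation": 2.5, "cost": 8.0, "logp": -0.5},
        {"violation": 0.0, "cost": 12.0, "logp": -2.0},
    ]
    print(bt_preference_loss(cands))
```

In this toy case the infeasible tour has the lowest cost and the highest log-probability, so the loss is large: it pushes probability mass toward the feasible tours first and, among them, toward the cheaper one. The feasibility-first ordering is one plausible reading of "partial order"; other orderings are possible.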

Abstract

Neural solvers have demonstrated remarkable success in combinatorial optimization, often surpassing traditional heuristics in speed, solution quality, and generalization. However, their efficacy deteriorates significantly when confronted with complex constraints that cannot be effectively managed through simple masking mechanisms. To address this limitation, we introduce Universal Constrained Preference Optimization (UCPO), a novel plug-and-play framework that seamlessly integrates preference learning into existing neural solvers via a specially designed loss function, without requiring architectural modifications. UCPO embeds constraint satisfaction directly into a preference-based objective, eliminating the need for meticulous hyperparameter tuning. Leveraging a lightweight warm-start fine-tuning protocol, UCPO enables pre-trained models to consistently produce near-optimal, feasible solutions on challenging constraint-laden tasks, achieving exceptional performance with as little as 1% of the original training budget.

Paper Structure

This paper contains 79 sections, 85 equations, 2 figures, 12 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of the Universal Constrained Preference Optimization (UCPO) framework. The process begins by sampling candidate solutions from the base NCO model, followed by fine-tuning the model using Universal Constrained Preference Loss.
  • Figure 2: Post-training values across two difficulty levels (Medium, Hard).