CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization

Hyungseok Song, Deunsol Yoon, Kanghoon Lee, Han-Seul Jeong, Soonyoung Lee, Woohyung Lim

TL;DR

This paper identifies a fundamental objective mismatch in heatmap-based combinatorial optimization solvers trained with supervised imitation: minimizing imitation loss does not guarantee minimal post-decoded cost. It introduces CADO, a cost-aware diffusion-model fine-tuning framework that formulates the denoising process as an MDP and directly optimizes the post-decoded objective via RL, using a novel Label-Centered Reward to leverage ground-truth baselines. The authors show that addressing Decoder-Blindness and Cost-Blindness yields significant performance gains across TSP and MIS benchmarks, including TSPLIB, and demonstrate robustness to suboptimal training data. The work also contributes a practical Hybrid Fine-Tuning strategy with LoRA and selective layer retraining, establishing a model-agnostic approach that improves a range of heatmap-based solvers and highlights the importance of objective alignment for scalable, cost-effective CO solvers.

Abstract

Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
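The abstract describes Label-Centered Reward only at a high level: ground-truth labels serve as baselines rather than imitation targets. The sketch below is one possible reading for a minimization problem such as TSP, not the paper's formula. The function name `label_centered_reward` and the plain cost difference are assumptions made for illustration; the actual reward may be scaled or normalized differently.

```python
def label_centered_reward(decoded_cost: float, label_cost: float) -> float:
    """Hypothetical reward: how much cheaper the decoded solution is than the label.

    Assumes a minimization problem (e.g. TSP tour length). For a maximization
    problem such as MIS, the sign would flip.
    """
    return label_cost - decoded_cost


# The label's cost acts as a per-instance baseline for the post-decoded cost.
label_cost = 4.0
decoded_costs = [4.0, 5.0, 3.5]  # matches label, worse than label, better than label
rewards = [label_centered_reward(c, label_cost) for c in decoded_costs]
```

Under this reading, a decoded solution that matches the label earns zero reward, a worse one earns a negative reward, and one that beats a suboptimal label earns a positive reward. That would be consistent with the claimed robustness to suboptimal training data.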

Paper Structure (51 sections, 15 equations, 4 figures, 16 tables)

This paper contains 51 sections, 15 equations, 4 figures, 16 tables.

Figures (4)

  • Figure 1: Scatter plots and correlation analysis for H1 and H2. (a) Surrogate loss ${\mathcal{L}}_{SL}$ vs. Hamming distance $d_{\mathrm{H}}$ (edge disagreements). (b) Hamming distance $d_{\mathrm{H}}$ vs. Drop (% cost gap to $\bm{x}^\star$).
  • Figure 2: The denoising process formulated as an MDP with initial noise $\mathbf{x}_T \sim \text{Bern}(\boldsymbol{p}=0.5^{N})$.
  • Figure 3: Learning curve of CADO-L for MIS-SAT, averaged over 4 independent runs.
  • Figure 4: Learning curves of various fine-tuning methods, averaged over 4 independent runs.
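The two quantities plotted in Figure 1, Hamming distance $d_{\mathrm{H}}$ (edge disagreements) and Drop (% cost gap to $\bm{x}^\star$), can be illustrated on a toy TSP instance. This is a minimal sketch under stated assumptions: tours are node orderings, $d_{\mathrm{H}}$ is taken as the number of undirected edges present in exactly one of the two tours, and cost is Euclidean tour length. The helper names and the four-node instance are hypothetical.

```python
import math


def tour_edges(tour):
    """Return the set of undirected edges of a closed tour."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def hamming_distance(tour_a, tour_b):
    """Count edges that appear in exactly one of the two tours."""
    return len(tour_edges(tour_a) ^ tour_edges(tour_b))


def tour_cost(tour, coords):
    """Euclidean length of the closed tour."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n)
    )


def drop_percent(tour, reference, coords):
    """Percentage cost gap of `tour` relative to the reference tour."""
    ref_cost = tour_cost(reference, coords)
    return 100.0 * (tour_cost(tour, coords) - ref_cost) / ref_cost


# Unit square: the perimeter tour is optimal, the crossing tour is not.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
reference = [0, 1, 2, 3]
candidate = [0, 2, 1, 3]

d_h = hamming_distance(candidate, reference)
drop = drop_percent(candidate, reference, coords)
```

Here the candidate disagrees with the reference on four edges and its tour is about 20.7% longer than the reference tour.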