Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

Tianle Pu; Changjun Fan; Mutian Shen; Yizhou Lu; Li Zeng; Zohar Nussinov; Chao Chen; Zhong Liu

TL;DR

This work targets the exploration bottleneck of reinforcement-learning approaches to combinatorial optimization, notably Max-Cut, by introducing Gauge Transformation (GT). GT creates equivalent problem representations that preserve the objective while enabling repeated, test-time exploration without retraining or architectural changes. Empirical results show GT-equipped RL methods achieve state-of-the-art performance and strong generalization across graph types and sizes, with statistically significant improvements over baselines. The approach is lightweight, model-agnostic, and easily integrated into existing RL pipelines, offering a practical boost for solving COPs in real-world graphs.

Abstract

Many complex problems encountered in both production and daily life can be formulated as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction; they treat COP solving as a heuristic learning problem. However, current RL models based on finite-horizon MDPs have an inherent limitation: they cannot explore adequately to improve solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts address this issue through reward design and state feature engineering, which are tedious and ad hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, but is very effective in enabling RL agents to keep exploring and continuously improve their solutions at test time. Moreover, GT is very simple: it can be implemented in fewer than 10 lines of Python code and can be applied to the vast majority of RL models. Experimentally, we show that traditional RL models equipped with GT achieve state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any particular RL model, it can be seamlessly integrated into various RL frameworks, paving the way for more effective exploration when solving general COPs.
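To make the "fewer than 10 lines of Python" claim concrete, the following is a minimal sketch of a gauge transformation on a weighted graph. It is not the authors' code. It assumes the standard spin-glass form suggested by the Figure 1 caption: node states $s_u\in\{-1,+1\}$, a symmetric coupling matrix `J` with zero diagonal, and an Ising-style energy $-\sum_{u<v} J(u,v)s_us_v$ whose sign convention is an assumption. The names `energy` and `gauge_transform` are hypothetical.

```python
import numpy as np

def energy(J, s):
    # Ising-style energy -sum_{u<v} J[u, v] * s[u] * s[v].
    # The sign convention is an assumption; J is symmetric with zero diagonal,
    # so s @ J @ s counts every edge twice, hence the factor 0.5.
    return -0.5 * (s @ J @ s)

def gauge_transform(J, s):
    # Absorb the current spin configuration into the couplings:
    # J'[u, v] = J[u, v] * s[u] * s[v], and every spin becomes +1.
    J_new = J * np.outer(s, s)
    s_new = s * s  # all entries are +1
    return J_new, s_new

# Small random instance (synthetic data, for illustration only).
rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.normal(size=(n, n)), k=1)
J = upper + upper.T                  # symmetric, zero diagonal
s = rng.choice([-1, 1], size=n)      # some current solution

J_gt, s_gt = gauge_transform(J, s)
```

Under these assumptions the transformed instance has the same energy as the original one, while the state is reset to all $+1$, so an agent can be run again on `J_gt` starting from a fresh state. A configuration `s2` found on the transformed instance maps back to the original one as `s2 * s`.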

Paper Structure (38 sections, 10 equations, 4 figures, 10 tables)

This paper contains 38 sections, 10 equations, 4 figures, 10 tables.

Introduction
Related Work
Notations and Preliminaries
Max-Cut Problem
Q-learning
Gauge Transformation Technique
GT and GT for graph-based COPs
Invariant property of GT.
Potential reasons behind GT's success
State resetting and multiple exploration.
GT draws near the optimal.
Experiments
Experimental Setup
Datasets.
Baseline methods.
...and 23 more sections

Figures (4)

Figure 1: Illustration of GT on a graph. White and black nodes denote $u\in U\equiv V \setminus T$ and $u\in T$, respectively. Solid and dashed lines correspond to $J(u,v)>0$ and $J(u,v)<0$. Links with $s_us_v=\pm 1$ are drawn in black and red, respectively. After GT, all $s'_u=+1$ and the weights $J(u,v)$ change their signs accordingly.
Figure 2: Illustration of GT on a graph. White and black nodes denote $u\in U\equiv V \setminus T$ and $u\in T$, respectively. Solid and dashed lines correspond to edges with positive and negative weights, respectively.
Figure 3: Statistical significance test of the performance gains brought by GT over S2V-DQN under different distributions. We perform statistical tests between the two methods (S2V-DQN and S2V-DQN-GT) to demonstrate that the gains brought by GT over traditional RL models such as S2V-DQN are not marginal but statistically significant. For each size, we have 50 instances and report the results in a standard box plot, together with the $p$-values of the comparison S2V-DQN vs. S2V-DQN-GT. Here $****$ denotes $p<0.0001$. Statistical test: Wilcoxon signed-rank test.
Figure 5: GT iterations as a function of the size of BA graphs (average degree 4) with edge weights drawn from $\mathcal{N}(0,1)$
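
The statement in the Figure 1 caption that, after GT, all $s'_u=+1$ while the weights change sign can be checked with a short derivation. This is a sketch under one assumption: that GT is the usual spin-glass gauge transformation with per-node signs $t_u$, a symbol introduced here for illustration.

```latex
\begin{align}
s'_u &= t_u s_u, \qquad J'(u,v) = t_u t_v\, J(u,v), \qquad t_u \in \{-1,+1\},\\
J'(u,v)\, s'_u s'_v &= t_u^2\, t_v^2\, J(u,v)\, s_u s_v = J(u,v)\, s_u s_v .
\end{align}
```

Choosing $t_u = s_u$ gives $s'_u = s_u^2 = +1$ for every node and $J'(u,v) = s_us_v\,J(u,v)$, so exactly the links with $s_us_v=-1$ have their weights flipped, and any objective that depends only on the products $J(u,v)s_us_v$ is unchanged.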
