Learning Combinatorial Optimization Algorithms over Graphs

Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, Le Song

TL;DR

This paper automates the design of greedy heuristics for NP-hard graph optimization problems by learning a Q-function over a graph-structured state. It combines Structure2Vec graph embeddings with n-step Q-learning to produce a greedy meta-algorithm that applies to Minimum Vertex Cover (MVC), Maximum Cut (MAXCUT), and the Traveling Salesman Problem (TSP), and that generalizes to graphs larger than those seen in training. Experiments show strong solution quality, favorable time–quality trade-offs, and the discovery of nontrivial heuristics, including strategies that balance degree and connectivity. The framework points toward scalable, data-driven algorithm design for recurring graph-structured optimization problems.

Abstract

The design of good heuristics or approximation algorithms for NP-hard combinatorial optimization problems often requires significant specialized knowledge and trial-and-error. Can we automate this challenging, tedious process, and learn the algorithms instead? In many real-world applications, it is typically the case that the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms that exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and learns effective algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman problems.
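To make the idea of a greedy meta-algorithm concrete, here is a minimal Python sketch for Minimum Vertex Cover. It is an illustration under stated assumptions, not the authors' implementation: the names `greedy_vertex_cover`, `uncovered_degree`, and the `score` callback are hypothetical, and the scoring function used here is a hand-written degree heuristic standing in for the learned graph embedding network. The sketch only shows the loop structure: score every candidate node given the current partial solution, add the highest-scoring node, and repeat until all edges are covered.

```python
from typing import Callable, Dict, Set

# Adjacency-set representation of an undirected graph (an assumption of this sketch).
Graph = Dict[int, Set[int]]
ScoreFn = Callable[[Graph, Set[int], int], float]


def uncovered_degree(graph: Graph, cover: Set[int], node: int) -> float:
    """Stand-in score: number of still-uncovered edges incident to `node`.

    In the paper this role is played by a learned Q-function; here it is
    a fixed heuristic used purely for illustration.
    """
    return float(sum(1 for nbr in graph[node] if nbr not in cover))


def greedy_vertex_cover(graph: Graph, score: ScoreFn = uncovered_degree) -> Set[int]:
    """Build a vertex cover by repeatedly adding the highest-scoring node."""
    cover: Set[int] = set()
    while True:
        # A node is a candidate if it is not yet selected and still touches
        # at least one uncovered edge (a neighbor outside the cover).
        candidates = [
            v for v in graph
            if v not in cover and any(u not in cover for u in graph[v])
        ]
        if not candidates:
            return cover  # every edge has an endpoint in the cover
        best = max(candidates, key=lambda v: score(graph, cover, v))
        cover.add(best)


# Small usage example: a star graph and a path graph.
star: Graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path: Graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star_cover = greedy_vertex_cover(star)
path_cover = greedy_vertex_cover(path)
```

Swapping `score` for a different function changes the heuristic without touching the construction loop, which is the sense in which the loop acts as a meta-algorithm.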

Paper Structure

This paper contains 38 sections, 4 equations, 10 figures, 16 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of the proposed framework as applied to an instance of Minimum Vertex Cover. The middle part illustrates two iterations of the graph embedding, which results in node scores (green bars).
  • Figure 2: Approximation ratio on 1000 test graphs. Note that on MVC, our performance is pretty close to optimal. In this figure, training and testing graphs are generated according to the same distribution.
  • Figure 3: Time-approximation trade-off for MVC and MAXCUT. In this figure, each dot represents a solution found for a single problem instance, for 100 instances. For CPLEX, we also record the time and quality of each solution it finds, e.g. CPLEX-1st means the first feasible solution found by CPLEX.
  • Figure D.1: Approximation ratio on 1000 test graphs. Note that on MVC, our performance is pretty close to optimal. In this figure, training and testing graphs are generated according to the same distribution.
  • Figure D.2: S2V-DQN convergence measured by the held-out validation performance.
  • ...and 5 more figures