Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective

Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

TL;DR

The paper presents Graph Reinforcement Learning as a unifying paradigm for solving combinatorial optimization problems on graphs, focusing on non-canonical problems where traditional algorithms are lacking. It distinguishes between Graph Structure Optimization (designing or modifying graph topology) and Graph Process Optimization (optimizing outcomes on fixed graphs), and surveys a wide range of RL methods (including DQN, PPO, A3C, MCTS) alongside graph neural networks (MPNN, GCN, GAT) used as function approximators. Key contributions include a taxonomy of problem families, a synthesis of algorithmic approaches, and a discussion of practical challenges such as framing problems as MDPs, reward design, scalability, generalization, and interpretability. Overall, the survey argues that Graph RL offers a flexible, constructive, data-driven toolkit that can complement and, in some cases, outperform traditional heuristics and exact solvers on complex, non-canonical graph optimization tasks, while also highlighting open research directions and practical considerations for deployment.

Abstract

Graphs are a natural representation for systems based on relations between connected entities. Combinatorial optimization problems, which arise when considering an objective function related to a process of interest on discrete structures, are often challenging due to the rapid growth of the solution space. The trial-and-error paradigm of Reinforcement Learning has recently emerged as a promising alternative to traditional methods, such as exact algorithms and (meta)heuristics, for discovering better decision-making strategies in a variety of disciplines including chemistry, computer science, and statistics. Despite the fact that they arose in markedly different fields, these techniques share significant commonalities. Therefore, we set out to synthesize this work in a unifying perspective that we term Graph Reinforcement Learning, interpreting it as a constructive decision-making method for graph problems. After covering the relevant technical background, we review works along the dividing line of whether the goal is to optimize graph structure given a process of interest, or to optimize the outcome of the process itself under fixed graph structure. Finally, we discuss the common challenges facing the field and open research questions. In contrast with other surveys, the present work focuses on non-canonical graph problems for which performant algorithms are typically not known and Reinforcement Learning is able to provide efficient and effective solutions.

Paper Structure (49 sections, 9 equations, 8 figures)

Figures (8)

  • Figure 1: Visual summary of the structure and topics of the present survey. $\mathcal{G}$ and $\mathcal{K}$ denote the sets of possible graph structures and graph control actions, respectively; $\mathcal{F}$ is a real-valued objective function that serves as the optimization target. The goal for Structure Optimization is to find the optimal structure $G$, while Process Optimization involves finding a set of optimal control actions $\kappa$.
  • Figure 2: Illustration of the neighborhood aggregation principle. To determine the features of a node, those of its neighbours are aggregated using learnable parametrizations, to which an activation function is applied. While the particulars depend on the architecture, many deep embedding methods follow this blueprint.
  • Figure 3: High-level illustration of how Graph Structure Optimization problems are approached with RL. The MDP starts from a (possibly empty) initial graph $G$ with an objective function value $\mathcal{F}(G)$. The topology of the graph is modified incrementally (e.g., via edge additions and removals) until a termination condition, such as the exhaustion of a modification budget, is met. The goal is for the objective function value $\mathcal{F}(G^*)$ of the resulting graph $G^*$ to be maximally increased relative to the starting point.
  • Figure 4: Illustration of several action space designs for Graph Structure Optimization. Edge addition, removal, and rewiring can be formulated as the selection of a single node per timestep, yielding an $\mathcal{O}(|V|)$ action space. The topological changes are encapsulated in the definition of the transition function. Constraints are commonly used to exclude invalid actions (e.g., the actions corresponding to the addition of an edge that is already part of the graph will be forbidden).
  • Figure 5: High-level summary of works that use Graph Reinforcement Learning for structure optimization (Section \ref{sec:structure}).
  • ...and 3 more figures
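
The captions of Figures 3 and 4 describe an MDP in which a graph is modified incrementally, one node selection per timestep, until a modification budget is exhausted. The following is a minimal sketch of that idea for edge addition only, not the formulation of any specific surveyed work. The names `EdgeAdditionMDP`, `rollout`, and `largest_component_fraction` are hypothetical, and the objective is a toy stand-in for $\mathcal{F}$ chosen for illustration. Picking two nodes in consecutive steps forms one edge, which keeps the action space at $\mathcal{O}(|V|)$, and actions that would duplicate an existing edge or create a self-loop are masked out.

```python
import random


def largest_component_fraction(num_nodes, edges):
    """Toy stand-in for F(G): share of nodes in the largest connected component."""
    adj = {u: set() for u in range(num_nodes)}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in range(num_nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        best = max(best, size)
    return best / num_nodes


class EdgeAdditionMDP:
    """Hypothetical edge-addition MDP with one node selected per timestep."""

    def __init__(self, num_nodes, edges, budget):
        self.num_nodes = num_nodes
        self.edges = {frozenset(e) for e in edges}  # undirected edges
        self.budget = budget    # number of edges that may still be added
        self.pending = None     # first endpoint of the edge being built

    def _can_add(self, u, v):
        return u != v and frozenset((u, v)) not in self.edges

    def valid_actions(self):
        """Nodes that may be selected now; invalid choices are excluded."""
        nodes = range(self.num_nodes)
        if self.pending is None:
            # A first endpoint must still have at least one possible partner.
            return [u for u in nodes if any(self._can_add(u, v) for v in nodes)]
        return [v for v in nodes if self._can_add(self.pending, v)]

    def is_terminal(self):
        return self.budget == 0 or not self.valid_actions()

    def step(self, node):
        """Transition function: the topology changes on every second selection."""
        if node not in self.valid_actions():
            raise ValueError(f"node {node} is not a valid action")
        if self.pending is None:
            self.pending = node
        else:
            self.edges.add(frozenset((self.pending, node)))
            self.pending = None
            self.budget -= 1


def rollout(env, policy, objective):
    """Run one episode; return the objective gain F(G*) - F(G)."""
    start_value = objective(env.num_nodes, env.edges)
    while not env.is_terminal():
        env.step(policy(env))
    return objective(env.num_nodes, env.edges) - start_value


# Usage: a uniformly random policy stands in for a learned one.
env = EdgeAdditionMDP(num_nodes=4, edges=[(0, 1)], budget=2)
rng = random.Random(0)
gain = rollout(env, lambda e: rng.choice(e.valid_actions()), largest_component_fraction)
print(f"objective gain: {gain:.2f}")
```

In this sketch the reward is only computed at the end of the episode, as the difference between the final and initial objective values. A learned policy (for example, one parametrized by a graph neural network) would replace the random choice over `valid_actions()`.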