Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen

TL;DR

The paper tackles sequential stochastic combinatorial optimization (SSCO) by proposing WS-option, a hierarchical RL framework built on two interdependent MDPs: a higher layer that allocates the budget over time and a lower layer that selects the node set at each time step. It introduces wake-sleep training and layer-wise method selection to stabilize learning, and backs them with theoretical convergence results. Experiments on adaptive influence maximization and route planning show that it outperforms the baselines and generalizes to larger graphs and real-world data. The approach offers a scalable solution to bi-level optimization in SSCO, with practical implications for adaptive resource allocation in networks and planning problems.

Abstract

Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose a generic hierarchical RL (HRL) framework called wake-sleep option (WS-option), a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance the model's training stability and computational efficiency, preventing the vicious cyclic interference issue between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources.
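The two-layer decision structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`run_episode`, `make_uniform_budget`, `make_random_node_policy`) and the policy signatures are hypothetical, the learned policies are replaced by simple stand-ins, and the stochastic environment (for example, influence spread) and its rewards are omitted. It shows only the control flow: a higher-layer policy chooses how much budget to spend at each time step, and a lower-layer policy then picks that many nodes one at a time.

```python
import random
from typing import Callable, List, Set

# Hypothetical policy signatures, introduced only for this illustration.
BudgetPolicy = Callable[[int, int, Set[int]], int]   # (t, remaining, selected) -> budget
NodePolicy = Callable[[Set[int], List[int]], int]    # (selected, candidates) -> node


def run_episode(nodes: List[int], horizon: int, total_budget: int,
                budget_policy: BudgetPolicy,
                node_policy: NodePolicy) -> List[List[int]]:
    """Roll out one episode and return the node set chosen at each time step."""
    remaining = total_budget
    selected: Set[int] = set()
    schedule: List[List[int]] = []
    for t in range(horizon):
        # Higher layer: decide how much of the remaining budget to spend now.
        k = budget_policy(t, remaining, selected)
        k = max(0, min(k, remaining, len(nodes) - len(selected)))
        chosen: List[int] = []
        # Lower layer: select k nodes sequentially from the unselected ones.
        for _ in range(k):
            candidates = [v for v in nodes if v not in selected]
            v = node_policy(selected, candidates)
            selected.add(v)
            chosen.append(v)
        remaining -= k
        schedule.append(chosen)
    return schedule


def make_uniform_budget(horizon: int, total_budget: int) -> BudgetPolicy:
    """Uniform allocation, the simplifying assumption the abstract criticizes."""
    base, extra = divmod(total_budget, horizon)

    def policy(t: int, remaining: int, selected: Set[int]) -> int:
        # Spread the remainder over the earliest time steps.
        return base + (1 if t < extra else 0)

    return policy


def make_random_node_policy(seed: int) -> NodePolicy:
    """Stand-in for a learned lower-layer policy: pick a random candidate."""
    rng = random.Random(seed)

    def policy(selected: Set[int], candidates: List[int]) -> int:
        return rng.choice(candidates)

    return policy


schedule = run_episode(
    nodes=list(range(20)),
    horizon=4,
    total_budget=10,
    budget_policy=make_uniform_budget(4, 10),
    node_policy=make_random_node_policy(0),
)
budgets_per_step = [len(step) for step in schedule]
```

In this sketch, replacing `make_uniform_budget` with a state-dependent policy corresponds to the adaptive budget allocation the paper argues for. In the paper, both layers are learned rather than hand-written.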

Paper Structure

This paper contains 47 sections, 4 theorems, 15 equations, 8 figures, 10 tables, 3 algorithms.

Key Result

Theorem 1

(Intra-option policy convergence). In our WS-option framework, given any Markov transition $(s_\tau, o_\tau, a_\tau, r_\tau, s_{\tau+1}, o_{\tau+1})$, the Q-value function $q^{II}(s_\tau, o_\tau, a_\tau)$ converges to the optimal Q-value function $q^{II}_*(s_\tau, o_\tau, a_\tau)$ with probability 1.
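For orientation, convergence statements of this kind are usually made about a one-step Q-learning update. The excerpt here does not give the update rule, so the form below is an assumption: a standard tabular update written in the theorem's notation, with a hypothetical step size $\alpha_\tau$ and discount factor $\gamma$ that do not appear in the source.

$$
q^{II}(s_\tau, o_\tau, a_\tau) \leftarrow q^{II}(s_\tau, o_\tau, a_\tau) + \alpha_\tau \Big[ r_\tau + \gamma \max_{a'} q^{II}(s_{\tau+1}, o_{\tau+1}, a') - q^{II}(s_\tau, o_\tau, a_\tau) \Big]
$$

In classical results for updates of this form, convergence with probability 1 typically requires that every state-option-action triple is visited infinitely often and that the step sizes satisfy $\sum_\tau \alpha_\tau = \infty$ and $\sum_\tau \alpha_\tau^2 < \infty$. Whether the paper's proof relies on exactly these conditions is not stated in this summary.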

Figures (8)

  • Figure 1: Hierarchical MDPs for SSCO
  • Figure 2: Q-values learned for the AIM problem $(T=10, K=20)$.
  • Figure 3: Wake-sleep training procedure
  • Figure 4: Network architecture
  • Figure 5: Cumulative reward during training for AIM $T=10, K=20$
  • ...and 3 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem
  • proof
  • Theorem
  • proof