iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems

Yi-Xiang Hu; Yuke Wang; Feng Wu; Zirui Huang; Shuli Zeng; Xiang-Yang Li

TL;DR

This work tackles large-scale Resource Investment Problems (RIP), in which scheduling precedence-constrained tasks under shared renewable resources incurs provisioning costs. It reframes RIP solving as an iterative Markov Decision Process over decomposed subproblems, enabling RL-driven adaptive ordering of process scheduling and a learning-based mechanism to select among local solution options. Key contributions include the iScheduler framework, a process-level decomposition with a process interaction graph, a reconfiguration-aware training interface, and the L-RIPLIB industrial-scale benchmark. Empirical results show that iScheduler achieves competitive resource costs while reducing time-to-feasibility by up to 43x compared to strong baselines, and that it reuses schedules under dynamic updates to lower reconfiguration latency. Overall, the approach demonstrates a scalable, learning-guided continual optimization paradigm for industrial RIP instances, with practical impact in data centers and cloud platforms.

Abstract

Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the cost of provisioned renewable resources under precedence and timing constraints. Exact mixed-integer programming and constraint programming become impractically slow on large instances, and dynamic updates require schedule revisions under tight latency budgets. We present iScheduler, a reinforcement-learning-driven iterative scheduling framework that formulates RIP solving as a Markov decision process over decomposed subproblems and constructs schedules through sequential process selection. The framework accelerates optimization and supports reconfiguration by reusing unchanged process schedules and rescheduling only affected processes. We also release L-RIPLIB, an industrial-scale benchmark derived from cloud-platform workloads with 1,000 instances of 2,500-10,000 tasks. Experiments show that iScheduler attains competitive resource costs while reducing time to feasibility by up to 43$\times$ against strong commercial baselines.
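
For orientation, the RIP described above can be written as a small optimization model. The following is a sketch of a standard textbook-style formulation, not necessarily the exact one used in the paper. The start times $S_i$, durations $d_i$, earliest start times $e_i$, deadlines $l_i$, demands $r_{i,k}$, usage profiles $u_k(t)$, and resource set $\mathcal{R}$ follow the notation of the figure captions; the unit costs $c_k$, the task set $T$, the precedence edge set $E$, and the reading of $l_i$ as a completion deadline are assumptions introduced here for illustration.

```latex
\begin{align}
\min_{S,\,R} \quad & \sum_{k \in \mathcal{R}} c_k R_k \\
\text{s.t.} \quad
  & S_i + d_i \le S_j && \forall (i,j) \in E \\
  & e_i \le S_i, \quad S_i + d_i \le l_i && \forall i \in T \\
  & u_k(t) = \sum_{i \in T \,:\, S_i \le t < S_i + d_i} r_{i,k} \le R_k
    && \forall k \in \mathcal{R},\ \forall t
\end{align}
```

Under this reading, $R_k$ is the provisioned capacity of resource $k$, so the objective is driven by the peak of each usage profile $u_k(t)$ over the scheduling horizon.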

Paper Structure (24 sections, 11 equations, 7 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Resource Investment Problem
Traditional RIP Benchmarks
Scalability Challenge
iScheduler Framework
Decomposition
Subproblem Construction
Why Scheduling Order Matters
MDP Formulation
State Feature Representation
Learning-Based Solution Selection
RL Training and Execution
Experimental Evaluation
Setup
...and 9 more sections

Figures (7)

Figure 1: Overview of the iScheduler framework. (1) The RIP is first represented as a task-level DAG ($G_T$), where nodes represent tasks and edges denote precedence constraints. Tasks are grouped into processes by taking the weakly connected components of $G_T$ (ignoring edge directions). A process-level graph $G_{PL}$ is then constructed over these processes, where an edge indicates overlapping feasible time windows (potential resource contention). (2) At each iteration, the iScheduler agent observes the current scheduling state and Resource Pool Usage ($\mathrm{RPU}=\{u_k(t)\}_{k\in \mathcal{R}}$), and selects an unscheduled process $\mathcal{P}_v$ from $G_{PL}$. (3) A subproblem $\mathrm{RIP}_v$ is constructed and solved to generate multiple candidate schedules for tasks in $T_{\mathcal{P}_v}$, from which one solution is selected and committed. (4) The committed start times $(S_i)_{i\in T_{\mathcal{P}_v}}$, $G_{PL}$, and RPU are updated accordingly. This iterative procedure continues until all processes in $G_{PL}$ have been scheduled, resulting in the final schedule. Double-bordered nodes indicate tasks or processes with changed parameters (i.e., reconfiguration requests), and red highlights the currently selected process or solution during scheduling.
Figure 2: Task structures of three processes in an RIP. Each block represents a task, annotated as $(d_i, e_i, l_i, r_{i,1})$, where $d_i$ is the duration, $e_i$ the earliest start time, $l_i$ the deadline, and $r_{i,1}$ the demand for resource 1. Directed arrows denote precedence constraints: a task must finish before any successor can start.
Figure 3: Effect of scheduling order and local solution selection on final performance. Each $\mathcal{P}_i$ denotes a process, and each task in a process is labeled as $\mathrm{P}_i\mathrm{T}_j$, where $i$ indexes the process and $j$ indexes the task within that process. Different scheduling orders and local schedule choices lead to different resource usage outcomes. Case 1: Scheduling $\mathcal{P}_1 \rightarrow \mathcal{P}_2 \rightarrow \mathcal{P}_3$ results in total resource usage $R_1 = 5$. Case 2: Using the same scheduling order but selecting an alternative local solution for $\mathcal{P}_2$ yields $R_1 = 4$. Case 3: Changing the scheduling order to $\mathcal{P}_2 \rightarrow \mathcal{P}_3 \rightarrow \mathcal{P}_1$ further reduces usage to $R_1 = 3$.
Figure 4: State transition: When one of the three considered action nodes (in green) is selected, it transitions to a new state and updates the associated features (in blue).
Figure 5: Runtime versus problem size (number of variables, log scale). Each point corresponds to one test instance.
...and 2 more figures
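
The decomposition step described in the caption of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the feasible time window of a process is assumed here to span from the smallest $e_i$ to the largest $l_i$ among its tasks, which the caption does not specify.

```python
from collections import defaultdict


def weakly_connected_processes(num_tasks, precedence_edges):
    """Group tasks into processes: the weakly connected components of the
    task-level DAG, i.e. components after ignoring edge directions."""
    parent = list(range(num_tasks))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in precedence_edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller task index as the component representative.
            parent[max(ri, rj)] = min(ri, rj)

    groups = defaultdict(list)
    for task in range(num_tasks):
        groups[find(task)].append(task)
    return sorted(groups.values())


def process_window(tasks, earliest, deadline):
    """Assumed process window: [min e_i, max l_i] over the tasks of a process."""
    return min(earliest[t] for t in tasks), max(deadline[t] for t in tasks)


def process_interaction_graph(processes, earliest, deadline):
    """Return undirected edges (a, b), a < b, between processes whose
    assumed time windows overlap (potential resource contention)."""
    windows = [process_window(p, earliest, deadline) for p in processes]
    edges = set()
    for a in range(len(processes)):
        for b in range(a + 1, len(processes)):
            (start_a, end_a), (start_b, end_b) = windows[a], windows[b]
            if start_a < end_b and start_b < end_a:
                edges.add((a, b))
    return edges


# Toy instance (made up for illustration): six tasks, three precedence edges.
edges = [(0, 2), (1, 2), (3, 4)]
earliest = [0, 0, 2, 1, 3, 10]
deadline = [4, 4, 6, 5, 8, 12]

processes = weakly_connected_processes(6, edges)
print(processes)  # [[0, 1, 2], [3, 4], [5]]
print(process_interaction_graph(processes, earliest, deadline))  # {(0, 1)}
```

In the toy instance, tasks 0 and 1 share no direct edge but land in the same process because both precede task 2. Only the first two processes have overlapping windows, so they are the only pair that would need to coordinate over the shared resource pool when scheduled one after another.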
