COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription Policy

Lu Wang; Mayukh Das; Fangkai Yang; Chao Duo; Bo Qiao; Hang Dong; Si Qin; Chetan Bansal; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang

TL;DR

Coin addresses uncertainty in resource usage telemetry with chance-constrained imitation learning, producing oversubscription policies that balance resource efficiency against congestion risk. It transforms the stochastic (chance) constraint into a deterministic form under Gaussian assumptions, uses a backward value function to estimate whether the constraint is satisfied, and uses ensemble value learning to capture the variance of cost values. A safety-layer policy update projects actions so that they satisfy the constraint, while the policy itself is trained with an imitation loss. Experiments in cloud and airline domains show approximately 3-4× improvements in efficiency and safety over baselines. The resulting policies are learned offline, are robust to telemetry uncertainty, and are practical for real systems, enabling adaptive oversubscription with probabilistic safety guarantees.
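
As an illustration of the Gaussian transformation and the ensemble variance estimate mentioned above, here is a minimal sketch, not the authors' implementation. It assumes the cost is approximately Gaussian, so the chance constraint P(cost <= threshold) >= 1 - delta can be checked as mean + z * std <= threshold, where z is the standard normal quantile at 1 - delta. The function name, the `threshold` and `delta` parameters, and the sample ensemble outputs are all hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_chance_constraint_ok(cost_estimates, threshold, delta):
    """Check a Gaussian surrogate of P(cost <= threshold) >= 1 - delta.

    cost_estimates: cost predictions from an ensemble of value functions
    (hypothetical inputs). Their spread stands in for the cost variance.
    """
    mu = mean(cost_estimates)        # ensemble mean of the cost
    sigma = pstdev(cost_estimates)   # ensemble standard deviation
    z = NormalDist().inv_cdf(1.0 - delta)  # standard normal quantile
    # Deterministic surrogate: mean plus a variance-dependent safety margin.
    return mu + z * sigma <= threshold


# Hypothetical ensemble outputs for one state-action pair.
ensemble_costs = [0.40, 0.45, 0.50, 0.55, 0.60]
print(gaussian_chance_constraint_ok(ensemble_costs, threshold=0.70, delta=0.05))
print(gaussian_chance_constraint_ok(ensemble_costs, threshold=0.60, delta=0.05))
```

A smaller `delta` raises the quantile `z` and widens the safety margin, so the same mean cost can pass at a loose risk level and fail at a strict one.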

Abstract

We address the challenge of learning safe and robust decision policies in the presence of uncertainty, in the context of a real-world problem: adaptive resource oversubscription, which aims to enhance resource efficiency while ensuring safety against resource congestion risk. Traditional supervised prediction or forecasting models are ineffective at learning adaptive policies, whereas standard online optimization or reinforcement learning is difficult to deploy on real systems. Offline methods such as imitation learning (IL) are ideal, since they can directly leverage historical resource usage telemetry. However, the underlying aleatoric uncertainty in such telemetry is a critical bottleneck. We solve this with our proposed chance-constrained imitation learning framework, which ensures implicit safety against uncertainty in a principled manner via a combination of stochastic (chance) constraints on resource congestion risk and ensemble value functions. This leads to substantial ($\approx 3$-$4\times$) improvement in resource efficiency and safety in many oversubscription scenarios, including resource management in cloud services.

Paper Structure (24 sections, 3 theorems, 14 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Method
Problem Setting
Uncertain Telemetry.
Imitation Learning from uncertain trajectories.
Problem Formulation and Solution Overview
Satisfying the chance constraint
Transforming to deterministic constraint
Backward value function for satisfiability estimation.
Practical Implementation Design
Chance constrained optimization via policy gradient
Ensemble Learning to estimate the variance of cost value
Algorithm
Experiments
...and 9 more sections

Key Result

Lemma 1

A feasible solution to the deterministic constraint is always a feasible solution to the original chance constraint. (Proof in Appendix A.2)
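
For intuition, the standard Gaussian case can be worked out as follows. This is a sketch under stated assumptions, not the paper's own equations, which are not reproduced in this summary. The symbols $C$ (cost), $d$ (cost threshold), $\delta$ (allowed violation probability), $\mu$, $\sigma$, and $\Phi$ (standard normal CDF) are introduced here for illustration. Assuming $C \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma > 0$:

$$
\Pr(C \le d) \ge 1 - \delta
\;\Longleftrightarrow\;
\Phi\!\left(\frac{d - \mu}{\sigma}\right) \ge 1 - \delta
\;\Longleftrightarrow\;
\mu + \Phi^{-1}(1 - \delta)\,\sigma \le d .
$$

The right-hand condition involves only the mean and standard deviation of the cost, so it can be evaluated deterministically. Any $(\mu, \sigma)$ that satisfies it also satisfies the probabilistic condition on the left, which is the direction the lemma states.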

Figures (4)

Figure 1: CPU utilization of a sampled subscriber (left), where the variance band spans the $25^{\text{th}}$ to $75^{\text{th}}$ percentiles. Four points (marked as red crosses) are sampled, and their probability density functions are shown on the right.
Figure 2: Solution Overview of $\texttt{Coin}$
Figure 3: Convergence curves for vCPU Oversubscription
Figure 4: Convergence curves for Airline Overbooking

Theorems & Definitions (3)

Lemma 1: Deterministic $\approx$ Chance constraint
Lemma 2: Backward value $\thicksim$ Forward Markov chain
Lemma 3: Stochasticity $\models$ aleatoric uncertainty
