Decision Support under Prediction-Induced Censoring

Yan Chen; Ruyi Huang; Cheng Liu

TL;DR

This paper proposes PIC-Reinforcement Learning (PIC-RL), a closed-loop framework that transforms censoring from a data quality problem into a decision signal, and provides theoretical guarantees that its feedback design corrects the selection bias inherent in naive learning.

Abstract

In many data-driven online decision systems, actions determine not only operational costs but also the data availability for future learning -- a phenomenon termed Prediction-Induced Censoring (PIC). This challenge is particularly acute in large-scale resource allocation for generative AI (GenAI) serving: insufficient capacity triggers shortages but hides the true demand, leaving the system with only a "greater-than" constraint. Standard decision-making approaches that rely on uncensored data suffer from selection bias, often locking the system into a self-reinforcing low-provisioning trap. To break this loop, this paper proposes an adaptive approach named PIC-Reinforcement Learning (PIC-RL), a closed-loop framework that transforms censoring from a data quality problem into a decision signal. PIC-RL integrates (1) Uncertainty-Aware Demand Prediction to manage the information-cost trade-off, (2) Pessimistic Surrogate Inference to construct decision-aligned conservative feedback from shortage events, and (3) Dual-Timescale Adaptation to stabilize online learning against distribution drift. The analysis provides theoretical guarantees that the feedback design corrects the selection bias inherent in naive learning. Experiments on production Alibaba GenAI traces demonstrate that PIC-RL consistently outperforms state-of-the-art baselines, reducing service degradation by up to 50% while maintaining cost efficiency.
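The censoring mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's setup: demand is drawn from a hypothetical Gaussian, capacity is fixed, and the names `simulate_censored_observations`, `true_demands`, and `capacity` are invented for illustration. It shows how a learner that keeps only uncensored observations underestimates mean demand.

```python
import random
import statistics

def simulate_censored_observations(capacities, demands):
    """Return (observed_value, is_censored) pairs.

    When demand exceeds capacity, only the capacity is observed, plus a
    flag saying the true demand was greater than that value.
    """
    return [(min(d, c), d > c) for d, c in zip(demands, capacities)]

# Hypothetical demand process (an assumption, not taken from the paper).
rng = random.Random(0)
true_demands = [rng.gauss(100.0, 20.0) for _ in range(10_000)]
capacity = 100.0

obs = simulate_censored_observations([capacity] * len(true_demands), true_demands)

# A naive learner discards censored rows and averages what remains.
uncensored = [value for value, censored in obs if not censored]
naive_mean = statistics.fmean(uncensored)
true_mean = statistics.fmean(true_demands)
censored_fraction = sum(censored for _, censored in obs) / len(obs)

print(f"true mean demand:      {true_mean:.1f}")
print(f"uncensored-only mean:  {naive_mean:.1f}")
print(f"fraction censored:     {censored_fraction:.2f}")
```

Because every retained observation is at most the capacity, the uncensored-only mean sits below the true mean. Provisioning from that estimate would lower capacity further, which is the self-reinforcing loop the abstract refers to.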

Paper Structure (35 sections, 2 theorems, 15 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Problem Definition
Protocol: Offline-to-Online under Drift.
Methodology
Phase 1: Uncertainty-Aware Prediction
Phase 2: Offline Pretraining
Policy and Value Networks
State Design
Learnable Feedback Under Censoring
The Bias of Uncensored-Only Learning.
Surrogate Reward Construction (Pessimistic Surrogate Inference).
Offline RL Training
Phase 3: Online RL and Dual-Timescale Adaptation
Theorem 1 (Dual-Timescale Stability).
Experiments
...and 20 more sections

Key Result

Proposition 1

Consider a naive learner that fits demand using a mixture of (i) historical uncensored observations and (ii) current censored observations under threshold action $A_t$, with mixture weight $\rho \in (0,1]$ on the censored stream. Its effective target mean is $\mu_{\text{mix}}(A_t) = (1-\rho)\mathbb{
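The statement above is truncated in this copy, so the following sketch does not reproduce the proposition's formula. It assumes, purely for illustration, that the mixture target is (1 - rho) times a historical uncensored mean plus rho times the mean of demand clipped at the current action, and that the learner sets its next action to its current estimate. The names `censored_mean` and `run_naive_loop`, the Gaussian demand, and the choice rho = 0.5 are all hypothetical.

```python
import random
import statistics

def censored_mean(samples, threshold):
    """Mean of demand as seen through censoring at `threshold`."""
    return statistics.fmean(min(d, threshold) for d in samples)

def run_naive_loop(samples, historical_mean, rho, steps=50):
    """Iterate a naive learner whose action equals its own estimate.

    Assumed update (not the paper's exact formula):
        target = (1 - rho) * historical_mean + rho * E[min(D, action)]
    """
    estimate = historical_mean
    trajectory = [estimate]
    for _ in range(steps):
        action = estimate  # provision exactly at the estimated mean
        estimate = (1.0 - rho) * historical_mean + rho * censored_mean(samples, action)
        trajectory.append(estimate)
    return trajectory

# Hypothetical demand samples (an assumption, not the production traces).
rng = random.Random(1)
demand_samples = [rng.gauss(100.0, 20.0) for _ in range(5_000)]
true_mean = statistics.fmean(demand_samples)

trajectory = run_naive_loop(demand_samples, historical_mean=true_mean, rho=0.5)
print(f"true mean:       {true_mean:.2f}")
print(f"final estimate:  {trajectory[-1]:.2f}")
```

Under these assumptions the estimate never rises above its starting value and settles strictly below the true mean, even though half of the target comes from unbiased historical data. This matches the qualitative behavior reported for naive learning in Figure 2.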

Figures (8)

Figure 1: The PIC-RL Framework. A three-phase architecture transforming censoring from a missing-label problem into a supervision signal: (1) Uncertainty-Aware Prediction, (2) Offline Pre-training, and (3) Online RL and Dual-Timescale Online Adaptation.
Figure 2: Verification of Proposition 1 (Instability). (a) Naive learning exhibits systematic negative bias. (b) Cumulative error confirms that the system inevitably drifts into a "censoring trap," even with historical data replay.
Figure 3: Mechanisms of Proposition 2. (a) Strict monotonicity of the surrogate gap ensures gradient consistency ($\partial r/\partial a > 0$). (b) The pessimism factor $\Psi(n)$ amplifies reward signals super-linearly to enable escape from censoring traps.
Figure 4: Phase 1 Training Dynamics. (a) Rapid NLL convergence confirms the model learns the full demand distribution. (b) Stable validation MAE demonstrates robust generalization without overfitting.
Figure 5: Phase 2 Offline Pretraining. (a) 79% reduction in value loss validates the critic's ability to learn from censored feedback. (b) The derived policy effectively anticipates demand spikes (red crosses indicate censored events).
...and 3 more figures

Theorems & Definitions (4)

Proposition 1: Instability under Mixture
proof
Proposition 2: Consistency and Escape of Surrogate Reward
proof
