
Learning to Allocate Resources with Censored Feedback

Giovanni Montanari, Côme Fiegel, Corentin Pla, Aadirupa Saha, Vianney Perchet

TL;DR

The paper addresses online resource allocation across $K$ arms under censored feedback: arm $i$ yields a reward only if it is activated (with probability $p_i$) and the budget allocated to it exceeds a latent threshold $X_{t,i} \sim G(\lambda_i)$. It introduces RA-UCB, an optimistic, batched-estimation algorithm that decouples reward collection from parameter estimation. In the known-budget setting, RA-UCB achieves $\tilde{O}(\sqrt{T})$ regret, improving to poly-logarithmic regret under stronger assumptions. The paper also proves an information-theoretic $\Omega(T^{1/3})$ regret lower bound. To handle unknown per-round budgets, it introduces MG-UCB, which uses within-round switching and a water-filling procedure while preserving the same regret guarantees. Experiments on real-world datasets (EdNet and Criteo-derived benchmarks) illustrate practical effectiveness for online advertising and adaptive education. Overall, the work advances the theoretical understanding of censored-threshold bandits and delivers practical algorithms for online resource allocation with incomplete feedback.
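
The water-filling idea can be illustrated in its simplest setting. The sketch below assumes the parameters are known and, as a hypothetical choice of $G(\lambda)$, that thresholds are exponential, so arm $i$'s expected reward $p_i(1 - e^{-\lambda_i b_i})$ is concave in its allocation $b_i$. It is a generic water-filling solver for that assumed model, not the paper's MG-UCB procedure, and the name `water_fill` is illustrative.

```python
import math

def water_fill(ps, lams, budget, iters=200):
    """Split `budget` across arms to maximize sum_i p_i * (1 - exp(-lam_i * b_i)).

    Assumes known parameters with p_i > 0, lam_i > 0 and exponential thresholds.
    KKT conditions: every funded arm has the same marginal reward
    p_i * lam_i * exp(-lam_i * b_i) = mu, hence
    b_i(mu) = max(0, ln(p_i * lam_i / mu) / lam_i).
    Total spend decreases in mu, so bisect on mu until it matches the budget.
    """
    def alloc(mu):
        return [max(0.0, math.log(p * l / mu) / l) for p, l in zip(ps, lams)]

    # At hi no arm is funded; at lo (a tiny level) every arm is heavily funded.
    lo, hi = 1e-300, max(p * l for p, l in zip(ps, lams))
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: mu spans many orders of magnitude
        if sum(alloc(mid)) > budget:
            lo = mid  # overspending: raise the marginal level mu
        else:
            hi = mid  # within budget: try a lower level
    return alloc(hi)  # hi always satisfies the budget constraint
```

For instance, with `ps = [0.9, 0.01]`, `lams = [1.0, 1.0]` and a budget of 0.5, the whole budget goes to the first arm, because the second arm's marginal reward never reaches the first arm's level.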

Abstract

We study the online resource allocation problem in which, at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward, facing an exploration-exploitation trade-off. We prove an information-theoretic regret lower bound of $\Omega(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{\mathcal{O}}(\sqrt{T})$, and even $\mathcal{O}(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. For unknown, round-dependent budgets, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations and matches the regret guarantees of RA-UCB. Finally, we validate our theoretical results through experiments on real-world datasets.
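
Why parameter estimation is non-trivial here: a zero reward does not reveal whether the arm was inactive or the budget fell short of the threshold, so at a single allocation level the activation probability and the threshold parameter are confounded. Assuming, hypothetically, exponential thresholds, probing one arm at two levels $b$ and $2b$ separates them: with $q = e^{-\lambda b}$ the mean rewards are $p(1-q)$ and $p(1-q^2)$, whose ratio is $1+q$. The method-of-moments sketch below only illustrates this identifiability argument; it is not the estimator used by RA-UCB, and the names `observe` and `estimate` are illustrative.

```python
import math
import random

def observe(p, lam, b, rng):
    """One censored observation: reward 1 iff the arm is activated (Bernoulli(p))
    AND the threshold X ~ Exp(lam) (assumed family) lies below the allocation b.
    A 0 does not reveal which of the two conditions failed."""
    activated = rng.random() < p
    threshold = rng.expovariate(lam)
    return 1 if activated and threshold < b else 0

def estimate(p, lam, b, n, seed=0):
    """Recover (p, lam) from n rewards at allocation b and n rewards at 2b.

    With q = exp(-lam * b): E[Y | b] = p(1 - q) and E[Y | 2b] = p(1 - q^2),
    so the ratio of the two means equals 1 + q.
    """
    rng = random.Random(seed)
    r1 = sum(observe(p, lam, b, rng) for _ in range(n)) / n
    r2 = sum(observe(p, lam, 2 * b, rng) for _ in range(n)) / n
    q = r2 / r1 - 1.0
    if not 0.0 < q < 1.0:
        raise ValueError("sample too small to separate p from lam")
    lam_hat = -math.log(q) / b
    p_hat = r1 / (1.0 - q)
    return p_hat, lam_hat
```

Calling `estimate(0.7, 0.5, 1.0, 200_000)` returns estimates close to the true pair $(0.7, 0.5)$; with far fewer samples the ratio becomes noisy, which hints at why confidence bounds are needed.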

Paper Structure

This paper contains 49 sections, 18 theorems, 296 equations, 7 figures, 2 tables, and 4 algorithms.

Key Result

Theorem 3.1

Assume that $K \leq T$. Then no algorithm can guarantee, for every parameter instance, an expected regret of …

Figures (7)

  • Figure 1: EdNet-KT3 quiz benchmark. Confidence intervals are computed over $5$ independent runs using $5$ batches of $1{,}000$ different users; $B = 700$ s, $K = 20$, $T = 1000$.
  • Figure 2: Log-scale comparison of empirical and theoretical regret bounds for RA-UCB.
  • Figure 3: Comparison between RA-UCB (blue) and an Explore-Then-Commit baseline (RA-ETC) for $T=10{,}000$.
  • Figure 4: Comparison between RA-UCB (blue) and a naive baseline without confidence bounds (NO UCB).
  • Figure 5: Response-time distribution analysis for a representative EdNet-KT3 question (q4135). Empirical PDF (left), CDF (middle), and Q–Q plot (right) with fitted truncated Weibull model. Response times are fully observed, enabling direct validation of the Weibull threshold assumption.
  • ...and 2 more figures
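
Figure 2 compares empirical and theoretical regret on a log scale. One simple way to read such a plot, sketched below as an assumption rather than the authors' analysis, is to fit the slope of $\log(\text{regret})$ against $\log T$: a slope near $1/2$ is consistent with $\sqrt{T}$ growth, and one near $1/3$ with the $T^{1/3}$ rate of the lower bound. The function name `loglog_slope` is illustrative.

```python
import math

def loglog_slope(ts, regrets):
    """Least-squares slope of log(regret) against log(T).

    A slope near 0.5 is consistent with O(sqrt(T)) growth, near 1/3 with T^(1/3).
    """
    xs = [math.log(t) for t in ts]
    ys = [math.log(r) for r in regrets]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var
```

On a synthetic curve such as `[3 * t ** 0.5 for t in ts]` the fitted slope is 0.5 up to rounding. Over a finite horizon, the logarithmic factors hidden in $\widetilde{\mathcal{O}}(\sqrt{T})$ push the fitted slope slightly above $1/2$.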

Theorems & Definitions (31)

  • Theorem 3.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.6
  • Remark 4.7
  • Theorem 4.8
  • Theorem 4.11
  • Lemma 5.2
  • Remark 5.3: Implementability
  • Lemma B.1
  • ...and 21 more