Learning to Cover: Online Learning and Optimization with Irreversible Decisions

Alexandre Jacquillat; Michael Lingzhi Li

TL;DR

This paper introduces a novel online learning and optimization framework where a decision-maker opens facilities over a finite horizon to achieve a large, probabilistic coverage target, with facility success being uncertain and decisions irreversible. It develops a statistical learning foundation showing the online classifier converges to the Bayes-optimal classifier at rate O(1/√n) under mild margin conditions, and then builds an asymptotically optimal, implementable algorithm that achieves sub-linear regret as m grows, with explicit rates depending on learning rate r and irreducible error 1−p. The authors extend the core results to a networked bipartite facility-customer graph, deriving analogous sub-linear regret bounds and providing both exact and heuristic solution methods, along with concentration arguments for dependent coverage events. From a managerial perspective, the results advocate for limited pilot online learning before widespread expansion, offering fast convergence to near-optimal decisions and robustness to offline data and model misspecification. Overall, the work blends statistical learning with online optimization to deliver a principled approach for planning under learning uncertainty in large-scale, irreversible facility-location problems.

Abstract

We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T \in \mathbb{Z}_+$. We prove that, under statistical conditions, the online classifier converges to the Bayes-optimal classifier at a rate of at best $\mathcal{O}(1/\sqrt{n})$. Thus, we formulate our online learning and optimization problem, with a generalized learning rate $r>0$ and a residual error $1-p$. We derive an asymptotically optimal algorithm and an asymptotically tight lower bound. The regret grows in $\Theta\left(m^{\frac{1-r}{1-r^T}}\right)$ if $p=1$ (perfect learning) or in $\Theta\left(\max\left\{m^{\frac{1-r}{1-r^T}},\sqrt{m}\right\}\right)$ otherwise; in particular, the regret rate is sub-linear and converges exponentially fast to its infinite-horizon limit. We extend this result to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially and fast exploitation later on once uncertainty gets mitigated. These results uncover the benefits of limited online learning and optimization through pilot programs prior to full-fledged expansion.
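To make the regret rates in the abstract concrete, the following minimal sketch evaluates the exponent of $m$ for a given learning rate $r$ and horizon $T$. The function name `regret_exponent` and its `perfect_learning` flag are illustrative and do not come from the paper. The sketch uses the geometric-sum identity $\frac{1-r}{1-r^T} = \frac{1}{1+r+\dots+r^{T-1}}$, which avoids division by zero at $r=1$; whether the paper treats $r=1$ this way is not stated in this summary.

```python
def regret_exponent(r: float, T: int, perfect_learning: bool = True) -> float:
    """Exponent a such that regret grows as Theta(m**a), per the abstract's rates.

    Hypothetical helper for illustration only.
    r: generalized learning rate (r > 0); T: finite horizon (T >= 1).
    perfect_learning=True corresponds to p = 1; otherwise the sqrt(m) term applies.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if T < 1:
        raise ValueError("T must be a positive integer")
    # (1 - r) / (1 - r**T) rewritten as 1 / (1 + r + ... + r**(T-1)).
    base = 1.0 / sum(r ** k for k in range(T))
    # With residual error (p < 1), the rate is max{m**base, sqrt(m)}.
    return base if perfect_learning else max(base, 0.5)


if __name__ == "__main__":
    # With r = 0.5, the exponent drops from 1 (T = 1) toward 1 - r = 0.5.
    for T in (1, 2, 3, 5, 10):
        print(T, round(regret_exponent(0.5, T), 4))
```

With a single period ($T=1$) the exponent is 1, so regret is linear in $m$; each additional period lowers it, and for $r<1$ the gap to the limiting value $1-r$ shrinks geometrically in $T$.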

Paper Structure (53 sections, 31 theorems, 211 equations, 10 figures, 2 tables, 6 algorithms)

Introduction
Literature Review
The online learning and optimization setting
The decision-making environment
The learning environment
A statistical learning setup.
Summary
The Core Model: Target on Facilities
Problem formulation
An asymptotically optimal algorithm with sub-linear regret
Implications, applications and extensions
Discussion of results and managerial implications
Asymptotically-tight bounds.
Sub-linear regret: benefits of online learning.
Exponential convergence: fast learning.
...and 38 more sections

Key Result

Proposition 1

Assume that we estimate $\theta$ using the maximum likelihood estimator (MLE) and build the classifier $h_t(\boldsymbol{x}_{i}) = \mathbf{1}\left\{f_{\widehat{\theta}_t}(\boldsymbol{x}_{i})>\frac{1}{2}\right\}$. Define the error rate $\mu_t = \mathbb{P}(S_{it}\neq 1 \mid i\in\mathcal{J}_t)$. Assume that $N_t \to \infty$ as $m \to \infty$ for all $t$. Under Assumptions ass:regime-ass:sampl
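As a rough illustration of the construction in Proposition 1 (fit $\theta$ by maximum likelihood, then predict success when the fitted probability exceeds $\frac{1}{2}$), here is a minimal sketch. It assumes a one-dimensional logistic model $f_\theta(x) = \sigma(a + bx)$, which is a choice made here for illustration; this summary does not specify the paper's model class. The names `fit_logistic_mle` and `classify` are hypothetical, and plain gradient ascent stands in for whatever estimation routine the authors use.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_mle(xs, ys, steps: int = 2000, lr: float = 0.5):
    """Gradient ascent on the average log-likelihood of sigma(a + b*x).

    Assumed model for illustration; returns the fitted (a, b).
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)  # gradient of the log-likelihood
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def classify(theta, x: float) -> int:
    """h(x) = 1{f_theta(x) > 1/2}: predict success iff fitted probability > 1/2."""
    a, b = theta
    return 1 if sigmoid(a + b * x) > 0.5 else 0


if __name__ == "__main__":
    # Toy, non-separable data so that the MLE is finite.
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [0, 0, 1, 0, 1, 1]
    theta_hat = fit_logistic_mle(xs, ys)
    print(theta_hat, classify(theta_hat, 2.0), classify(theta_hat, -2.0))
```

In the online setting described in the abstract, a fit of this kind would be refreshed each period as new success and failure outcomes are observed.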

Figures (10)

Figure 1: Illustration of the evolution of the online classifier over time. Light/dark dots: facilities with positive/negative realizations; red dots: selected facilities.
Figure 2: Timeline of events. [White/black dots: facilities with positive/negative predictions; blue dots: facility opening attempts; green/red dots: successfully/unsuccessfully opened facilities.]
Figure 3: Impact of learning rate and time horizon on asymptotic regret rate ($p=1$).
Figure 4: Number of samples required to identify $m=n/10$ positive samples across datasets ($\delta=5\%$, $T=2$).
Figure 5: Comparison of static, semi-adaptive adjustment, and adaptive algorithms.
...and 5 more figures

Theorems & Definitions (38)

Example 1: Clinical trials
Example 2: Retail stores
Example 3: Sustainable infrastructure
Proposition 1
Corollary 1: Learning environment
Corollary 2: Success Distribution
Theorem 1
Lemma 1: Asymptotic solution of Problem \ref{prob:main_det}
Remark 1
Lemma 2: Upper bound
...and 28 more
