Learning to Cover: Online Learning and Optimization with Irreversible Decisions

Alexandre Jacquillat, Michael Lingzhi Li

TL;DR

This paper introduces a novel online learning and optimization framework where a decision-maker opens facilities over a finite horizon to achieve a large, probabilistic coverage target, with facility success being uncertain and decisions irreversible. It develops a statistical learning foundation showing the online classifier converges to the Bayes-optimal classifier at rate $\mathcal{O}(1/\sqrt{n})$ under mild margin conditions, and then builds an asymptotically optimal, implementable algorithm that achieves sub-linear regret as $m$ grows, with explicit rates depending on learning rate $r$ and irreducible error $1-p$. The authors extend the core results to a networked bipartite facility-customer graph, deriving analogous sub-linear regret bounds and providing both exact and heuristic solution methods, along with concentration arguments for dependent coverage events. From a managerial perspective, the results advocate for limited pilot online learning before widespread expansion, offering fast convergence to near-optimal decisions and robustness to offline data and model misspecification. Overall, the work blends statistical learning with online optimization to deliver a principled approach for planning under learning uncertainty in large-scale, irreversible facility-location problems.

Abstract

We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T \in \mathbb{Z}_+$. We prove that, under statistical conditions, the online classifier converges to the Bayes-optimal classifier at a rate of at best $\mathcal{O}(1/\sqrt{n})$. Thus, we formulate our online learning and optimization problem, with a generalized learning rate $r>0$ and a residual error $1-p$. We derive an asymptotically optimal algorithm and an asymptotically tight lower bound. The regret grows in $\Theta\left(m^{\frac{1-r}{1-r^T}}\right)$ if $p=1$ (perfect learning) or in $\Theta\left(\max\left\{m^{\frac{1-r}{1-r^T}},\sqrt{m}\right\}\right)$ otherwise; in particular, the regret rate is sub-linear and converges exponentially fast to its infinite-horizon limit. We extend this result to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially and fast exploitation later on once uncertainty gets mitigated. These results uncover the benefits of limited online learning and optimization through pilot programs prior to full-fledged expansion.
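
To make the regret rates in the abstract concrete, the following minimal Python sketch evaluates the exponent $a$ in "regret grows as $m^a$" for a given learning rate $r$, horizon $T$, and success probability $p$. The function names are hypothetical, and the treatment of $r=1$ (using the continuous limit $1/T$ of $\frac{1-r}{1-r^T}$) is an assumption made here for illustration rather than a statement from the paper.

```python
import math


def regret_exponent(r: float, T: int, p: float = 1.0) -> float:
    """Exponent a such that regret is Theta(m^a), following the abstract's rates.

    r: generalized learning rate (r > 0)
    T: finite horizon (positive integer)
    p: 1 - residual error; p == 1 means perfect learning
    """
    if r <= 0 or T < 1:
        raise ValueError("need r > 0 and T >= 1")
    if math.isclose(r, 1.0):
        # Assumption: extend (1 - r) / (1 - r^T) continuously at r = 1.
        base = 1.0 / T
    else:
        base = (1.0 - r) / (1.0 - r ** T)
    # With residual error (p < 1), the rate is max{m^base, sqrt(m)}.
    return base if p == 1.0 else max(base, 0.5)


def infinite_horizon_exponent(r: float, p: float = 1.0) -> float:
    """Limit of regret_exponent as T grows: 1 - r if r < 1, else 0."""
    base = max(1.0 - r, 0.0)
    return base if p == 1.0 else max(base, 0.5)


if __name__ == "__main__":
    r = 0.5
    for T in range(1, 7):
        gap = regret_exponent(r, T) - infinite_horizon_exponent(r)
        print(f"T={T}: exponent={regret_exponent(r, T):.4f}, gap to limit={gap:.4f}")
```

For $r<1$, the gap to the infinite-horizon exponent equals $(1-r)\,r^T/(1-r^T)$, which shrinks geometrically in $T$; this is one way to read the abstract's statement that the regret rate converges exponentially fast to its infinite-horizon limit.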

Paper Structure

This paper contains 53 sections, 31 theorems, 211 equations, 10 figures, 2 tables, 6 algorithms.

Key Result

Proposition 1

Assume that we estimate $\theta$ using the maximum likelihood estimator (MLE) $\widehat{\theta}_t$ and build the classifier $h_t(\boldsymbol{x}_{i}) = \mathbf{1}\left\{f_{\widehat{\theta}_t}(\boldsymbol{x}_{i})>\frac{1}{2}\right\}$. Define the error rate $\mu_t = \mathbb{P}(S_{it}\neq 1 \mid i\in\mathcal{J}_t)$. Assume that $N_t \to \infty$ as $m \to \infty$ for all $t$. Under Assumptions ass:regime-ass:sampl

Figures (10)

  • Figure 1: Illustration of the evolution of the online classifier over time. Light/dark dots: facilities with positive/negative realizations; red dots: selected facilities.
  • Figure 2: Timeline of events. [White/black dots: facilities with positive/negative predictions; blue dots: facility opening attempts; green/red dots: successfully/unsuccessfully opened facilities.]
  • Figure 3: Impact of learning rate and time horizon on asymptotic regret rate ($p=1$).
  • Figure 4: Number of samples required to identify $m=n/10$ positive samples across datasets ($\delta=5\%$, $T=2$).
  • Figure 5: Comparison of static, semi-adaptive adjustment, and adaptive algorithms.
  • ...and 5 more figures

Theorems & Definitions (38)

  • Example 1: Clinical trials
  • Example 2: Retail stores
  • Example 3: Sustainable infrastructure
  • Proposition 1
  • Corollary 1: Learning environment
  • Corollary 2: Success Distribution
  • Theorem 1
  • Lemma 1: Asymptotic solution of Problem \ref{prob:main_det}
  • Remark 1
  • Lemma 2: Upper bound
  • ...and 28 more