Learning to Cover: Online Learning and Optimization with Irreversible Decisions
Alexandre Jacquillat, Michael Lingzhi Li
TL;DR
This paper introduces an online learning and optimization framework in which a decision-maker opens facilities over a finite horizon to meet a large, probabilistic coverage target, where facility success is uncertain and decisions are irreversible. It develops a statistical learning foundation showing that the online classifier converges to the Bayes-optimal classifier at a rate of at best O(1/√n) under mild margin conditions, and then builds an asymptotically optimal, implementable algorithm whose regret is sub-linear in the target number of facilities m, with explicit rates depending on the learning rate r and the irreducible error 1−p. The authors extend the core results to a networked bipartite facility-customer graph, deriving analogous sub-linear regret bounds and providing both exact and heuristic solution methods, along with concentration arguments for dependent coverage events. From a managerial perspective, the results advocate limited online learning through pilot programs before widespread expansion, offering fast convergence to near-optimal decisions and robustness to offline data and model misspecification. Overall, the work blends statistical learning with online optimization to deliver a principled approach for planning under learning uncertainty in large-scale, irreversible facility-location problems.
Abstract
We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T \in \mathbb{Z}_+$. We prove that, under statistical conditions, the online classifier converges to the Bayes-optimal classifier at a rate of at best $\mathcal{O}(1/\sqrt{n})$. Thus, we formulate our online learning and optimization problem, with a generalized learning rate $r>0$ and a residual error $1-p$. We derive an asymptotically optimal algorithm and an asymptotically tight lower bound. The regret grows in $\Theta\left(m^{\frac{1-r}{1-r^T}}\right)$ if $p=1$ (perfect learning) or in $\Theta\left(\max\left\{m^{\frac{1-r}{1-r^T}},\sqrt{m}\right\}\right)$ otherwise; in particular, the regret rate is sub-linear and converges exponentially fast to its infinite-horizon limit. We extend this result to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially and fast exploitation later on once uncertainty gets mitigated. These results uncover the benefits of limited online learning and optimization through pilot programs prior to full-fledged expansion.
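To make the stated regret rate concrete, the following minimal sketch evaluates the exponent $\frac{1-r}{1-r^T}$ of $m$ for a few horizons. It is an illustration of the formula in the abstract only, not the authors' algorithm. The function names `regret_exponent` and `overall_exponent` are hypothetical, and the sketch assumes $0<r<1$ when discussing the infinite-horizon limit, in which case the exponent tends to $1-r$ and the gap to that limit equals $(1-r)\,r^T/(1-r^T)$, which shrinks geometrically in $T$.

```python
import math


def regret_exponent(r: float, T: int) -> float:
    """Exponent (1 - r) / (1 - r**T) of m in the regret rate (hypothetical helper)."""
    if r <= 0 or T < 1:
        raise ValueError("requires r > 0 and an integer horizon T >= 1")
    if math.isclose(r, 1.0):
        # The ratio is 0/0 at r = 1; its limit as r -> 1 is 1 / T.
        return 1.0 / T
    return (1.0 - r) / (1.0 - r ** T)


def overall_exponent(r: float, T: int, p: float) -> float:
    """Exponent of m in the regret rate, with or without residual error 1 - p."""
    base = regret_exponent(r, T)
    # With p < 1 the rate is max{m**base, sqrt(m)}, i.e. m**max(base, 1/2) for m >= 1.
    return base if p == 1 else max(base, 0.5)


if __name__ == "__main__":
    r = 0.5  # assumed learning rate, chosen only for illustration
    for T in (1, 2, 3, 5, 10):
        gap = regret_exponent(r, T) - (1.0 - r)
        print(f"T={T:2d}  exponent={regret_exponent(r, T):.4f}  gap to limit={gap:.2e}")
```

With a single period ($T=1$) the exponent is 1, so regret is linear in $m$; each additional period lowers the exponent, which is one way to read the abstract's point that even a short pilot phase captures most of the benefit of learning.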
