A threshold for online balancing of sparse i.i.d. vectors
Dylan J. Altschuler, Konstantin Tikhomirov
TL;DR
This work analyzes online vector balancing for i.i.d. $d$-sparse binary vectors with horizon $T=\Theta(n)$, introducing a mean-field model $A\sim\mathcal{M}_{n,T,d}$ and a prefix-discrepancy framework. It proves a matching pair of results. For $2\le d\le(\log\log n)^2/\log\log\log n$, every online algorithm incurs discrepancy $\Omega(\log\log n)$, while an explicit, near-linear-time online algorithm achieves $O(\log\log n)$ prefix discrepancy with high probability. Together these reveal a sharp gap between online and offline Beck–Fiala in the average-case setting. Notably, in this sparsity range the optimal online discrepancy is independent of $d$ and of the column norms, highlighting a threshold phenomenon. The analysis combines a spread-based lower bound with delicate control of exceptional rows in the online setting, and extends to broader time horizons via padding and concatenation.
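For reference, the standard definition of prefix discrepancy in online vector balancing is given below. It is assumed here, not confirmed by this summary, to match the paper's usage. Because the maximum includes $t=T$, a prefix-discrepancy upper bound also bounds the ordinary (final) discrepancy.

```latex
\[
  S_t = \sum_{i=1}^{t} \varepsilon_i X_i, \qquad
  \operatorname{disc}_{\mathrm{pref}} = \max_{1 \le t \le T} \|S_t\|_\infty
  \;\ge\; \|S_T\|_\infty,
\]
where each sign $\varepsilon_t \in \{-1, +1\}$ may depend only on $X_1, \dots, X_t$ (online).
```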
Abstract
Consider the task of \textit{online} vector balancing for stochastic arrivals $(X_i)_{i \in [T]}$, where the time horizon satisfies $T = \Theta(n)$ and the $X_i$ are i.i.d. uniform $d$-sparse $n$-dimensional binary vectors, with $2\leq d \le (\log\log n)^2/\log\log\log n$. We show that for this range of parameters every online algorithm incurs discrepancy at least $\Omega(\log \log n)$, and that there is an efficient algorithm achieving a matching discrepancy bound of $O(\log\log n)$ w.h.p. This establishes an asymptotic gap, both existential and algorithmic, between the online and offline versions of the average-case Beck–Fiala problem. Strikingly, the optimal online discrepancy in this setting is of order $\log \log n$, independent of $d$ and of the norms of the vectors $(X_i)_i$. Our assumptions on $d$ are nearly optimal, as this independence ceases when $d=\omega((\log\log n)^2)$.
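To make the online setting concrete, here is a minimal Python simulation. It assumes each $d$-sparse binary vector is stored as the tuple of its nonzero coordinates. The helpers `sample_sparse_vectors` and `greedy_online_signs` are hypothetical illustrations, not the paper's $O(\log\log n)$ algorithm. The sign rule is a simple squared-norm greedy baseline swapped in only to show how arrivals, online sign choices and prefix discrepancy fit together.

```python
import random
from typing import List, Sequence, Tuple


def sample_sparse_vectors(n: int, T: int, d: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Draw T i.i.d. uniform d-sparse binary vectors in {0,1}^n.

    Each vector is represented by its support: d distinct coordinates
    chosen uniformly without replacement from range(n).
    """
    if not 1 <= d <= n:
        raise ValueError("need 1 <= d <= n")
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n), d)) for _ in range(T)]


def greedy_online_signs(n: int, vectors: Sequence[Tuple[int, ...]]) -> Tuple[List[int], int]:
    """Assign a sign to each arriving vector using only the past (online).

    Hypothetical baseline, NOT the algorithm from the paper: choose the sign
    minimizing the squared Euclidean norm of the running signed sum, i.e.
    eps = -1 if the current entries on the support sum to > 0, else +1.
    Returns the signs and the prefix discrepancy max_t ||S_t||_inf.
    """
    running = [0] * n  # S_t = sum_{i <= t} eps_i X_i
    signs: List[int] = []
    prefix_disc = 0
    for support in vectors:
        load = sum(running[j] for j in support)
        eps = -1 if load > 0 else 1
        for j in support:
            running[j] += eps
        signs.append(eps)
        # Only coordinates on the support changed at this step, so taking the
        # max over them keeps prefix_disc equal to max_t ||S_t||_inf.
        prefix_disc = max(prefix_disc, max(abs(running[j]) for j in support))
    return signs, prefix_disc


if __name__ == "__main__":
    n = 10_000
    # T = n is one choice of T = Theta(n); d = 3 is an arbitrary example value.
    vectors = sample_sparse_vectors(n=n, T=n, d=3, seed=1)
    signs, disc = greedy_online_signs(n, vectors)
    print(f"greedy prefix discrepancy: {disc}")
```

By the lower bound stated above, no online rule, this greedy one included, can beat order $\log\log n$ in the stated range of $d$. The simulation only illustrates the quantities involved and says nothing about constants.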
