Product distribution learning with imperfect advice
Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis
TL;DR
The paper addresses learning product distributions on $\{0,1\}^d$ when the learner is also given imperfect advice. By combining a tolerant mean tester, a block-based analysis, and a constrained mean-estimation (LASSO) step, the authors design a polynomial-time algorithm that achieves $\mathrm{d_{TV}}(P,\widehat{P})\le \varepsilon$ using a number of samples sublinear in $d$, provided $\|\mathbf{p}-\mathbf{q}\|_1$ is sufficiently small and the distribution is $\tau$-balanced. The main result gives a sample bound of $\tilde{O}\bigl( \frac{d}{\varepsilon^2} \bigl( d^{-\eta} + \min\{1, \frac{\|\mathbf{p}-\mathbf{q}\|_1^2}{d^{1-4\eta}\varepsilon^2}\} \bigr) \bigr)$: fewer samples suffice when the advice is accurate, while the algorithm remains robust when the advice is poor. The work also establishes lower bounds showing that the balancedness assumption is necessary and that sublinear-sample learning is limited when the advice is inadequate or the distribution is unbalanced. Overall, this work advances learning with predictions in the discrete, high-dimensional setting and suggests avenues for extending the framework to other complex models.
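To make the shape of the sample bound concrete, the following minimal sketch evaluates the expression inside the $\tilde{O}(\cdot)$ with all constants and logarithmic factors dropped. The function names and the example parameter values are hypothetical, chosen only for illustration; the output is not an actual sample count.

```python
import math


def advice_sample_bound(d: int, eps: float, eta: float, l1_gap: float) -> float:
    """Evaluate (d / eps^2) * (d^-eta + min(1, l1_gap^2 / (d^(1-4*eta) * eps^2))).

    l1_gap stands for ||p - q||_1. Constants and log factors hidden by the
    O-tilde notation are ignored, so this only shows how the bound scales.
    """
    advice_term = min(1.0, l1_gap ** 2 / (d ** (1 - 4 * eta) * eps ** 2))
    return (d / eps ** 2) * (d ** (-eta) + advice_term)


def baseline_bound(d: int, eps: float) -> float:
    """The advice-free d / eps^2 rate, again without constants."""
    return d / eps ** 2


if __name__ == "__main__":
    d, eps, eta = 10 ** 6, 0.1, 0.1  # illustrative values only
    for gap in (0.0, 1.0, 10.0, 1e4):
        ratio = advice_sample_bound(d, eps, eta, gap) / baseline_bound(d, eps)
        print(f"l1_gap={gap:g}: bound / baseline = {ratio:.4f}")
```

With perfect advice the ratio reduces to $d^{-\eta}$, and as the advice degrades the `min` caps the second term at 1, so the expression never exceeds twice the advice-free rate.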
Abstract
Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0,1\}^d$, it is known that $\Omega(d/\varepsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\varepsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \varepsilon d^{0.5 - \Omega(\eta)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.
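As a worked check of how the advice condition yields the stated rate, take the general sample bound from the TL;DR at face value and assume, purely for illustration, the threshold $\|\mathbf{p}-\mathbf{q}\|_1 \le \varepsilon d^{1/2 - 5\eta/2}$. This is one instance of $\varepsilon d^{0.5-\Omega(\eta)}$; the constant $5/2$ is an assumption of this sketch, not a constant taken from the paper. Then

$$
\frac{\|\mathbf{p}-\mathbf{q}\|_1^2}{d^{1-4\eta}\varepsilon^2}
\;\le\; \frac{\varepsilon^2 d^{1-5\eta}}{d^{1-4\eta}\varepsilon^2}
\;=\; d^{-\eta},
\qquad\text{so}\qquad
\frac{d}{\varepsilon^2}\Bigl(d^{-\eta} + \min\bigl\{1,\, d^{-\eta}\bigr\}\Bigr)
\;=\; \frac{2\,d^{1-\eta}}{\varepsilon^2},
$$

which is $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, matching the sample complexity claimed above under this assumed threshold.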
