Product distribution learning with imperfect advice

Arnab Bhattacharyya; Davin Choo; Philips George John; Themis Gouleakis

TL;DR

The paper studies learning product distributions on $\{0,1\}^d$ when the learner is given imperfect advice. By combining a tolerant mean tester, a block-based analysis, and a constrained mean-estimation (LASSO) step, the authors design a polynomial-time algorithm that outputs $\widehat{P}$ with $\mathrm{d}_{\mathrm{TV}}(P,\widehat{P})\le \varepsilon$ using a number of samples sublinear in $d$, provided $\|\mathbf{p}-\mathbf{q}\|_1$ is sufficiently small and the distribution is $\tau$-balanced. The main result gives a sample bound of $\tilde{O}\bigl( \frac{d}{\varepsilon^2} \bigl( d^{-\eta} + \min\{1, \frac{\|\mathbf{p}-\mathbf{q}\|_1^2}{d^{1-4\eta}\varepsilon^2}\} \bigr) \bigr)$: the algorithm needs fewer samples when the advice is accurate and remains robust when the advice is poor. The paper also establishes lower bounds showing that the balancedness assumption is necessary and that sublinear-sample learning is limited when the advice is inadequate or the distribution is unbalanced. The work advances learning with predictions in the discrete, high-dimensional setting and suggests extending the framework to other, more complex models.
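To see how the two terms of the stated bound trade off, the expression inside the $\tilde{O}(\cdot)$ can be evaluated numerically. The sketch below is illustrative only: it drops all constants and logarithmic factors, and the function names `advice_sample_bound` and `baseline_bound` are hypothetical rather than taken from the paper.

```python
import math


def advice_sample_bound(d, eps, l1_gap, eta):
    """Evaluate (d/eps^2) * (d^-eta + min(1, gap^2 / (d^(1-4*eta) * eps^2))).

    Constants and log factors hidden by the O-tilde are ignored (assumption),
    so only the scaling in d, eps, and the advice error l1_gap is meaningful.
    """
    advice_term = min(1.0, l1_gap ** 2 / (d ** (1 - 4 * eta) * eps ** 2))
    return (d / eps ** 2) * (d ** (-eta) + advice_term)


def baseline_bound(d, eps):
    """The d/eps^2 scaling of learning without advice, again without constants."""
    return d / eps ** 2


if __name__ == "__main__":
    d, eps, eta = 10_000, 0.1, 0.25
    for gap in (0.0, 1.0, 10.0, 1000.0):
        ratio = advice_sample_bound(d, eps, gap, eta) / baseline_bound(d, eps)
        print(f"l1_gap={gap:>7}: bound / (d/eps^2) = {ratio:.3f}")
```

With perfect advice (`l1_gap = 0`) the expression reduces to $d^{1-\eta}/\varepsilon^2$. Once the advice error is large enough that the minimum equals 1, it is at most twice $d/\varepsilon^2$.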

Abstract

Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0,1\}^d$, it is known that $\Omega(d/\varepsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\varepsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \varepsilon d^{0.5 - \Omega(\eta)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.
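The objects in the abstract can be made concrete with a small sketch: a product distribution on $\{0,1\}^d$ is determined by its mean vector, and advice quality is measured by the $\ell_1$ distance between mean vectors. The code below only illustrates this notation using plain empirical means. It is not the paper's algorithm, and the helper names are hypothetical.

```python
import random


def sample_product(p, n, rng):
    """Draw n i.i.d. samples from the product distribution on {0,1}^d with mean vector p."""
    return [[1 if rng.random() < p_i else 0 for p_i in p] for _ in range(n)]


def empirical_mean(samples):
    """Coordinate-wise average of the samples (the naive estimate of the mean vector)."""
    n = len(samples)
    d = len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(d)]


def l1_distance(u, v):
    """||u - v||_1, the quantity used in the abstract to measure advice error."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.2, 0.8]   # true mean vector (unknown to a learner)
    q = [0.5, 0.25, 0.8]  # advice mean vector, slightly off in one coordinate
    p_hat = empirical_mean(sample_product(p, 2000, rng))
    print("advice error   ||p - q||_1     =", l1_distance(p, q))
    print("estimate error ||p - p_hat||_1 =", l1_distance(p, p_hat))
```

In the paper's setting the learner sees only samples from $P$ and the vector $\mathbf{q}$, so $\|\mathbf{p} - \mathbf{q}\|_1$ cannot be computed directly as it is in this demonstration.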
