
The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning

Adam Block, Yury Polyanskiy

TL;DR

This paper studies the problem of drawing a sample from a target distribution $\nu$ using $n$ i.i.d. samples from a base distribution $\mu$ under a general $f$-divergence constraint $D_f(\nu \|\mu) \le D$. It establishes near-tight upper and lower bounds on the required sample size, showing that a modified rejection sampler achieves $\mathrm{TV}(P_{X_{j^*}}, \nu) \le \varepsilon$ when $n \ge \frac{2}{1-\varepsilon} \log(\frac{2}{\varepsilon}) (f')^{-1}(\frac{4 D_f(\nu \|\mu)}{\varepsilon}) \lor 2$, with linear $f'$ making approximate sampling impossible and superlinear $f'$ enabling tightness up to polylog factors. The results connect to smoothed online learning by introducing $f$-smoothed adversaries and deriving minimax regret bounds; Renyi-smoothed settings yield rates close to the known bounds as $\lambda$ grows, while KL-smoothed adversaries incur slower, $T^{2/3}$-type rates. The paper also develops oracle-efficient algorithms that preserve no-regret under $f$-smoothed constraints and compares sampling strategies for mean estimation across function classes. Overall, it provides a unified information-theoretic treatment of sampling under $f$-divergence constraints with broad implications for online learning and robust statistics.

Abstract

Suppose we are given access to $n$ independent samples from distribution $\mu$ and we wish to output one of them with the goal of making the output distributed as close as possible to a target distribution $\nu$. In this work we show that the optimal total variation distance as a function of $n$ is given by $\tilde{\Theta}\left(\frac{D}{f'(n)}\right)$ over the class of all pairs $\nu,\mu$ with a bounded $f$-divergence $D_f(\nu\|\mu)\leq D$. Previously, this question was studied only for the case when the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ is uniformly bounded. We then consider an application in the seemingly very different field of smoothed online learning, where we show that recent results on the minimax regret and the regret of oracle-efficient algorithms still hold even under relaxed constraints on the adversary (to have bounded $f$-divergence, as opposed to bounded Radon-Nikodym derivative). Finally, we also study efficacy of importance sampling for mean estimates uniform over a function class and compare importance sampling with rejection sampling.
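The selection problem in the abstract can be pictured with a truncated rejection sampler. The sketch below is an illustration under stated assumptions, not necessarily the paper's exact selection rule: it assumes the density ratio $d\nu/d\mu$ can be evaluated pointwise (the hypothetical callable `ratio`), caps that ratio at a hypothetical threshold `M`, and falls back to the first sample when nothing is accepted. The function name `approx_rejection_select` is likewise made up for this example.

```python
import random


def approx_rejection_select(samples, ratio, M, rng=random):
    """Pick an index j* among i.i.d. draws from mu so that samples[j*] is roughly nu-distributed.

    samples: list of draws X_1, ..., X_n from the base distribution mu.
    ratio:   callable returning dnu/dmu at a point (assumed available; hypothetical).
    M:       truncation level for the ratio (hypothetical tuning parameter, M > 0).
    rng:     object exposing random() -> float in [0, 1).
    """
    if not samples:
        raise ValueError("need at least one sample")
    if M <= 0:
        raise ValueError("M must be positive")
    for j, x in enumerate(samples):
        # Accept X_j with probability min(ratio, M) / M, as in classical
        # rejection sampling but with the ratio capped at M.
        accept_prob = min(ratio(x), M) / M
        if rng.random() < accept_prob:
            return j
    # No sample was accepted: return an arbitrary index. This event, together
    # with the truncation, is what makes the output only approximately nu.
    return 0
```

When the ratio never exceeds `M`, an accepted sample is exactly $\nu$-distributed and the only error comes from the fallback branch, which is taken with probability $(1 - 1/M)^n$. When the ratio is unbounded, truncation adds a second source of error, and controlling that trade-off is where a divergence bound such as $D_f(\nu\|\mu) \le D$ comes in.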

Paper Structure (21 sections, 35 theorems, 161 equations)

Key Result

Theorem 3

Suppose that $\mu, \nu$ are probability distributions on some set $\mathcal{X}$ and suppose that $X_1, \dots, X_n \sim \mu$ are independent. Fix some $f$ satisfying the conditions in Definition 1. For $\varepsilon > 0$, if $n \ge \frac{2}{1-\varepsilon} \log\left(\frac{2}{\varepsilon}\right) (f')^{-1}\left(\frac{4 D_f(\nu \|\mu)}{\varepsilon}\right) \lor 2$, then there exists a selection rule $j^{\ast}$ satisfying $\mathrm{TV}\left( P_{X_{j^{\ast}}}, \nu \right) \leq \varepsilon$.
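To get a feel for the sample-size condition quoted in the TL;DR, $n \ge \frac{2}{1-\varepsilon} \log(\frac{2}{\varepsilon}) (f')^{-1}(\frac{4 D_f(\nu \|\mu)}{\varepsilon}) \lor 2$, the following sketch evaluates it numerically. Several things here are assumptions rather than statements from the paper: the helper name `sample_size_bound`, reading $\lor 2$ as a maximum applied to the whole expression, rounding up to an integer, and the two normalizations $f(x) = x \log x - x + 1$ (so $f'(x) = \log x$) for KL and $f(x) = x^{\lambda} - 1$ (so $f'(x) = \lambda x^{\lambda - 1}$) for a Renyi-type divergence.

```python
import math


def sample_size_bound(eps, D, f_prime_inv):
    """Evaluate max(2/(1-eps) * log(2/eps) * (f')^{-1}(4D/eps), 2), rounded up.

    eps:         target total variation accuracy, assumed to lie in (0, 1).
    D:           bound on the f-divergence D_f(nu || mu).
    f_prime_inv: callable computing the inverse of f' (supplied by the caller).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    raw = 2.0 / (1.0 - eps) * math.log(2.0 / eps) * f_prime_inv(4.0 * D / eps)
    return max(math.ceil(raw), 2)


def kl_f_prime_inv(y):
    # Assumes f(x) = x log x - x + 1, so f'(x) = log x and the inverse is exp.
    return math.exp(y)


def power_f_prime_inv(lam):
    # Assumes f(x) = x**lam - 1 with lam > 1, so f'(x) = lam * x**(lam - 1).
    def inv(y):
        return (y / lam) ** (1.0 / (lam - 1.0))
    return inv


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.1):
        print(eps,
              sample_size_bound(eps, 0.125, kl_f_prime_inv),
              sample_size_bound(eps, 0.125, power_f_prime_inv(2.0)))
```

Under these assumptions the KL case grows like $\exp(4D/\varepsilon)$ while the power case grows only polynomially in $1/\varepsilon$, which matches the qualitative message that faster-growing $f'$ gives a smaller required $n$.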

Theorems & Definitions (74)

  • Definition 1
  • Remark 2
  • Example 1: Total Variation
  • Example 2: KL Divergence
  • Example 3: Renyi Divergence
  • Example 4: $\mathcal{E}_\gamma$ Divergence
  • Theorem 3: Upper Bound
  • Example 5: Total Variation
  • Example 6: KL Divergence
  • Example 7: Renyi Divergence
  • ...and 64 more