A simple polynomial-time approximation algorithm for the total variation distance between two product distributions

Weiming Feng, Heng Guo, Mark Jerrum, Jiaheng Wang

TL;DR

This work gives a polynomial-time Monte Carlo algorithm for approximating the total variation distance $d_{\mathrm{TV}}(P,Q)$ between two product distributions $P=\bigotimes_{i=1}^n P_i$ and $Q=\bigotimes_{i=1}^n Q_i$, a quantity that is hard to compute exactly. The algorithm uses a coordinate-wise greedy coupling to define a sampling distribution $\pi$, namely the greedy coupling conditioned on $X\neq Y$, and evaluates a likelihood-ratio estimator $f(\omega)$ that compares an optimal coupling with the greedy one. It outputs an estimate with relative error at most $\varepsilon$ with probability at least $1-\delta$ in time $O\left(\frac{n^2}{\varepsilon^2} \log \frac{1}{\delta}\right)$, and it allows the domain size to vary across coordinates. The approach is not restricted to Boolean domains, and it relies on a median-of-means scheme to reach the stated success probability. The result is a practical tool for estimating TV distance when exact computation is intractable.
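The summary above can be made concrete with a minimal Python sketch. It is one reading of the description, not the authors' code. It assumes the estimator takes the form $f(x) = \max(0, P(x)-Q(x)) \,/\, \bigl(P(x) - \prod_i \min(P_i(x_i), Q_i(x_i))\bigr)$, with $x$ drawn from the greedy coupling conditioned on $X \neq Y$, so that $d_{\mathrm{TV}}(P,Q) = Z \cdot \mathbb{E}_\pi[f]$ where $Z = 1 - \prod_i (1 - d_{\mathrm{TV}}(P_i,Q_i))$. The function names (`tv_estimate`, `sample_x`, `f_value`, `tv_exact`) and the constants 4 and 8 in the median-of-means step are illustrative assumptions, chosen via Chebyshev and Hoeffding bounds rather than taken from the source. Plain floating-point products are used, so the sketch is only suitable for small $n$.

```python
import math
import random
from itertools import product
from statistics import median


def tv_exact(P, Q):
    """Brute-force d_TV(P, Q); exponential in n, so only for tiny checks."""
    total = 0.0
    for x in product(*(range(len(p)) for p in P)):
        px = math.prod(p[a] for p, a in zip(P, x))
        qx = math.prod(q[a] for q, a in zip(Q, x))
        total += abs(px - qx)
    return total / 2.0


def sample_x(P, Q, first_diff_weights, rng):
    """Draw X from the greedy coupling conditioned on X != Y (no rejection)."""
    # Pick the first coordinate i at which the coupled pair disagrees.
    i = rng.choices(range(len(P)), weights=first_diff_weights)[0]
    x = []
    for j, (p, q) in enumerate(zip(P, Q)):
        if j < i:       # pair agrees here: weight min(P_j, Q_j)
            w = [min(a, b) for a, b in zip(p, q)]
        elif j == i:    # pair disagrees here: weight max(0, P_j - Q_j)
            w = [max(0.0, a - b) for a, b in zip(p, q)]
        else:           # unconstrained: X_j is distributed as P_j
            w = p
        x.append(rng.choices(range(len(p)), weights=w)[0])
    return x


def f_value(P, Q, x):
    """Assumed estimator: Pr_opt[X=x, Y!=x] / Pr_greedy[X=x, Y!=x], in [0, 1]."""
    px = math.prod(p[a] for p, a in zip(P, x))
    qx = math.prod(q[a] for q, a in zip(Q, x))
    mx = math.prod(min(p[a], q[a]) for p, q, a in zip(P, Q, x))
    denom = px - mx
    return max(0.0, px - qx) / denom if denom > 0.0 else 0.0


def tv_estimate(P, Q, eps, delta, rng=None):
    """Median-of-means estimate of d_TV(P, Q); constants are illustrative."""
    rng = rng or random.Random()
    n = len(P)
    # Per-coordinate TV distances, clamped against rounding above 1.
    d = [min(1.0, 0.5 * sum(abs(a - b) for a, b in zip(p, q)))
         for p, q in zip(P, Q)]
    # weights[i] = Pr_greedy[first disagreement is at coordinate i].
    weights, agree = [], 1.0
    for di in d:
        weights.append(agree * di)
        agree *= 1.0 - di
    z = sum(weights)  # equals 1 - prod(1 - d_i) = Pr_greedy[X != Y]
    if z <= 0.0:
        return 0.0    # P and Q coincide
    batch = math.ceil(4 * n / eps ** 2)          # samples per mean
    rounds = math.ceil(8 * math.log(1 / delta))  # number of means
    means = [
        sum(f_value(P, Q, sample_x(P, Q, weights, rng))
            for _ in range(batch)) / batch
        for _ in range(rounds)
    ]
    return z * median(means)


if __name__ == "__main__":
    # Coordinates may have different domain sizes.
    P = [[0.5, 0.5], [0.2, 0.3, 0.5], [0.9, 0.1]]
    Q = [[0.6, 0.4], [0.3, 0.3, 0.4], [0.7, 0.3]]
    print("exact   :", tv_exact(P, Q))
    print("estimate:", tv_estimate(P, Q, 0.1, 0.01, random.Random(0)))
```

Under these assumptions each sample costs $O(n)$ arithmetic operations once the per-coordinate weights are known, and $O(n/\varepsilon^2)$ samples per mean suffice because $f \in [0,1]$ and $\mathbb{E}_\pi[f] \geq 1/n$. This is consistent with the stated $O\left(\frac{n^2}{\varepsilon^2} \log \frac{1}{\delta}\right)$ running time when the domain size is treated as constant.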

Abstract

We give a simple polynomial-time approximation algorithm for the total variation distance between two product distributions.

Paper Structure

This paper contains 3 sections, 4 theorems, and 13 equations.

Key Result

Theorem 1.1

Let $[q] = \{1,2,\ldots,q\}$ be a finite set. There exists an algorithm such that given two product distributions $P,Q$ over $[q]^n$ and parameters $\varepsilon > 0$ and $0 <\delta < 1$, it outputs a random value $\widehat{d}$ in time $O(\frac{n^2}{\varepsilon^2} \log \frac{1}{\delta})$ such that $(

Theorems & Definitions (4)

  • Theorem 1.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3