A simple polynomial-time approximation algorithm for the total variation distance between two product distributions

Weiming Feng; Heng Guo; Mark Jerrum; Jiaheng Wang

TL;DR

This work gives a polynomial-time Monte Carlo method for approximating the total variation distance $d_{\mathrm{TV}}(P,Q)$ between two product distributions $P=\bigotimes_{i=1}^n P_i$ and $Q=\bigotimes_{i=1}^n Q_i$, a quantity that is hard to compute exactly. It uses a coordinate-wise greedy coupling of $P$ and $Q$ to define a sampling distribution $\pi$, conditioned on $X\neq Y$, together with a likelihood-ratio estimator $f(\omega)$ that compares an optimal coupling with the greedy one. The algorithm outputs an estimate within a factor $1\pm\varepsilon$ of $d_{\mathrm{TV}}(P,Q)$ with probability at least $1-\delta$ in time $O\left(\frac{n^2}{\varepsilon^2} \log \frac{1}{\delta}\right)$, and it allows the domain size to vary across coordinates. The approach is not restricted to Boolean domains, and it relies on a median-of-means scheme to reach the stated confidence level. This provides a practical, scalable tool for TV-distance estimation where exact computation is intractable.
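The identity behind the estimator can be written out as follows. This is a reconstruction, not the paper's text. It assumes the coordinate-wise greedy coupling is the product $C=\bigotimes_{i=1}^n C_i$ of optimal couplings $C_i$ of $P_i$ and $Q_i$. The symbols $C$, $d_i$, $m(x)$ and $Z$ are introduced here for illustration. Write $d_i = d_{\mathrm{TV}}(P_i,Q_i)$ and $m(x)=\prod_{i=1}^n \min(P_i(x_i),Q_i(x_i))$, so that $\Pr_C[X=Y=x]=m(x)$.

$$
\begin{aligned}
Z &= \Pr_{(X,Y)\sim C}[X\neq Y] = 1-\prod_{i=1}^n (1-d_i),\\
\pi(x) &= \Pr_C[X=x \mid X\neq Y] = \frac{P(x)-m(x)}{Z},\\
f(x) &= \frac{\left(P(x)-Q(x)\right)^+}{P(x)-m(x)} \in [0,1] \qquad \text{for } \pi(x)>0,\\
d_{\mathrm{TV}}(P,Q) &= \sum_{x} \left(P(x)-Q(x)\right)^+ = Z\,\mathbb{E}_{x\sim\pi}[f(x)].
\end{aligned}
$$

The numerator of $f$ is $\Pr[X=x,\,X\neq Y]$ under any optimal coupling of $P$ and $Q$, and the denominator is the same probability under $C$. We have $f\le 1$ because $Q(x)\ge m(x)$. Terms with $\pi(x)=0$ drop out of the sum because then $P(x)=m(x)\le Q(x)$. Since $d_{\mathrm{TV}}(P,Q)\ge \max_i d_i$ and, by a union bound, $Z\le \sum_i d_i$, we get $\mathbb{E}_\pi[f]=d_{\mathrm{TV}}(P,Q)/Z\ge 1/n$. Hence $f$ has relative variance at most $n$. So $O(n/\varepsilon^2)$ samples per mean and $O(\log(1/\delta))$ means suffice, with each sample taking $O(n)$ time for a fixed domain size. This is consistent with the $O\left(\frac{n^2}{\varepsilon^2}\log\frac{1}{\delta}\right)$ bound.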

Abstract

We give a simple polynomial-time approximation algorithm for the total variation distance between two product distributions.

Paper Structure (3 sections, 4 theorems, 13 equations)

Introduction
Preliminaries
Algorithm

Key Result

Theorem 1.1

Let $[q] = \{1,2,\ldots,q\}$ be a finite set. There exists an algorithm such that, given two product distributions $P,Q$ over $[q]^n$ and parameters $\varepsilon > 0$ and $0 <\delta < 1$, it outputs a random value $\widehat{d}$ in time $O(\frac{n^2}{\varepsilon^2} \log \frac{1}{\delta})$ such that $(1-\varepsilon)\,d_{\mathrm{TV}}(P,Q) \le \widehat{d} \le (1+\varepsilon)\,d_{\mathrm{TV}}(P,Q)$ with probability at least $1-\delta$.
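Below is a minimal Python sketch of an algorithm of the kind Theorem 1.1 describes. It is our reconstruction under stated assumptions, not the authors' code:

- Distributions are lists of per-coordinate probability vectors, and domain sizes may differ across coordinates.
- The greedy coupling is taken to be the product of optimal per-coordinate couplings.
- $\pi$ is sampled by first drawing the lowest-indexed disagreeing coordinate.
- The helper names (`marginal_tv`, `sample_index`, `sample_pi`, `f`, `estimate_tv`) and the constants in the group sizes are hypothetical.

The group size $m=\lceil 4n/\varepsilon^2\rceil$ comes from Chebyshev's inequality together with the bound $\mathbb{E}_\pi[f]\ge 1/n$. The bound holds because $d_{\mathrm{TV}}(P,Q)\ge\max_i d_{\mathrm{TV}}(P_i,Q_i)$ and, by a union bound, $\Pr[X\neq Y]\le\sum_i d_{\mathrm{TV}}(P_i,Q_i)$. Using $k = 2\lceil 4\ln(1/\delta)\rceil+1$ groups makes the median fail with probability at most $\delta$, by Hoeffding's inequality.

```python
import math
import random
import statistics


def marginal_tv(p, q):
    """TV distance between two distributions on a single coordinate."""
    return sum(max(0.0, a - b) for a, b in zip(p, q))


def sample_index(weights, rng):
    """Draw an index with probability proportional to the nonnegative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def sample_pi(P, Q, d, rng):
    """Draw x ~ pi, the X-marginal of the greedy coupling conditioned on X != Y.

    Assumption: the greedy coupling is the product of optimal per-coordinate
    couplings, so coordinate j disagrees independently with probability d[j].
    """
    # The first disagreeing coordinate i has weight prod_{j<i}(1 - d_j) * d_i.
    weights, agree = [], 1.0
    for dj in d:
        weights.append(agree * dj)
        agree *= 1.0 - dj
    i = sample_index(weights, rng)
    x = []
    for j, (p, q) in enumerate(zip(P, Q)):
        if j < i:     # X_j == Y_j, so X_j ~ min(p, q) / (1 - d_j)
            w = [min(a, b) for a, b in zip(p, q)]
        elif j == i:  # X_j != Y_j, so X_j ~ (p - q)^+ / d_j
            w = [max(0.0, a - b) for a, b in zip(p, q)]
        else:         # unconstrained, so X_j ~ p
            w = p
        x.append(sample_index(w, rng))
    return x


def f(P, Q, x):
    """Pr[X = x, X != Y] under an optimal coupling divided by the same under the greedy one."""
    px = math.prod(P[j][a] for j, a in enumerate(x))
    qx = math.prod(Q[j][a] for j, a in enumerate(x))
    mx = math.prod(min(P[j][a], Q[j][a]) for j, a in enumerate(x))
    return max(0.0, px - qx) / (px - mx)  # px > mx for every x in the support of pi


def estimate_tv(P, Q, eps, delta, rng=None):
    """Median-of-means estimate of d_TV(P, Q) for product distributions P, Q."""
    if rng is None:
        rng = random.Random()
    n = len(P)
    d = [marginal_tv(p, q) for p, q in zip(P, Q)]
    if not any(d):
        return 0.0  # P_i == Q_i for every i, hence P == Q
    Z = 1.0 - math.prod(1.0 - dj for dj in d)  # Pr[X != Y] under the greedy coupling
    m = math.ceil(4 * n / eps**2)  # group size (Chebyshev, using E[f] >= 1/n)
    k = 2 * math.ceil(4 * math.log(1 / delta)) + 1  # odd number of groups (Hoeffding)
    means = [statistics.fmean(f(P, Q, sample_pi(P, Q, d, rng)) for _ in range(m))
             for _ in range(k)]
    return Z * statistics.median(means)
```

For example, `estimate_tv([[0.5, 0.5], [0.9, 0.1]], [[0.4, 0.6], [0.7, 0.3]], eps=0.1, delta=0.01)` returns a random value. Under the assumptions above, it lies within a factor $1\pm 0.1$ of the exact distance with probability at least $0.99$. The sketch samples $\pi$ directly rather than sampling the coupling and rejecting when $X=Y$. This matters when $\Pr[X\neq Y]$ is tiny, because rejection would then need roughly $1/\Pr[X\neq Y]$ attempts per sample.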

Theorems & Definitions (4)

Theorem 1.1
Lemma 3.1
Lemma 3.2
Lemma 3.3
