Characterizing the Distinguishability of Product Distributions through Multicalibration

Cassandra Marcussen; Aaron Putterman; Salil Vadhan

TL;DR

The paper presents a complexity-theoretic framework that uses multicalibration to compare the computational indistinguishability of product distributions with its information-theoretic counterpart. It constructs intermediate distributions $\widetilde{X}_0,\widetilde{X}_1$ (and variants such as $\widehat{X}_0$) from multicalibrated partitions. This converts computational questions into statistical ones and yields an instance-optimal characterization $k = \Theta\left(d_H^{-2}(\widetilde{X}_0,\widetilde{X}_1)\right)$ of the number of samples needed for constant distinguishing advantage. The framework recovers and clarifies prior results (e.g., those of Halevi and Rabin and of Geier). It also introduces a pseudo-Hellinger distance that quantifies computational distinguishability in terms of an information-theoretic distance. The paper links efficient distinguishers to the partition-based representations and shows how to implement optimal distinguishers with small circuits. Together these give a unified toolkit for analyzing distinguishing problems over general product distributions. The approach provides a general translation from computational hardness to statistical indistinguishability, with potential implications for hardness amplification and cryptographic constructions.
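The TL;DR describes building intermediate distributions from multicalibrated partitions. The Python sketch below gives a hypothetical, simplified illustration of one basic fact that makes partitions useful. It is an assumption for illustration only: it is not the paper's definition of $\widetilde{X}_0,\widetilde{X}_1$, and it does not compute a multicalibrated partition. Suppose two distributions place possibly different weights on the cells of a partition but use the same conditional distribution inside each cell. Then their squared Hellinger distance equals that of the cell-weight vectors alone, because $\sum_x \sqrt{w_0(i)q_i(x)\,w_1(i)q_i(x)} = \sum_i \sqrt{w_0(i)w_1(i)}$. The convention $d_H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$, the cells, the weights, and the names `hellinger_sq` and `mix_over_cells` are all made up for this example.

```python
from math import sqrt


def hellinger_sq(P, Q):
    """Squared Hellinger distance 1 - sum_x sqrt(P(x) Q(x)) for dict-valued distributions."""
    support = set(P) | set(Q)
    return 1.0 - sum(sqrt(P.get(x, 0.0) * Q.get(x, 0.0)) for x in support)


def mix_over_cells(cell_weights, within_cell):
    """Pick cell i with probability cell_weights[i], then draw x from within_cell[i].

    Assumes the supports of the within_cell distributions are disjoint (a partition).
    """
    dist = {}
    for cell, w in cell_weights.items():
        for x, px in within_cell[cell].items():
            dist[x] = dist.get(x, 0.0) + w * px
    return dist


# Hypothetical partition of the domain {0, ..., 5} into two cells.
within = {
    "A": {0: 0.2, 1: 0.5, 2: 0.3},
    "B": {3: 0.6, 4: 0.1, 5: 0.3},
}
w0 = {"A": 0.7, "B": 0.3}  # cell weights under the first distribution
w1 = {"A": 0.4, "B": 0.6}  # cell weights under the second distribution

D0 = mix_over_cells(w0, within)
D1 = mix_over_cells(w1, within)

# Identical within-cell behavior, so only the cell weights contribute.
print(hellinger_sq(D0, D1), hellinger_sq(w0, w1))
```

In this toy setting, deciding between `D0` and `D1` is exactly as hard, statistically, as deciding between the two weight vectors over cells.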

Abstract

Given a sequence of samples $x_1, \dots, x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information-theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) for these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] which is the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
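The Python sketch below illustrates why the inverse squared Hellinger distance is the natural sample count. It covers only the classical, purely information-theoretic statement for two Bernoulli distributions, not the paper's computational result. It assumes the convention $d_H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$; the paper may normalize differently, and constant factors are absorbed by the $\Theta$. The Bhattacharyya coefficient $\sum_x \sqrt{P(x)Q(x)}$ is multiplicative over independent samples, so $d_H^2(P^{\otimes k},Q^{\otimes k}) = 1-(1-d_H^2(P,Q))^k$. Combined with the standard bounds $d_H^2 \le d_{TV} \le \sqrt{1-(1-d_H^2)^2}$, this shows that $k \approx d_H^{-2}$ samples give constant advantage. The function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt


def hellinger_sq(P, Q):
    """Squared Hellinger distance, using the convention 1 - sum_x sqrt(P(x) Q(x))."""
    return 1.0 - sum(sqrt(a * b) for a, b in zip(P, Q))


def _log_binom_pmf(k, j, p):
    """log of C(k, j) * p^j * (1 - p)^(k - j); assumes 0 < p < 1."""
    return (lgamma(k + 1) - lgamma(j + 1) - lgamma(k - j + 1)
            + j * log(p) + (k - j) * log(1 - p))


def tv_product_bernoulli(p, q, k):
    """Exact TV distance between Bernoulli(p)^k and Bernoulli(q)^k (k i.i.d. samples).

    Strings with the same number of ones j are equally likely under each
    product, so summing over j (Binomial counts) equals summing over all 2^k
    strings. Working in log-space avoids overflow of C(k, j) for large k.
    """
    return 0.5 * sum(
        abs(exp(_log_binom_pmf(k, j, p)) - exp(_log_binom_pmf(k, j, q)))
        for j in range(k + 1)
    )


p, q = 0.5, 0.55
h2 = hellinger_sq([p, 1 - p], [q, 1 - q])
k_star = round(1 / h2)                # candidate sample count ~ d_H^{-2}
h2_k = 1 - (1 - h2) ** k_star         # squared Hellinger distance of the k-fold products
tv_k = tv_product_bernoulli(p, q, k_star)

print(f"d_H^2 = {h2:.6f}, k* = {k_star}")
print(f"TV at k*: {tv_k:.3f}, bounds [{h2_k:.3f}, {sqrt(1 - (1 - h2_k) ** 2):.3f}]")
```

With $k^* \approx d_H^{-2}$ samples, the product Hellinger distance is roughly $1 - 1/e$, so the total variation distance (the best information-theoretic advantage) is bounded away from both 0 and 1.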
