Characterizing the Distinguishability of Product Distributions through Multicalibration

Cassandra Marcussen, Aaron Putterman, Salil Vadhan

TL;DR

The paper presents a complexity-theoretic framework that uses multicalibration to relate the computational indistinguishability of product distributions to an information-theoretic counterpart. From multicalibrated partitions it constructs intermediate distributions $\widetilde{X}_0,\widetilde{X}_1$ (and variants such as $\widehat{X}_0$), which turns computational questions into statistical ones and yields an instance-optimal characterization, $k = \Theta\left(d_H^{-2}(\widetilde{X}_0,\widetilde{X}_1)\right)$, of the number of samples $k$ needed to distinguish with constant advantage. The framework recovers and clarifies prior results (e.g., Halevi-Rabin and Geier) and introduces a pseudo-Hellinger distance that quantifies computational distinguishability in terms of an information-theoretic distance. By linking efficient distinguishers to the partition-based representations and showing how to implement optimal distinguishers with small circuits, the paper provides a unified toolkit for analyzing distinguishing problems across general product distributions. The translation from computational hardness to statistical indistinguishability may have implications for hardness amplification and cryptographic constructions.

Abstract

Given a sequence of samples $x_1, \dots , x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] which is the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
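To make the information-theoretic side of the $k = \Theta(d_H^{-2})$ scaling concrete, here is a minimal Python sketch. It does not construct the paper's $\widetilde{X}_0, \widetilde{X}_1$ (those come from multicalibrated partitions); it only illustrates, for two explicitly given distributions, how the total variation distance between $k$-fold products tracks the squared Hellinger distance. The function names, the Bernoulli example, and the normalization $d_H^2(P,Q) = 1 - \sum_x \sqrt{P(x)Q(x)}$ are assumptions made for illustration; the paper may fix constants differently.

```python
import math


def hellinger_sq(p, q):
    """Squared Hellinger distance between two probability vectors.

    Assumed normalization: d_H^2 = 1 - sum_x sqrt(p(x) q(x)), so it lies in [0, 1].
    """
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def tv_bernoulli_product(a, b, k):
    """Exact total variation distance between Bernoulli(a)^k and Bernoulli(b)^k.

    The likelihood of a k-bit string depends only on its number of ones j,
    so we sum over j instead of enumerating all 2^k strings.
    """
    total = 0.0
    for j in range(k + 1):
        pa = a ** j * (1.0 - a) ** (k - j)
        pb = b ** j * (1.0 - b) ** (k - j)
        total += math.comb(k, j) * abs(pa - pb)
    return total / 2.0


def hellinger_sandwich(h2, k):
    """Standard bounds on TV between k-fold products, given single-sample d_H^2.

    Uses tensorization of the Bhattacharyya coefficient BC = 1 - d_H^2:
    1 - BC^k <= d_TV <= sqrt(1 - BC^(2k)).
    """
    bc = 1.0 - h2
    return 1.0 - bc ** k, math.sqrt(1.0 - bc ** (2 * k))


if __name__ == "__main__":
    a, b = 0.5, 0.6  # hypothetical example parameters
    h2 = hellinger_sq([1 - a, a], [1 - b, b])
    k_star = math.ceil(1.0 / h2)  # the 1 / d_H^2 sample scale
    for k in (1, k_star // 10, k_star, 4 * k_star):
        lo, hi = hellinger_sandwich(h2, k)
        tv = tv_bernoulli_product(a, b, k)
        print(f"k={k:4d}  lower={lo:.3f}  tv={tv:.3f}  upper={hi:.3f}")
```

With far fewer than $1/d_H^2$ samples both bounds stay small, and once $k$ reaches that scale the lower bound exceeds $1 - 1/e$, which is the sense in which $d_H^{-2}$ governs the number of samples needed for constant advantage in the purely statistical setting.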

Paper Structure

This paper contains 22 sections, 22 theorems, and 122 equations.

Key Result

Theorem 1.1

For every pair of random variables $X_0, X_1$ over $\mathcal{X}$, every integer $s$ and every $\varepsilon > 0$, there exist random variables $\widetilde{X}_0, \widetilde{X}_1$ such that for every $k>0$,

Theorems & Definitions (65)

  • Theorem 1.1
  • Remark 1.2
  • Remark 1.3
  • Definition 1.4
  • Theorem 1.5
  • Definition 1.6
  • Theorem 1.7
  • Definition 1.8
  • Definition 1.9
  • Definition 1.10
  • ...and 55 more