Testing Distributions of Huge Objects

Oded Goldreich; Dana Ron

TL;DR

This work introduces the Testing Distributions of Huge Objects (DoHO) model, which blends distribution testing with property testing of very long objects: the tester obtains samples from a distribution over $n$-bit strings and probes each sample only at selected coordinates. The distance between distributions is the earth mover's distance with respect to the relative Hamming distance between strings, which permits query complexity sublinear in the object size while preserving a meaningful notion of proximity. The paper gives general bounds relating query complexity in the DoHO model to sample complexity in the standard distribution-testing model, develops testers for natural properties such as support size, uniformity, and $m$-granularity, and extends the study to tuples of distributions and to distributions that arise as noisy perturbations of a string, random cyclic shifts of a string, and random isomorphic copies of a graph. Additional contributions include testers for equality of distributions in the DoHO setting, a framework for self-correctable testable subsets, and a detailed treatment of structured objects such as graphs and cyclic shifts. Overall, DoHO provides a versatile toolkit for analyzing distributions over huge objects via sublinear probing, with possible applications in genetics, large-scale data, and graph problems where reading an entire object is impractical.
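To make the access model concrete, here is a minimal Python sketch, assuming the distribution is given explicitly by a finite support and weights. The class `DoHOOracle` and its methods are hypothetical illustrations, not taken from the paper. The tester never sees a sample in full: each call to `sample()` hands back a query function for one freshly drawn string, and the oracle counts every probe, since the number of probes is the resource the DoHO model measures.

```python
import random
from typing import Callable, Sequence

Query = Callable[[int], int]  # maps a coordinate i to the bit x_i of one sample


class DoHOOracle:
    """Hypothetical DoHO access: samples are n-bit strings that can only be
    probed coordinate by coordinate; all probes are counted."""

    def __init__(self, support: Sequence[str], weights: Sequence[float], seed: int = 0):
        if len({len(x) for x in support}) != 1:
            raise ValueError("all strings in the support must have the same length n")
        self.support = list(support)
        self.weights = list(weights)
        self.rng = random.Random(seed)
        self.queries = 0  # total number of coordinates probed across all samples

    def sample(self) -> Query:
        # Draw one object x from the distribution, but expose it only via queries.
        x = self.rng.choices(self.support, weights=self.weights, k=1)[0]

        def query(i: int) -> int:
            self.queries += 1
            return int(x[i])

        return query
```

A tester that draws $s$ samples and probes $t$ coordinates in each would leave `queries` equal to $s \cdot t$, independent of $n$.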

Abstract

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings), and the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings. We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions and pairs of identical distributions.
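The distance notion in the abstract can be illustrated on tiny examples. The Python sketch below is restricted, as a simplifying assumption, to uniform distributions over two multisets of equal size $k$; in that case an optimal transport plan can be taken to be a permutation (by the Birkhoff-von Neumann theorem), so a brute-force search over matchings computes the earth mover's distance exactly. The function names are hypothetical, and the brute force is only feasible for very small $k$.

```python
from itertools import permutations
from typing import Sequence


def rel_hamming(x: str, y: str) -> float:
    """Relative Hamming distance: fraction of coordinates on which x and y differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def emd_uniform(xs: Sequence[str], ys: Sequence[str]) -> float:
    """EMD between the uniform distributions on multisets xs and ys (equal size),
    with relative Hamming distance as the ground metric. Assumes equal support
    sizes, so some optimal plan is a perfect matching; we try all of them."""
    k = len(xs)
    if k == 0 or k != len(ys):
        raise ValueError("need two non-empty multisets of equal size")
    return min(
        sum(rel_hamming(xs[i], ys[p[i]]) for i in range(k)) / k
        for p in permutations(range(k))
    )
```

For instance, the uniform distributions on $\{0000, 1111\}$ and $\{0001, 1110\}$ have disjoint supports, so their total variation distance is 1, yet `emd_uniform` gives 0.25 because each string can be matched to one differing in a single coordinate. This is the sense in which the EMD-based distance reflects closeness of the objects themselves.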

Paper Structure

This paper contains 45 sections, 24 theorems, and 4 equations.

Introduction
The new model
Generalization.
The standard notions of testing as special cases (and other observations)
Our Results
Some general bounds on the query complexity of testing in the DoHO model
An opposite extreme.
Testing previously studied properties of distributions
Tuples of distributions.
Distributions as variations of an ideal object
Noisy versions of a string, where we bound the noise level.
Random cyclic-shifts of a string.
Random isomorphic copies of a graph (represented by its adjacency matrix).
Orientation and Techniques
Conventions.
...and 30 more sections

Key Result

Theorem 1.4

(From testing strings for membership in $\Pi$ to testing distributions for membership in ${\cal D}_\Pi$): If the query complexity of testing $\Pi$ is $q$, then the query complexity of testing ${\cal D}_\Pi$ in the DoHO model is at most $q'$ such that $q'(n,\epsilon)={\widetilde{O}}(1/\epsilon)\cdot

Theorems & Definitions (32)

Definition 1.1
Definition 1.2
Theorem 1.4
Theorem 1.6
Theorem 1.8
Theorem 1.9
Theorem 1.10
Theorem 1.11
Claim 1.12
Definition 2.1
...and 22 more
