Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

Maryam Aliakbarpour; Alireza Azizi; Ria Stevens

TL;DR

This work designs a bivariate independence tester for discrete distributions whose sample complexity adapts to the prediction error, and proves matching minimax lower bounds showing that the testers achieve optimal sample complexity.

Abstract

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of \textit{augmented distribution testing}. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction's quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.
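The quantity the tester must distinguish can be made concrete with an illustrative sketch. It is not the paper's algorithm, and the function names below are hypothetical.

The sketch computes the total variation distance between a known joint pmf $p$ over $n_1 \times \ldots \times n_d$ and the product of its own marginals. This is only a proxy for the distance to the closest product distribution. By the triangle inequality, and because marginalization does not increase total variation distance, the two quantities agree up to a factor of $d+1$ (a factor of 3 in the bivariate case). A tester only sees samples, whereas this sketch assumes full knowledge of $p$.

```python
from functools import reduce

import numpy as np


def tv_distance(p, q):
    """Total variation distance between two pmfs: half their L1 distance."""
    return 0.5 * float(np.abs(p - q).sum())


def product_of_marginals(p):
    """Outer product of the d one-dimensional marginals of a joint pmf p.

    p is an ndarray of shape (n_1, ..., n_d) whose entries sum to 1.
    """
    d = p.ndim
    marginals = []
    for i in range(d):
        # Sum out every axis except axis i to obtain the i-th marginal.
        other_axes = tuple(j for j in range(d) if j != i)
        marginals.append(p.sum(axis=other_axes))
    # np.multiply.outer builds the product distribution of shape (n_1, ..., n_d).
    return reduce(np.multiply.outer, marginals)


def distance_to_product_of_marginals(p):
    """d_TV(p, p_1 x ... x p_d); zero exactly when p is a product distribution."""
    return tv_distance(p, product_of_marginals(p))


if __name__ == "__main__":
    independent = np.outer([0.5, 0.5], [0.25, 0.75])
    correlated = np.array([[0.5, 0.0], [0.0, 0.5]])
    print(distance_to_product_of_marginals(independent))
    print(distance_to_product_of_marginals(correlated))
```

For the perfectly correlated $2 \times 2$ example, both marginals are uniform, so the product of marginals puts mass $1/4$ on every cell and the distance is $0.5$.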

Paper Structure (45 sections, 20 theorems, 139 equations, 4 algorithms)

Introduction
Problem Statement
Our Main Result
Technical Overview
Background: Flattening
Standard Flattening.
Augmented Flattening.
Closeness Testing via Flattening.
Independence Testing via Flattening.
Overview of our Results
Upper Bound for Two-Dimensional Augmented Independence Testing.
Lower Bound for Two-Dimensional Augmented Independence Testing.
Upper Bound for $d$-Dimensional Augmented Independence Testing.
Lower Bound for $d$-Dimensional Augmented Independence Testing.
Related Works
...and 30 more sections

Key Result

Theorem 2

Let $d \geq 2$. Let $p$ be an unknown distribution over $n_1 \times \ldots \times n_d$, and let $\hat{p}$ be a known prediction for $p$ over the same domain. Let $N \coloneqq \prod_{i=1}^d n_i$ denote the total domain size. For every $\epsilon \in (0, 1)$ and $\alpha \in [0, 1]$, the sample complex…

Theorems & Definitions (38)

Definition 1.1: Augmented Independence Tester
Remark 1: Finding the prediction error.
Theorem 2: Informal version of Theorems \ref{thm:2D-UB}, \ref{thrm: HighDim}, \ref{thm:2D-LB}, and \ref{thm:LB-d-dim-testing}
Remark 3: Success Amplification
Theorem 4
Proof
Theorem 5
Proof
Lemma 3.1
Proof
...and 28 more