Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

Maryam Aliakbarpour, Alireza Azizi, Ria Stevens

TL;DR

This work designs a bivariate independence tester for discrete distributions whose sample complexity adapts to the prediction error, and proves matching minimax lower bounds showing that the testers achieve optimal sample complexity.

Abstract

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of augmented distribution testing. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction's quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.
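To make the problem statement concrete, the following minimal sketch computes, for a fully known bivariate distribution, the total variation distance between the joint distribution and the product of its own marginals. This is an illustration only and not the paper's tester: it assumes the probability mass function is given explicitly rather than accessed through samples, and the function names are hypothetical. The quantity upper-bounds the distance to the closest product distribution and is zero exactly when the joint distribution is a product distribution.

```python
from itertools import product


def marginals(p):
    """Row and column marginals of a joint pmf given as a nested list p[i][j]."""
    rows = [sum(row) for row in p]
    cols = [sum(col) for col in zip(*p)]
    return rows, cols


def tv_to_product_of_marginals(p):
    """Total variation distance between p and the product of its own marginals.

    Assumption: p is a known pmf (entries are non-negative and sum to 1).
    """
    rows, cols = marginals(p)
    return 0.5 * sum(
        abs(p[i][j] - rows[i] * cols[j])
        for i, j in product(range(len(rows)), range(len(cols)))
    )


# Outer product of (0.2, 0.8) and (0.4, 0.6): the two coordinates are independent.
independent = [[0.08, 0.12], [0.32, 0.48]]

# All mass on the diagonal: the two coordinates are perfectly correlated.
correlated = [[0.5, 0.0], [0.0, 0.5]]

d_independent = tv_to_product_of_marginals(independent)
d_correlated = tv_to_product_of_marginals(correlated)
```

Here `d_independent` is zero up to floating-point rounding, while `d_correlated` equals 0.5, so the second distribution is far from the product of its marginals. A tester in the sense of the abstract must draw the same distinction using only samples from $p$.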

Paper Structure

This paper contains 45 sections, 20 theorems, 139 equations, and 4 algorithms.

Key Result

Theorem 2

Let $d \geq 2$. Let $p$ be an unknown distribution over $n_1 \times \ldots \times n_d$, and let $\hat{p}$ be a known prediction for $p$ over the same domain. Let $N \coloneqq \prod_{i=1}^d n_i$ denote the total domain size. For every $\epsilon \in (0, 1)$ and $\alpha \in [0, 1]$, the sample complex

Theorems & Definitions (38)

  • Definition 1.1: Augmented Independence Tester
  • Remark 1: Finding the prediction error.
  • Theorem 2: Informal version of Theorems \ref{thm:2D-UB}, \ref{thrm: HighDim}, \ref{thm:2D-LB}, and \ref{thm:LB-d-dim-testing}
  • Remark 3: Success Amplification
  • Theorem 4
  • proof
  • Theorem 5
  • proof
  • Lemma 3.1
  • proof
  • ...and 28 more