A Conditional Distribution Equality Testing Framework using Deep Generative Learning

Siming Zheng, Tong Wang, Meifang Lan, Yuanyuan Lin

TL;DR

This work tackles the problem of testing equality of two conditional distributions, $\mathbb{P}_{1,Y|X}$ and $\mathbb{P}_{2,Y|X}$, a question that arises under covariate shift and in causal-invariance analysis. It introduces a general conditional-generative framework that uses data splitting to turn the conditional test into an unconditional two-sample test, so that neural-network-based generators can be plugged in flexibly. As a concrete instantiation, the authors develop the Generative Classification-Accuracy-Based Conditional Distribution Equality Test (GCA-CDET), which uses mixture density networks (MDNs) to learn $\mathbb{P}_{1,Y|X}$ and a classification-based test to decide equality. They prove convergence rates for the MDN generator and the testing consistency of GCA-CDET, and report strong empirical performance on synthetic data and two real datasets (Wine Quality and HIV-1 Drug Resistance). The framework is designed to handle high-dimensional covariates and imbalanced samples, and it can accommodate other state-of-the-art conditional generative models.

Abstract

In this paper, we propose a general framework for testing the equality of conditional distributions in a two-sample problem, which is most relevant to covariate shift and causal discovery. Our framework is built on neural network-based generative methods and sample splitting techniques, and it transforms the conditional testing problem into an unconditional one. We introduce the generative classification accuracy-based conditional distribution equality test (GCA-CDET) to illustrate the proposed framework. We establish the convergence rate for the learned generator by deriving new results related to the recently developed offset Rademacher complexity, and we prove the testing consistency of GCA-CDET under mild conditions. Empirically, we conduct numerical studies on synthetic datasets and two real-world datasets, demonstrating the effectiveness of our approach. Additional discussions on the optimality of the proposed framework are provided in the online supplementary material.
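
The reduction described in the abstract, from a conditional test to an unconditional two-sample test, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: a Gaussian linear model stands in for the MDN generator, a small logistic regression stands in for the classifier, the splitting scheme is one plausible reading of "sample splitting", and the names `fit_generator`, `fit_logistic`, and `gca_test_sketch` are hypothetical.

```python
import math
import numpy as np


def fit_generator(x, y):
    """Stand-in for the MDN (assumption): linear fit of y on x plus Gaussian noise."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sigma = np.std(y - design @ coef)

    def sample(x_new, rng):
        d = np.column_stack([np.ones(len(x_new)), x_new])
        return d @ coef + sigma * rng.standard_normal(len(x_new))

    return sample


def features(x, y):
    """Features of the pair (x, y): linear terms, x*y interactions, and y^2."""
    return np.column_stack([x, y, x * y[:, None], y ** 2])


def fit_logistic(f, labels, steps=1000, lr=0.5):
    """Plain gradient descent on the logistic loss; returns the weight vector."""
    w = np.zeros(f.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(f @ w, -30, 30)))
        w -= lr * f.T @ (p - labels) / len(labels)
    return w


def gca_test_sketch(x1, y1, x2, y2, seed=0):
    """Return (held-out accuracy, one-sided p-value) for H0: P_{1,Y|X} = P_{2,Y|X}."""
    rng = np.random.default_rng(seed)
    gen = fit_generator(x1, y1)            # learn P_{1,Y|X} from sample 1 only
    half = len(y2) // 2
    real = features(x2[:half], y2[:half])  # observed pairs from sample 2
    fake = features(x2[half:], gen(x2[half:], rng))  # generated Y on disjoint X's
    f = np.vstack([real, fake])
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])

    # Split the pooled pairs into a classifier-training part and a testing part.
    perm = rng.permutation(len(labels))
    cut = len(labels) // 2
    tr, te = perm[:cut], perm[cut:]
    mu, sd = f[tr].mean(axis=0), f[tr].std(axis=0) + 1e-12
    add_bias = lambda a: np.column_stack([np.ones(len(a)), (a - mu) / sd])
    w = fit_logistic(add_bias(f[tr]), labels[tr])

    # Under H0 the two groups are indistinguishable, so accuracy is near 1/2.
    acc = float(np.mean((add_bias(f[te]) @ w > 0) == (labels[te] == 1)))
    z = (acc - 0.5) / math.sqrt(0.25 / len(te))  # normal approximation
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return acc, p_value
```

Because the generator is fitted on sample 1 and then queried at covariates from sample 2, a difference in the marginal distribution of $X$ between the two samples does not by itself push the accuracy away from 1/2; only a difference between the conditional distributions (or a poorly fitted generator) does. The normal approximation for the held-out accuracy assumes roughly balanced classes, which holds here by construction.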

Paper Structure

This paper contains 19 sections, 3 theorems, 25 equations, 5 tables, 1 algorithm.

Key Result

Theorem 4.1

Under Assumptions f1_joint_holder_cond_general and F_net_cond_general, it holds that where $c_{p}=2p^2+4p+4$ and $C$ is a constant depending only on $\beta,c_1,c_2,M,p,d$.

Theorems & Definitions (9)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Definition 1: Hölder class
  • Theorem 4.1: Nonasymptotic upper bound of the MDNs-based conditional density estimator
  • Remark 5
  • Corollary 4.1: Nonasymptotic upper bound of the conditional generator in total variation distance
  • Theorem 4.2: Testing consistency of GCA-CDET