General Frameworks for Conditional Two-Sample Testing

Seongchan Lee, Suman Cha, Ilmun Kim

TL;DR

This work establishes a hardness result for conditional two-sample testing and introduces two general frameworks that implicitly or explicitly target specific classes of distributions for their validity and power. The first converts any conditional independence test into a conditional two-sample test, and the second transforms the problem into comparing marginal distributions with estimated density ratios.

Abstract

We study the problem of conditional two-sample testing, which aims to determine whether two populations have the same distribution after accounting for confounding factors. This problem commonly arises in various applications, such as domain adaptation and algorithmic fairness, where comparing two groups is essential while controlling for confounding variables. We begin by establishing a hardness result for conditional two-sample testing, demonstrating that no valid test can have significant power against any single alternative without proper assumptions. We then introduce two general frameworks that implicitly or explicitly target specific classes of distributions for their validity and power. Our first framework allows us to convert any conditional independence test into a conditional two-sample test in a black-box manner, while preserving the asymptotic properties of the original conditional independence test. The second framework transforms the problem into comparing marginal distributions with estimated density ratios, which allows us to leverage existing methods for marginal two-sample testing. We demonstrate this idea in a concrete manner with classification and kernel-based methods. Finally, simulation studies are conducted to illustrate the proposed frameworks in finite-sample scenarios.
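The first framework described above can be illustrated with a minimal sketch. It assumes that X is the conditioning variable, Y the response, and Z a binary group label, so that equality of the two conditional distributions of Y given X is read as "Y is independent of Z given X" on the pooled sample. The names `toy_ci_test` and `conditional_two_sample_test` are hypothetical, and the toy test (a product-of-residuals statistic with linear regressions) is only a stand-in for whichever conditional independence test is plugged in. The paper's actual construction may include further steps that are not shown here.

```python
import math
import random


def ols_residuals(x, t):
    """Residuals from a simple linear regression of t on x (with intercept)."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    slope = sxt / sxx
    return [b - mt - slope * (a - mx) for a, b in zip(x, t)]


def toy_ci_test(x, y, z):
    """Stand-in CI test of 'Y independent of Z given X' (hypothetical helper).

    Uses normalized products of regression residuals and returns a
    two-sided p-value from the standard normal approximation.
    It is only meaningful when the conditional means are roughly linear in x.
    """
    ry, rz = ols_residuals(x, y), ols_residuals(x, z)
    prod = [a * b for a, b in zip(ry, rz)]
    n = len(prod)
    mean = sum(prod) / n
    var = sum((p - mean) ** 2 for p in prod) / n
    stat = math.sqrt(n) * mean / math.sqrt(var)
    return math.erfc(abs(stat) / math.sqrt(2))


def conditional_two_sample_test(sample1, sample2, ci_test):
    """Pool two samples of (x, y) pairs, attach a group label, call the CI test."""
    pooled = [(x, y, 0.0) for x, y in sample1] + [(x, y, 1.0) for x, y in sample2]
    xs, ys, zs = (list(col) for col in zip(*pooled))
    return ci_test(xs, ys, zs)


def draw(n, x_mean, y_shift):
    """Simulate (x, y) with y = x + y_shift + noise; x_mean shifts the covariate."""
    out = []
    for _ in range(n):
        x = random.gauss(x_mean, 1.0)
        out.append((x, x + y_shift + random.gauss(0.0, 1.0)))
    return out


random.seed(0)
group1 = draw(500, 0.0, 0.0)
group2_null = draw(500, 0.5, 0.0)  # covariate shift only: the null holds
group2_alt = draw(500, 0.5, 1.0)   # conditional law of Y given X differs

p_null = conditional_two_sample_test(group1, group2_null, toy_ci_test)
p_alt = conditional_two_sample_test(group1, group2_alt, toy_ci_test)
print("p-value under the null:", p_null)
print("p-value under the alternative:", p_alt)
```

In this simulation the marginal distribution of X differs between the groups in both settings, so a marginal two-sample test on Y alone would reject even when the conditional distributions agree. The wrapper treats the CI test as a black box, so any other function with the same `(x, y, z)` signature could be passed instead.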

Paper Structure

This paper contains 38 sections, 10 theorems, 70 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Let $n_1,n_2 \in \mathbb{N}$ with $n=n_1+n_2$, $\alpha \in (0,1)$ and $M \in (0,\infty]$. For $\{(X_i,Y_i,Z_i)\}_{i=1}^n \overset{\mathrm{i.i.d.}}{\sim} P_{XYZ}\coloneqq P$, consider a test $\phi : \{(X_i,Y_i,Z_i)\}_{i=1}^n \mapsto \{0,1\}$. Suppose that $\phi$ controls the type I error at level $\alpha$ [...] Then the power of $\phi$ conditional on $N_1=n_1$ and $N_2 = n_2$ is at most $\alpha$ for any $P \in$ [...]

Figures (6)

  • Figure 1: Rejection rates for Scenario 1 under null and alternative hypotheses, shown for both unbounded (U) and bounded (B) settings. Results are averaged over 500 repetitions with significance level $\alpha = 0.05$.
  • Figure 2: Rejection rates for Scenario 2 under null and alternative hypotheses, shown for both unbounded (U) and bounded (B) settings. Results are averaged over 500 repetitions with significance level $\alpha = 0.05$.
  • Figure 3: Rejection rates for Scenario 3 under null and alternative hypotheses, shown for both unbounded (U) and bounded (B) settings. Results are averaged over 500 repetitions with significance level $\alpha = 0.05$.
  • Figure 4: Performance comparison of DRT methods on diamonds and superconductivity datasets using LL and KLR for density ratio estimation. Rejection rates are averaged over 500 repetitions with $\alpha = 0.05$, under null (top) and alternative (bottom) hypotheses.
  • Figure 5: Log-scaled mean squared errors of marginal density ratio $r_X(x)$ (left) and conditional density ratio $r_{Y|X}(y|x)$ (right) estimates for LL and KLR methods across various sample sizes. Results are shown for diamonds and superconductivity datasets, based on median values from 500 simulations under the null hypothesis.
  • ...and 1 more figure
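The marginal density ratio $r_X(x)$ mentioned in the Figure 5 caption can be estimated by classification, one of the approaches the abstract names. The sketch below assumes $r_X(x) = f^{(1)}_X(x) / f^{(2)}_X(x)$ (the choice of numerator is an assumption) and uses a plain logistic classifier fitted by Newton's method. It is not confirmed here that this matches the LL or KLR estimators in the paper. If the two conditional distributions of Y given X agree, then $f^{(1)}_{XY}(x,y) = r_X(x)\, f^{(2)}_{XY}(x,y)$, which is what allows reweighted marginal comparisons. The helper names `fit_logistic` and `density_ratio` are hypothetical.

```python
import math
import random


def fit_logistic(xs, labels, steps=25):
    """Fit P(label = 1 | x) = sigmoid(a + b * x) by Newton's method (1-D covariate)."""
    a = b = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            g0 += t - p               # gradient w.r.t. intercept
            g1 += (t - p) * x         # gradient w.r.t. slope
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step: theta += H^{-1} g
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def density_ratio(x, a, b, n1, n2):
    """Estimated f1(x) / f2(x): classifier odds corrected by the class proportions."""
    return (n2 / n1) * math.exp(a + b * x)


random.seed(1)
n1 = n2 = 2000
x1 = [random.gauss(0.5, 1.0) for _ in range(n1)]   # sample from f1 = N(0.5, 1)
x2 = [random.gauss(0.0, 1.0) for _ in range(n2)]   # sample from f2 = N(0, 1)

a_hat, b_hat = fit_logistic(x1 + x2, [1.0] * n1 + [0.0] * n2)

# For these two Gaussians the true ratio is exp(0.5 * x - 0.125).
grid = [-2.0 + 0.1 * k for k in range(41)]
mse = sum(
    (density_ratio(x, a_hat, b_hat, n1, n2) - math.exp(0.5 * x - 0.125)) ** 2
    for x in grid
) / len(grid)
print("intercept, slope:", a_hat, b_hat)
print("mean squared error on grid:", mse)
```

The logistic model is correctly specified in this toy setting because the log ratio of two equal-variance Gaussian densities is linear in x. With real data that is rarely guaranteed, which is one reason to compare estimators by their estimation error as in Figure 5.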

Theorems & Definitions (16)

  • Theorem 1
  • Theorem 2
  • Example 1: Stable case
  • Example 2: Unstable case
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Theorem 3
  • Corollary 1
  • ...and 6 more