ReLU integral probability metric and its applications

Yuha Park, Kunwoong Kim, Insung Kong, Yongdai Kim

TL;DR

The proposed IPM leverages a specific parametric family of discriminators, such as single-node neural networks with ReLU activation, to effectively distinguish between distributions, making it applicable in high-dimensional settings.

Abstract

We propose a parametric integral probability metric (IPM) to measure the discrepancy between two probability measures. The proposed IPM leverages a specific parametric family of discriminators, such as single-node neural networks with ReLU activation, to effectively distinguish between distributions, making it applicable in high-dimensional settings. We show that estimators of the proposed IPM, obtained by optimizing over the parameters of the chosen discriminator class, have good convergence rates, and that the proposed IPM can serve as a surrogate for other IPMs that use smooth nonparametric discriminator classes. We present an efficient algorithm for practical computation that is simple to implement and requires fewer hyperparameters. Furthermore, we explore its applications in various tasks, such as covariate balancing for causal inference and fair representation learning. Across these applications, we show that the proposed IPM provides strong theoretical guarantees, and empirical experiments show that it achieves comparable or even superior performance to other methods.
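
For orientation, the standard definition of an IPM over a discriminator class $\mathcal{F}$ is written out below, together with one possible form of the single-node ReLU family the abstract describes. The parameter set $\Theta$ is a placeholder introduced here for illustration: the exact constraint on $(\theta, \mu)$ is not stated in this summary and should be taken from the paper.

$$
d_{\mathcal{F}}(\mathcal{P}, \mathcal{Q}) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim \mathcal{P}}\, f(X) - \mathbb{E}_{Y \sim \mathcal{Q}}\, f(Y) \right|,
\qquad
\mathcal{F}_{\textup{ReLU}} = \left\{ x \mapsto \max(\theta^{\top} x + \mu,\, 0) : (\theta, \mu) \in \Theta \right\}.
$$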

Paper Structure

This paper contains 51 sections, 12 theorems, 69 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

$d_{\mathcal{F}_{\textup{ReLU}}}(\mathcal{P}, \mathcal{Q})=0 \text{ if and only if } \mathcal{P} \equiv \mathcal{Q}$ for two probability measures $\mathcal{P}$ and $\mathcal{Q}$.
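
To make the proposition concrete, the following is a minimal sketch of an empirical IPM over single-node ReLU discriminators, $\sup_f |\frac{1}{n}\sum_i f(x_i) - \frac{1}{m}\sum_j f(y_j)|$ with $f(x) = \max(\theta^{\top} x + \mu, 0)$. It is not the paper's algorithm: the supremum is approximated by plain random search, so the returned value is only a lower bound on the true supremum. The function name `relu_ipm_random_search`, the unit-norm constraint on $(\theta, \mu)$, and the candidate count are assumptions made for illustration.

```python
import numpy as np


def relu_ipm_random_search(x, y, n_candidates=2000, seed=0):
    """Lower-bound the empirical IPM over single-node ReLU discriminators.

    Assumption (not from the source): (theta, mu) lies on the unit sphere
    in R^{d+1}. The supremum is approximated by random search.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw candidate parameters and normalize each row to unit norm.
    params = rng.standard_normal((n_candidates, d + 1))
    params /= np.linalg.norm(params, axis=1, keepdims=True)
    theta, mu = params[:, :d], params[:, d]
    # Evaluate ReLU(theta^T x + mu) for all samples and candidates:
    # each result has shape (n_samples, n_candidates).
    fx = np.maximum(x @ theta.T + mu, 0.0)
    fy = np.maximum(y @ theta.T + mu, 0.0)
    # Absolute gap between sample means, one value per candidate.
    gaps = np.abs(fx.mean(axis=0) - fy.mean(axis=0))
    return float(gaps.max())


# Synthetic data (made up for this example): q is p's distribution shifted by 1.
rng = np.random.default_rng(1)
p = rng.standard_normal((500, 3))
q = rng.standard_normal((500, 3)) + 1.0

same = relu_ipm_random_search(p, p)     # identical samples give a gap of 0
shifted = relu_ipm_random_search(p, q)  # shifted samples give a positive gap
```

Comparing a sample with itself returns exactly zero, while the shifted sample yields a strictly positive value, which mirrors the "if and only if" statement at the level of finite samples. A gradient-based optimizer over $(\theta, \mu)$ would typically give a tighter approximation than random search.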

Figures (9)

  • Figure 1: Single-layered NN (ReLU activation) prediction head: Pareto-front lines of fairness level $\Delta \overline{\textup{DP}}$ and Acc. (Left) Adult, (Center) Dutch, (Right) Crime.
  • Figure 2: Single-layered NN (ReLU activation) prediction head: Pareto-front lines of fairness level $\Delta \textup{DP}$ and Acc. (Left) Adult, (Center) Dutch, (Right) Crime.
  • Figure 3: Single-layered NN (ReLU activation) prediction head: Pareto-front lines of fairness level $\Delta \textup{SDP}$ and Acc. (Left) Adult, (Center) Dutch, (Right) Crime.
  • Figure 4: Linear prediction head: Pareto-front lines of fairness level $\Delta \overline{\textup{DP}}$ and Acc. (Left) Adult, (Center) Dutch, (Right) Crime.
  • Figure 5: Linear prediction head: Pareto-front lines of fairness level $\Delta \textup{DP}$ and Acc. (Left) Adult, (Center) Dutch, (Right) Crime.
  • ...and 4 more figures

Theorems & Definitions (20)

  • Proposition 1
  • Remark 1
  • Theorem 2: ReLU-IPM bounds Hölder-IPM
  • Remark 2
  • Theorem 3: Rate of convergence of the empirical ReLU-IPM
  • Theorem 4
  • Theorem 5: Level of group fairness
  • Definition 1: Bounded differences property
  • Definition 2: Rademacher random variable
  • Definition 3: $L_p(\mathcal{Q})$-norm
  • ...and 10 more