Towards Practical Non-Adversarial Distribution Matching

Ziyu Gong; Ben Usman; Han Zhao; David I. Inouye

TL;DR

The paper tackles the instability of adversarial distribution matching by introducing a non-adversarial, VAE-based alignment upper bound (VAUB) on the Generalized Jensen–Shannon Divergence (GJSD) that is model-agnostic. By relaxing the invertibility constraint of the earlier alignment upper bound (AUB) and adding a reconstruction term that preserves mutual information, VAUB can replace adversarial losses in standard pipelines, such as domain adaptation and fairness models, without changing their architectures. The authors also introduce the Noisy Jensen–Shannon Divergence (NJSD) and corresponding noisy upper bounds (NAUB, NVAUB) to mitigate vanishing gradients and local minima, and they compare their bounds with VAE-based matching methods from the fairness literature in terms of priors and mutual information (MI) terms. Experiments on simulated and benchmark datasets show improved stability and competitive performance when adversarial losses are replaced with VAUB.

Abstract

Distribution matching can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial matching methods, but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for distribution matching. To overcome these limitations, we propose a non-adversarial VAE-based matching method that can be applied to any model pipeline. We develop a set of alignment upper bounds for distribution matching (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based matching approaches both theoretically and empirically. Finally, we demonstrate that our novel matching losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures, thereby significantly broadening the applicability of non-adversarial matching methods.

Paper Structure (56 sections, 11 theorems, 30 equations, 10 figures, 6 tables)

INTRODUCTION
Notation
BACKGROUND
Adversarial Methods
Fair VAE Methods
Flow-based Methods
RELAXING INVERTIBILITY CONSTRAINT OF AUB VIA VAES
VAE-based Alignment Upper Bound (VAUB)
Preserving Mutual Information via Reconstruction Loss
Plug-and-Play Matching Loss
NOISY JENSEN-SHANNON DIVERGENCE
REVISITING VAE-BASED MATCHING METHODS FROM FAIRNESS LITERATURE
EXPERIMENTS
Simulated Experiments
Non-Matching Dimensions between Latent Space and Input Space
...and 41 more sections

Key Result

Theorem 1

VAUB is an upper bound on GJSD between the latent distributions $\{q(\bm{z}|d)\}_{d=1}^k$ with a bound gap of $\mathrm{KL}(q(\bm{z}), p(\bm{z})) + \mathbb{E}_{q(d)q(\bm{z}|d)}[\mathrm{KL}(q(\bm{x}|\bm{z},d), p(\bm{x}|\bm{z},d))]$ that can be made tight if the $p(\b
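
For reference, the quantity that Theorem 1 bounds can be written out explicitly. The block below gives the standard weighted form of the Generalized Jensen–Shannon Divergence; it is assumed, not quoted from the paper, that the theorem uses this form with domain weights $q(d)$ and marginal $q(\bm{z})$, and the KL arguments follow the comma convention used in the theorem statement.

```latex
\mathrm{GJSD}\big(\{q(\bm{z}|d)\}_{d=1}^{k}\big)
  = \mathrm{H}\big(q(\bm{z})\big) - \sum_{d=1}^{k} q(d)\,\mathrm{H}\big(q(\bm{z}|d)\big)
  = \sum_{d=1}^{k} q(d)\,\mathrm{KL}\big(q(\bm{z}|d),\, q(\bm{z})\big),
\qquad
q(\bm{z}) = \sum_{d=1}^{k} q(d)\, q(\bm{z}|d).
```

Under this form, GJSD is zero exactly when every $q(\bm{z}|d)$ equals the mixture $q(\bm{z})$, which is the alignment condition that an upper bound on GJSD is meant to drive toward.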

Figures (10)

Figure 1: The loss reaches a plateau during learning if VAUB is used. (a) shows the loss convergence graph for VAUB and NVAUB, while (b) visualizes the latent distribution $p(\bm{z})$ with histogram density estimation of $z_i \sim q(z|x, d=i)$, $i\in\{0,1\}$, at the red circle in (a). Notice that the latent distribution matches the mixture of the domains but the latent domain distributions are not yet aligned.
Figure 3: Noisy JSD can reduce the vanishing gradient problem and smooth over local minima compared to the standard JSD. In Case 1 (a), we consider the (Noisy) JSD between two Gaussian distributions whose variances are the same but whose means are different. The gradient of JSD can vanish as JSD reaches its maximum value, as seen in the plateau regions of the top curve in (b), but this can be alleviated with noise, as seen in the bottom curves in (b). In Case 2 (c), we consider the (Noisy) JSD between two Gaussian mixture models whose mixture components are the same but whose overall means are different. For this case, a local minimum of JSD occurs when only one mixture component overlaps, as seen in the top curve of (d). However, Noisy JSD can smooth out this local minimum so that no local minimum remains.
Figure : (a) Original dist.
Figure : (a) DANN loss vs test accuracy
Figure : (a) Original dist.
...and 5 more figures
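
The vanishing-gradient effect described in Case 1 of the Figure 3 caption can be reproduced numerically. The following is a minimal sketch, not the authors' code: it assumes that Noisy JSD is the JSD computed after adding independent Gaussian noise to both distributions, so that for two equal-variance Gaussians the noise simply increases the shared variance. All function names, the integration grid, and the chosen parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def jsd_gaussians(mean_a, mean_b, std, lo=-60.0, hi=60.0, steps=12001):
    """Riemann-sum estimate (in nats) of the JSD between
    N(mean_a, std^2) and N(mean_b, std^2) on a fixed grid."""
    dx = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * dx
        p = gaussian_pdf(x, mean_a, std)
        q = gaussian_pdf(x, mean_b, std)
        m = 0.5 * (p + q)
        if m <= 0.0:
            continue  # both densities underflowed; contribution is negligible
        if p > 0.0:
            total += 0.5 * p * math.log(p / m)
        if q > 0.0:
            total += 0.5 * q * math.log(q / m)
    return total * dx


def noisy_jsd_gaussians(mean_a, mean_b, std, noise_std):
    """Assumed Noisy JSD: add N(0, noise_std^2) noise to both Gaussians.
    Convolving N(mu, std^2) with the noise gives N(mu, std^2 + noise_std^2)."""
    return jsd_gaussians(mean_a, mean_b, math.sqrt(std ** 2 + noise_std ** 2))


def slope(f, gap, eps=1e-3):
    """Central finite-difference derivative of f at the given mean gap."""
    return (f(gap + eps) - f(gap - eps)) / (2.0 * eps)


if __name__ == "__main__":
    gap = 20.0  # means far apart relative to std = 1
    clean = slope(lambda g: jsd_gaussians(0.0, g, 1.0), gap)
    noisy = slope(lambda g: noisy_jsd_gaussians(0.0, g, 1.0, 5.0), gap)
    print("JSD slope without noise:", clean)
    print("JSD slope with noise   :", noisy)
```

With well-separated means, the noiseless JSD sits at its maximum of $\log 2$ and its slope with respect to the mean gap is numerically zero, whereas the noisy version still has a usable slope. The noise level of 5.0 is an arbitrary illustrative choice.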

Theorems & Definitions (25)

Definition 1: VAE Alignment Upper Bound (VAUB)
Theorem 1: VAUB is an upper bound on GJSD
Proposition 2
Definition 2: Plug-and-play matching loss
Definition 3: Noisy JSD
Proposition 3
Theorem 4: Noisy alignment upper bounds
Proposition 5
Theorem 6: GJSD Upper Bound from cho2022cooperative
Lemma 7: Entropy Change of Variables from cho2022cooperative
...and 15 more
