SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator

Yuhta Takida, Satoshi Hayakawa, Takashi Shibuya, Masaaki Imaizumi, Naoki Murata, Bac Nguyen, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuki Mitsufuji

TL;DR

SONA tackles the core challenge of conditional generation in GANs by decoupling authenticity from conditional alignment through a discriminator with separate naturalness and alignment projections. It introduces three synergistic components: unconditional discrimination via a sliced-Wasserstein–based SAN objective, matching-aware discrimination using Bradley–Terry–style mismatched samples, and an adaptive weighting scheme that balances these goals during training. Theoretical results connect SAN to a meaningful distance between data and generator distributions, while BT-based losses yield conditional and mismatching guidance, culminating in a robust overall objective. Empirically, SONA surpasses state-of-the-art discriminators on class-conditional benchmarks and shows strong performance in text-to-image tasks, demonstrating versatility and practical impact for high-fidelity, well-aligned conditional generation.
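As a rough illustration of the Bradley–Terry idea mentioned above, the sketch below scores a matched pair against a mismatched pair and penalizes the discriminator when the matched score does not win. It assumes the generic Bradley–Terry form, in which the probability that the matched pair is preferred is the sigmoid of the score gap. The names `log_sigmoid` and `bt_loss` are hypothetical, and this is not the paper's exact $\mathcal{V}_{\textsc{BT-m}}$ objective.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) for any real z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def bt_loss(matched_scores, mismatched_scores):
    """Mean negative log-likelihood that each matched pair beats its mismatched pair.

    matched_scores[i]    : discriminator score f(x_i, y_i) for the correct condition
    mismatched_scores[i] : score f(x_i, y'_i) for a deliberately wrong condition
    (Both are assumed inputs; how mismatched conditions are drawn is not modeled here.)
    """
    gaps = [m - n for m, n in zip(matched_scores, mismatched_scores)]
    return -sum(log_sigmoid(g) for g in gaps) / len(gaps)

# A zero gap gives log(2); a larger positive gap gives a smaller loss.
tie = bt_loss([0.0], [0.0])
good = bt_loss([3.0], [0.0])
bad = bt_loss([0.0], [3.0])
```

Only the score gap matters in this form, so any term that depends on the sample alone cancels out of the loss.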

Abstract

Deep generative models have made significant advances in generating complex content, yet conditional generation remains a fundamental challenge. Existing conditional generative adversarial networks often struggle to balance the dual objectives of assessing authenticity and conditional alignment of input samples within their conditional discriminators. To address this, we propose a novel discriminator design that integrates three key capabilities: unconditional discrimination, matching-aware supervision to enhance alignment sensitivity, and adaptive weighting to dynamically balance all objectives. Specifically, we introduce Sum of Naturalness and Alignment (SONA), which employs separate projections for naturalness (authenticity) and alignment in the final layer with an inductive bias, supported by dedicated objective functions and an adaptive weighting mechanism. Extensive experiments on class-conditional generation tasks show that SONA achieves superior sample quality and conditional alignment compared to state-of-the-art methods. Furthermore, we demonstrate its effectiveness in text-to-image generation, confirming the versatility and robustness of our approach.
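To make the "separate projections" idea concrete, here is a minimal sketch of a final layer that returns a naturalness term, an alignment term, and their sum. It assumes a plain projection-style head on a feature vector `h`. The names `dot`, `sona_style_score`, `w_nat`, and `class_embeddings` are hypothetical, and the paper's actual parametrization, inductive bias, and adaptive weights are not reproduced here.

```python
def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def sona_style_score(h, w_nat, class_embeddings, y):
    """Return (naturalness, alignment, total) for feature vector h and label y.

    h                : features of a sample x from some backbone (assumed given)
    w_nat            : projection direction for naturalness (hypothetical name)
    class_embeddings : one embedding per condition y (hypothetical name)
    """
    naturalness = dot(w_nat, h)              # depends on x only
    alignment = dot(class_embeddings[y], h)  # depends on the pair (x, y)
    return naturalness, alignment, naturalness + alignment

# Toy usage with a 2-dimensional feature and two classes.
h = [1.0, 2.0]
w_nat = [0.5, -0.25]
class_embeddings = [[1.0, 0.0], [0.0, 1.0]]
nat, ali, total = sona_style_score(h, w_nat, class_embeddings, y=1)
```

Because the naturalness term ignores `y`, it can be trained with an unconditional objective while the alignment term is supervised separately.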

Paper Structure

This paper contains 46 sections, 7 theorems, 33 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Assume $p_{\text{data}}(y)$ is a constant regardless of $y\in Y$, e.g., a uniform distribution. The function $\tilde{f}$ maximizes $\mathcal{V}_{\textsc{CE}}$ if $\tilde{f}(x,y)=\log p_{\text{data}}(y|x)+r_X(x)$ for an arbitrary function $r_X:X\to\mathbb{R}$.
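The role of the free term $r_X(x)$ can be checked numerically on a toy example. The sketch below assumes that $\mathcal{V}_{\textsc{CE}}$ takes the standard softmax cross-entropy form over labels, $\mathbb{E}_{p_{\text{data}}(x,y)}[\log \operatorname{softmax}_y f(x,\cdot)]$, which this page does not spell out. The joint table `p_joint` and the helpers `log_softmax` and `ce_objective` are made up for illustration.

```python
import math

def log_softmax(scores):
    # Stable log-softmax over a list of scores.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def ce_objective(f, p_joint):
    """Expected log-softmax score on a finite table: f[x][y] and p_joint[x][y]."""
    total = 0.0
    for fx, px in zip(f, p_joint):
        ls = log_softmax(fx)
        total += sum(p * l for p, l in zip(px, ls))
    return total

# Toy joint distribution whose label marginal is uniform: p(y) = 1/2 for both labels.
p_joint = [[0.4, 0.1], [0.1, 0.4]]
p_cond = [[p / sum(row) for p in row] for row in p_joint]  # p(y | x)
r = [3.0, -7.0]  # an arbitrary shift r_X(x), one value per x

f_opt = [[math.log(p) for p in row] for row in p_cond]
f_shifted = [[v + r[i] for v in row] for i, row in enumerate(f_opt)]
f_flat = [[0.0, 0.0], [0.0, 0.0]]  # a score that ignores the label

v_opt = ce_objective(f_opt, p_joint)
v_shifted = ce_objective(f_shifted, p_joint)
v_flat = ce_objective(f_flat, p_joint)
```

Under this assumed form, `v_opt` and `v_shifted` coincide because the softmax over labels cancels any shift that depends on `x` alone, and both exceed `v_flat`.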

Figures (9)

  • Figure 1: Comparison of SONA with existing classifier- and projection-based methods for discriminator optimization. Our approach enables independent assessment of sample naturalness and alignment, supported by the proposed inductive bias (see the section on discriminator parametrization) and objectives (see the sections on unconditional learning and BT losses).
  • Figure 2: Empirical study on MoG using Wasserstein-2 distance (W2), Conditional Wasserstein-2 distance (cW2), and the number of failure cases (NF). See the MoG analysis section and the MoG experimental-details appendix for details.
  • Figure 3: Ground truth samples and generated samples from three baseline models. Different markers and colors represent samples from distinct classes among the $N=36$ total classes.
  • Figure 4: CIFAR10: (Left) Generated samples by SONA with BigGAN. (Right) Generated samples by SONA with StyleGAN-2.
  • Figure 5: TinyImageNet: Generated samples by SONA applied with DiffAug.
  • ...and 4 more figures

Theorems & Definitions (13)

  • Proposition 1: Log conditional probability maximizes $\mathcal{V}_{\textsc{CE}}$
  • Proposition 2: Informal; Unconditional discrimination by $\mathcal{V}_{\textsc{SAN}}$
  • Proposition 3: Conditional discrimination by $\mathcal{V}_{\textsc{BT-c}}$
  • Proposition 4: Log gap probability maximizes $\mathcal{V}_{\textsc{BT-m}}$
  • Lemma 5
  • proof
  • Definition 1
  • Definition 2
  • Lemma 6
  • proof
  • ...and 3 more