
Learning Robust Diffusion Models from Imprecise Supervision

Dong-Dong Wu, Jiacheng Cui, Wei Wang, Zhiqiang Shen, Masashi Sugiyama

TL;DR

DMIS is a unified framework for training robust Diffusion Models from Imprecise Supervision and the first systematic study of this problem within diffusion models.

Abstract

Conditional diffusion models have recently achieved remarkable success in various generative tasks, but their training typically relies on large-scale datasets that inevitably contain imprecise information in conditional inputs. Such supervision, often stemming from noisy, ambiguous, or incomplete labels, causes condition mismatch and degrades generation quality. To address this challenge, we propose DMIS, a unified framework for training robust Diffusion Models from Imprecise Supervision, which is the first systematic study of this problem within diffusion models. Our framework is derived from likelihood maximization and decomposes the objective into generative and classification components: the generative component models imprecise-label distributions, while the classification component leverages a diffusion classifier to infer class-posterior probabilities, with its efficiency further improved by an optimized timestep sampling strategy. Extensive experiments on diverse forms of imprecise supervision, covering image generation, weakly supervised learning, and noisy dataset condensation, demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
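The classification component relies on a diffusion classifier, which turns class-conditional denoising errors into class-posterior probabilities. What follows is a minimal NumPy sketch of that general idea. It assumes a DDPM-style ε-prediction model and a linear β schedule. It does not reproduce DMIS's own classifier (Definition 1), its timestep-selection rule, or its training objective. `make_toy_eps_model` and `candidate_posterior` are hypothetical helpers. The first is a toy noise predictor that treats each class as a single prototype. The second shows one plausible way to restrict the posterior to a candidate label set $z$.

```python
import numpy as np


def forward_noise(x0, t, alphas_bar, eps):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alphas_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def class_posterior(x0, eps_model, num_classes, timesteps, alphas_bar, rng):
    """Diffusion-classifier posterior p(y | x0) (generic sketch, not DMIS's exact rule).

    For each timestep, one noise draw is shared by all classes; the logit of
    class y is the negative mean squared noise-prediction error.
    """
    errs = np.zeros(num_classes)
    for t in timesteps:
        eps = rng.standard_normal(x0.shape)
        xt = forward_noise(x0, t, alphas_bar, eps)
        for y in range(num_classes):
            errs[y] += np.mean((eps - eps_model(xt, t, y)) ** 2)
    logits = -errs / len(timesteps)
    logits -= logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def candidate_posterior(post, candidates):
    """Hypothetical helper: restrict p(y | x) to a candidate label set z and renormalize."""
    mask = np.zeros_like(post)
    mask[list(candidates)] = 1.0
    q = post * mask
    return q / q.sum()


def make_toy_eps_model(prototypes, alphas_bar):
    """Hypothetical toy predictor that assumes x0 equals the class prototype."""
    def eps_model(xt, t, y):
        a = alphas_bar[t]
        return (xt - np.sqrt(a) * prototypes[y]) / np.sqrt(1.0 - a)
    return eps_model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed linear schedule
    prototypes = 2.0 * np.eye(3)
    model = make_toy_eps_model(prototypes, alphas_bar)
    # Evaluate only on a contiguous timestep subinterval of length 100.
    post = class_posterior(prototypes[2], model, 3, range(200, 300), alphas_bar, rng)
    print(post, candidate_posterior(post, {0, 2}))
```

Sharing one noise draw across classes at each timestep keeps the per-class errors paired. Passing a short contiguous `timesteps` range loosely mirrors evaluating on a subinterval of length $\Delta$, the kind of cost reduction that Figure 1 examines.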

Paper Structure

This paper contains 41 sections, 5 theorems, 69 equations, 9 figures, 9 tables, 1 algorithm.

Key Result

Theorem 1

Under the class-conditional setting, for all $\mathbf{x}_t \in \mathcal{X}$, $z \subseteq \mathcal{Y}$, and $t \in [T]$,

Figures (9)

  • Figure 1: (a): Test accuracy (%) on CIFAR-10 under the time-complexity reduction technique of chen2023robust versus ours. (b): Test accuracy (%) on CIFAR-10 when evaluating with only a single timestep per class. (c): Violin plot of class-wise $\textsc{Err}(\cdot,\cdot,y)$ computed across samples using a fixed subinterval length $\Delta$; wider regions of the violin indicate higher density.
  • Figure 2: Comparison of conditionally generated images from $\textsl{Vanilla}$ (top) and our $\textsl{DMIS}$ model (bottom), each trained with 40% symmetric noise on Fashion-MNIST, CIFAR-10, and ImageNette.
  • Figure 3: Examples of randomly generated Fashion-MNIST images from Vanilla models trained under different types of imprecise supervision.
  • Figure 4: Examples of randomly generated CIFAR-10 images from Vanilla models trained under different types of imprecise supervision.
  • Figure 5: Examples of randomly generated ImageNette images from Vanilla models trained under different types of imprecise supervision.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Remark 1
  • Theorem 1
  • Proposition 1
  • Definition 1: Approximated Posterior Noised Diffusion Classifier chen2024diffusion
  • Theorem 2
  • Theorem 3: Necessary Condition for Optimal Subinterval
  • Lemma 1
  • Proof