Towards Uncertainty Quantification in Generative Model Learning

Giorgio Morales, Frederic Jurie, Jalal Fadili

TL;DR

This paper addresses the gap in uncertainty quantification for generative model learning by formalizing model-induced evaluation uncertainty and distinguishing it from existing sample-level UQ. It introduces ensemble-based precision-recall curve aggregation as a practical diagnostic: by projecting PR curves from multiple initializations onto a common recall grid and measuring the 10th–90th percentile spread, it quantifies how training variability affects distributional closeness. Preliminary experiments on synthetic data with diffusion models show that intermediate model depths can reduce uncertainty, while smaller training sets increase epistemic uncertainty, underscoring the method's potential to inform architecture and data decisions. The work highlights the need for robust, scalable UQ methods in generative modeling and outlines future directions to make uncertainty-aware evaluation standard in practice.
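
The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`interpolate`, `percentile`, `pr_band`), the use of linear interpolation, the clamping at the curve endpoints, and the grid size are all assumptions made here for the example. Only the idea of a shared recall grid and a 10th to 90th percentile spread comes from the summary.

```python
from bisect import bisect_left


def interpolate(recall, precision, grid):
    """Linearly interpolate one PR curve onto a shared recall grid.

    Assumes `recall` is strictly increasing. Grid points outside the
    curve's recall range are clamped to the nearest endpoint value.
    """
    out = []
    for r in grid:
        if r <= recall[0]:
            out.append(precision[0])
        elif r >= recall[-1]:
            out.append(precision[-1])
        else:
            j = bisect_left(recall, r)  # recall[j-1] < r <= recall[j]
            t = (r - recall[j - 1]) / (recall[j] - recall[j - 1])
            out.append(precision[j - 1] + t * (precision[j] - precision[j - 1]))
    return out


def percentile(values, q):
    """q-th percentile (0 to 100), interpolating between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def pr_band(curves, num_points=11, lower_q=10, upper_q=90):
    """Aggregate an ensemble of PR curves into a percentile band.

    `curves` holds one (recall, precision) pair of lists per trained model,
    e.g. one per random initialization. Returns (grid, lower, upper, width).
    """
    grid = [i / (num_points - 1) for i in range(num_points)]
    projected = [interpolate(r, p, grid) for r, p in curves]
    lower, upper = [], []
    for k in range(num_points):
        column = [curve[k] for curve in projected]  # precision at grid[k]
        lower.append(percentile(column, lower_q))
        upper.append(percentile(column, upper_q))
    width = [u - l for l, u in zip(lower, upper)]
    return grid, lower, upper, width


if __name__ == "__main__":
    # Three toy curves that agree at recall 0 and diverge towards recall 1.
    toy = [
        ([0.0, 1.0], [1.0, 0.0]),
        ([0.0, 1.0], [1.0, 0.5]),
        ([0.0, 1.0], [1.0, 1.0]),
    ]
    grid, lower, upper, width = pr_band(toy, num_points=3)
    for g, w in zip(grid, width):
        print(f"recall={g:.2f}  band width={w:.2f}")
```

A wider band at a given recall level means the ensemble members disagree more there, which is the sense in which the spread serves as a diagnostic of training variability.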

Abstract

While generative models have become increasingly prevalent across various domains, fundamental concerns regarding their reliability persist. A crucial yet understudied aspect of these models is the uncertainty quantification surrounding their distribution approximation capabilities. Current evaluation methodologies focus predominantly on measuring the closeness between the learned and the target distributions, neglecting the inherent uncertainty in these measurements. In this position paper, we formalize the problem of uncertainty quantification in generative model learning. We discuss potential research directions, including the use of ensemble-based precision-recall curves. Our preliminary experiments on synthetic datasets demonstrate the effectiveness of aggregated precision-recall curves in capturing model approximation uncertainty, enabling systematic comparison among different model architectures based on their uncertainty characteristics.

Paper Structure

This paper contains 7 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: PR curve ensembles obtained with $C=$ (a) $1$, (b) $2$, (c) $4$, and (d) $8$.
  • Figure 2: Real vs. generated samples ($C=4$). (a) Best-performing and (b) worst-performing model.
  • Figure 3: PR curve ensembles ($C=2 \text{ vs. } 4$). Red areas indicate statistical significance.
  • Figure 4: PR curves obtained for $C=4$ and $N=$ (a) $2500$, (b) $5000$, (c) $7500$, (d) $10000$.
  • Figure 5: Alternative PR curve uncertainty visualization showing radial dispersion at model complexities (a) $C=1$, (b) $C=2$, (c) $C=3$, and (d) $C=4$.