A Simple and Efficient Baseline for Zero-Shot Generative Classification

Zipeng Qi, Buhua Liu, Shiyan Zhang, Bao Li, Zhiqiang Xu, Haoyi Xiong, Zeke Xie

TL;DR

This work addresses the inefficiency of zero-shot diffusion-based classifiers by introducing the Gaussian Diffusion Classifier (GDC), which uses a Gaussian Mixture Model over DINOv2 embeddings of diffusion-generated reference images to perform fast, probabilistic classification without training data. The Methodology section details a two-phase pipeline: a Preparation Phase that builds per-class Gaussian components from diverse reference samples, and a Classification Phase that computes Bayes posteriors p(y|e) for test embeddings, enabling rapid inference. Empirical results demonstrate substantial accuracy gains over prior diffusion-based zero-shot methods (e.g., ImageNet gains >10 points) while achieving massive speedups (≈3×10^4× faster per image), with stronger diffusion models further boosting performance. The discussion highlights practical implications, error modes, and avenues for extending Gaussian-based diffusion classification to broader tasks and future diffusion improvements.
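
For reference, a Bayes posterior over Gaussian class components has the standard mixture form below. The symbols $\pi_y$ (class prior), $\mu_y$ and $\Sigma_y$ (per-class mean and covariance), and $K$ (number of classes) are notation assumed here for illustration; the paper's own equations may parameterize the components differently.

```latex
p(y \mid e) \;=\; \frac{\pi_y \, \mathcal{N}(e;\, \mu_y, \Sigma_y)}
                       {\sum_{k=1}^{K} \pi_k \, \mathcal{N}(e;\, \mu_k, \Sigma_k)},
\qquad
\hat{y} \;=\; \arg\max_{y} \; p(y \mid e)
```

Because the denominator is shared by all classes, the predicted label depends only on the numerators, while the normalized values give the per-class probabilities.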

Abstract

Large diffusion models have become mainstream generative models in both academic studies and industrial AIGC applications. Recently, a number of works have explored how to employ large diffusion models as zero-shot classifiers. While recent zero-shot diffusion-based classifiers have improved performance on benchmark datasets, they still suffer from extremely slow classification (e.g., ~1000 seconds to classify a single image on ImageNet), which prevents their use in practical applications. In this paper, we propose Gaussian Diffusion Classifiers (GDC), an embarrassingly simple and efficient zero-shot approach built on pretrained text-to-image diffusion models and DINOv2. GDC not only surpasses previous zero-shot diffusion-based classifiers by over 10 points on ImageNet (from 61.40% to 71.44%), but also classifies a single ImageNet image more than 30,000 times faster (from 1000 to 0.03 seconds). Additionally, it provides a probabilistic interpretation of its results. Our extensive experiments further demonstrate that GDC achieves highly competitive zero-shot classification performance across various datasets and can self-improve with stronger diffusion models. To the best of our knowledge, GDC is the first zero-shot diffusion-based classifier that exhibits both competitive accuracy and practical efficiency.
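
The two-phase idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the synthetic vectors stand in for DINOv2 embeddings of diffusion-generated reference images, and the shared covariance, ridge term, uniform class prior, and function names (`fit_class_gaussians`, `class_posteriors`) are all assumptions made for illustration.

```python
import numpy as np

def fit_class_gaussians(embeddings, labels, num_classes, ridge=1e-3):
    """Preparation phase: per-class means plus one shared covariance (assumption)."""
    dim = embeddings.shape[1]
    means = np.stack([embeddings[labels == c].mean(axis=0)
                      for c in range(num_classes)])
    centered = embeddings - means[labels]
    # Pooled covariance, with a small ridge (assumption) to keep it invertible.
    cov = centered.T @ centered / len(embeddings) + ridge * np.eye(dim)
    return means, np.linalg.inv(cov)

def class_posteriors(test_embeddings, means, precision):
    """Classification phase: p(y|e) under a uniform class prior (assumption)."""
    diff = test_embeddings[:, None, :] - means[None, :, :]      # (n, C, d)
    maha = np.einsum("ncd,de,nce->nc", diff, precision, diff)   # squared Mahalanobis
    # With a shared covariance the Gaussian normalizer cancels across classes.
    log_lik = -0.5 * maha
    log_lik -= log_lik.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)

# Synthetic stand-ins for reference and test embeddings (hypothetical data).
rng = np.random.default_rng(0)
num_classes, dim, n_ref = 3, 8, 100
true_means = rng.normal(scale=5.0, size=(num_classes, dim))
ref_labels = np.repeat(np.arange(num_classes), n_ref)
ref_emb = true_means[ref_labels] + rng.normal(size=(num_classes * n_ref, dim))

means, precision = fit_class_gaussians(ref_emb, ref_labels, num_classes)

test_labels = np.repeat(np.arange(num_classes), 20)
test_emb = true_means[test_labels] + rng.normal(size=(len(test_labels), dim))
posteriors = class_posteriors(test_emb, means, precision)
predictions = posteriors.argmax(axis=1)
accuracy = (predictions == test_labels).mean()
```

In this sketch, all diffusion sampling and Gaussian fitting happen once in the preparation step; classifying a test image then needs only one embedding and one distance computation per class, which is consistent with the speedup the abstract reports.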

Paper Structure

This paper contains 13 sections, 16 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: We visualize the distribution of image features along 60 randomly chosen PCA components for a randomly picked class on CIFAR-10. The red curves represent fitted Gaussian distributions. The results indicate that the feature values approximately follow a Gaussian distribution, motivating our method.
  • Figure 2: Overview of our Gaussian Diffusion Classifiers (GDC), which consists of two phases: 1) the Preparation Phase and 2) the Gaussian-based Classification Phase. GDC outputs not only the classification result but also its probability. For conciseness, we present only three reference images; the influence of varying the number of reference images is discussed in the empirical analysis section. Algorithm 1 shows more details.
  • Figure 3: The test accuracy of GDC with various choices of $N$ on CIFAR-10, CIFAR-100, and ImageNet. The performance of GDC is robust to the choice of $N$ when $N\geq 100$. Even with only one reference image per class, GDC still performs better than Li's conventional DC.
  • Figure 4: The classification accuracy of GDC monotonically increases with stronger diffusion models on ImageNet. Generation Performance: SDXL-turbo > SD 1.5 > SD 1.3 > SD 1.1.
  • Figure 5: GDC can correct some common label errors in ImageNet. We provide examples where the ImageNet dataset assigns incorrect labels but GDC annotates correctly. Cleanlab (Northcutt et al., 2021), which focuses on correcting incorrect labels, is available at https://github.com/cleanlab/cleanlab. The results demonstrate that GDC yields the same correct labels as Cleanlab for these hard cases and excels at distinguishing visually similar instances.
  • ...and 3 more figures