A Simple and Efficient Baseline for Zero-Shot Generative Classification

Zipeng Qi; Buhua Liu; Shiyan Zhang; Bao Li; Zhiqiang Xu; Haoyi Xiong; Zeke Xie

TL;DR

This work addresses the inefficiency of zero-shot diffusion-based classifiers by introducing the Gaussian Diffusion Classifier (GDC), which uses a Gaussian Mixture Model over DINOv2 embeddings of diffusion-generated reference images to perform fast, probabilistic classification without training data. The Methodology section details a two-phase pipeline: a Preparation Phase that builds per-class Gaussian components from diverse reference samples, and a Classification Phase that computes Bayes posteriors p(y|e) for test embeddings, enabling rapid inference. Empirical results demonstrate substantial accuracy gains over prior diffusion-based zero-shot methods (e.g., ImageNet gains >10 points) while achieving massive speedups (≈3×10^4× faster per image), with stronger diffusion models further boosting performance. The discussion highlights practical implications, error modes, and avenues for extending Gaussian-based diffusion classification to broader tasks and future diffusion improvements.
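The two-phase pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one Gaussian per class, a single covariance matrix shared across classes, uniform class priors, and synthetic vectors standing in for DINOv2 embeddings of diffusion-generated reference images. The function names `fit_class_gaussians` and `class_posteriors` are hypothetical.

```python
import numpy as np


def fit_class_gaussians(ref_embeddings, ref_labels, num_classes, eps=1e-3):
    """Preparation phase (sketch): estimate one Gaussian component per class.

    Assumption: all classes share one covariance matrix, regularized by eps.
    """
    dim = ref_embeddings.shape[1]
    means = np.zeros((num_classes, dim))
    pooled = np.zeros((dim, dim))
    for c in range(num_classes):
        x = ref_embeddings[ref_labels == c]
        means[c] = x.mean(axis=0)
        centered = x - means[c]
        pooled += centered.T @ centered
    cov = pooled / len(ref_embeddings) + eps * np.eye(dim)
    return means, np.linalg.inv(cov)


def class_posteriors(test_embeddings, means, precision, priors=None):
    """Classification phase (sketch): Bayes posterior p(y|e) per test embedding."""
    num_classes = means.shape[0]
    if priors is None:
        # Assumption: uniform class priors.
        priors = np.full(num_classes, 1.0 / num_classes)
    diff = test_embeddings[:, None, :] - means[None, :, :]  # (n, classes, dim)
    maha = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    # With a shared covariance the log-determinant term is identical for every
    # class and cancels in the normalization, so it is omitted here.
    log_joint = -0.5 * maha + np.log(priors)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


# Synthetic stand-ins for reference and test embeddings (illustration only).
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 8, 50
true_means = 10.0 * np.eye(num_classes, dim)  # well-separated class centers
ref_labels = np.repeat(np.arange(num_classes), per_class)
ref_embeddings = true_means[ref_labels] + rng.normal(size=(len(ref_labels), dim))

means, precision = fit_class_gaussians(ref_embeddings, ref_labels, num_classes)

test_labels = np.array([0, 1, 2])
test_embeddings = true_means[test_labels] + rng.normal(size=(3, dim))
posteriors = class_posteriors(test_embeddings, means, precision)
predictions = posteriors.argmax(axis=1)
```

In this sketch the class statistics are computed once, so each test image costs only one embedding pass plus a handful of matrix products. That is consistent with the fast inference the summary describes, although the paper's actual component structure and covariance handling may differ.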

Abstract

Large diffusion models have become mainstream generative models in both academic research and industrial AIGC applications. Recently, a number of works have explored how to use large diffusion models as zero-shot classifiers. While recent zero-shot diffusion-based classifiers have improved performance on benchmark datasets, they still suffer from extremely slow classification (e.g., ~1000 seconds to classify a single image on ImageNet), which prevents their use in practical applications. In this paper, we propose an embarrassingly simple and efficient zero-shot Gaussian Diffusion Classifier (GDC) built on pretrained text-to-image diffusion models and DINOv2. The proposed GDC not only surpasses previous zero-shot diffusion-based classifiers by over 10 points on ImageNet (61.40% → 71.44%), but also classifies a single ImageNet image more than 30,000 times faster (1000 → 0.03 seconds). Additionally, it provides a probabilistic interpretation of its results. Our extensive experiments further demonstrate that GDC achieves highly competitive zero-shot classification performance across various datasets and can self-improve as stronger diffusion models become available. To the best of our knowledge, the proposed GDC is the first zero-shot diffusion-based classifier that exhibits both competitive accuracy and practical efficiency.
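As a quick check on the headline numbers, the following arithmetic uses only the figures quoted in the abstract; the symbols for accuracy gain and speedup are introduced here for illustration.

```latex
\begin{align*}
\Delta_{\text{acc}} &= 71.44\% - 61.40\% = 10.04 \text{ points}, \\
\text{speedup} &= \frac{1000\ \text{s}}{0.03\ \text{s}} \approx 3.3 \times 10^{4}.
\end{align*}
```

Both values agree with the stated claims of a gain of over 10 points and an acceleration of more than 30,000 times.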
