Advancing Image Classification with Discrete Diffusion Classification Modeling

Omer Belhasin, Shelly Golan, Ran El-Yaniv, Michael Elad

TL;DR

DiDiCM introduces a discrete diffusion framework for image classification that directly models the posterior $P(c|y)$ of class labels given a degraded input. By formulating forward and reverse diffusion processes in the discrete label space and training a score-based model to approximate the Concrete Score, DiDiCM achieves robust accuracy under high uncertainty with only a few diffusion steps. The work also presents DiDiRN, a ResNet-based architecture augmented for diffusion-based classification, and demonstrates substantial gains over standard classifiers on ImageNet across varying corruption and data-scarcity conditions. Two inference strategies, DiDiCM-CP and DiDiCM-CL, offer a trade-off between computation and memory while preserving performance. Overall, the approach provides a principled, scalable method to propagate uncertainty through the classification process and improve reliability in challenging real-world settings.
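As a rough illustration of a forward diffusion process in discrete label space, the sketch below evolves a one-hot label distribution under a linear ODE $dq_t/dt = R\,q_t$ and solves it through the eigendecomposition of $R$. The uniform rate matrix, the constant-in-time rate, and the helper names `uniform_rate_matrix` and `forward_marginal` are assumptions made for this example; they are not taken from the paper or its released code.

```python
import numpy as np

def uniform_rate_matrix(K):
    """Hypothetical rate matrix: jump to a uniformly random label at unit rate.

    Columns sum to zero, so total probability is conserved.
    """
    return np.full((K, K), 1.0 / K) - np.eye(K)

def forward_marginal(q0, R, t):
    """Solve dq/dt = R q in closed form using R = U diag(lam) U^T.

    Assumes R is symmetric (true for the uniform matrix above), so that
    np.linalg.eigh applies and U^{-1} = U^T.
    """
    lam, U = np.linalg.eigh(R)
    return U @ (np.exp(t * lam) * (U.T @ q0))

K = 5
q0 = np.zeros(K)
q0[2] = 1.0                      # clean label: class 2 with probability 1
R = uniform_rate_matrix(K)

for t in (0.0, 0.5, 5.0):
    qt = forward_marginal(q0, R, t)
    print(f"t={t}: {np.round(qt, 3)}")
```

As `t` grows, the label distribution drifts from the one-hot vector toward the uniform distribution over the `K` classes, which is the kind of fully corrupted state a reverse process would start from.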

Abstract

Image classification is a well-studied task in computer vision, and yet it remains challenging under high-uncertainty conditions, such as when input images are corrupted or training data are limited. Conventional classification approaches typically train models to directly predict class labels from input images, but this might lead to suboptimal performance in such scenarios. To address this issue, we propose Discrete Diffusion Classification Modeling (DiDiCM), a novel framework that leverages a diffusion-based procedure to model the posterior distribution of class labels conditioned on the input image. DiDiCM supports diffusion-based predictions either on class probabilities or on discrete class labels, providing flexibility in computation and memory trade-offs. We conduct a comprehensive empirical study demonstrating the superior performance of DiDiCM over standard classifiers, showing that a few diffusion iterations achieve higher classification accuracy on the ImageNet dataset compared to baselines, with accuracy gains increasing as the task becomes more challenging. We release our code at https://github.com/omerb01/didicm.

Paper Structure

This paper contains 23 sections, 2 theorems, 28 equations, 7 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

Let $q_t \in \mathbb{R}^K$ satisfy Equation eq:forward_appendix and let $R = U \Lambda U^{-1}$ be the eigendecomposition of $R$, where $U \in \mathbb{R}^{K \times K}$ denotes the matrix of eigenvectors and $\Lambda \in \mathbb{R}^{K \times K}$ is the diagonal matrix of eigenvalues. Define $\overline

Figures (7)

  • Figure 1: ImageNet Top-5 Accuracy: DiDiCM vs. standard classifiers. DiDiRN-50 (comparable to ResNet-50) and ResNet-50 are both trained using the state-of-the-art recipe of Wightman et al. (2021). DiDiCM shows superior top-5 accuracy across all uncertainty settings.
  • Figure 2: An illustration of our DiDiCM-CP showing the evolution of the top-5 class label probabilities over time for three images, demonstrating different classification challenges.
  • Figure 3: The Discrete Diffusion Residual Network (DiDiRN) architecture. DiDiRN preserves the core image-processing components of ResNet while adding conditioning modules to support the diffusion process of DiDiCM. Original ResNet modules are shown in blue, and the newly introduced components in green.
  • Figure 4: (a) DiDiCM (8 steps) vs. standard classifiers under varying uncertainty (see the training appendix for the augmentation policy). (b) NFEs vs. top-1 accuracy for DiDiCM-CP, DiDiCM-CL, and the standard classifier at resolution 56 using the full training set. Numbers indicate the sample count used. Red markers denote the best-performing DiDiCM-CP and DiDiCM-CL results.
  • Figure 5: ImageNet Top-1 and Top-5 accuracy gains across model sizes. Comparison of DiDiCM-CP (8-step) and standard classifiers at 56 input resolution, trained on 25% of the data.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 1: Closed-Form Solution for the Discrete Markovian Process
  • proof
  • Theorem 2: Approximated Solution for the Discrete Markovian Process
  • proof