Advancing Image Classification with Discrete Diffusion Classification Modeling
Omer Belhasin, Shelly Golan, Ran El-Yaniv, Michael Elad
TL;DR
DiDiCM introduces a discrete diffusion framework for image classification that directly models the posterior $P(c|y)$ of class labels given a degraded input. By formulating forward and reverse diffusion processes in the discrete label space and training a score-based model to approximate the Concrete Score, DiDiCM achieves robust accuracy under high uncertainty with only a few diffusion steps. The work also presents DiDiRN, a ResNet-based architecture augmented for diffusion-based classification, and demonstrates substantial gains over standard classifiers on ImageNet across varying corruption and data-scarcity conditions. Two inference strategies, DiDiCM-CP and DiDiCM-CL, offer a trade-off between computation and memory while preserving performance. Overall, the approach provides a principled, scalable method to propagate uncertainty through the classification process and improve reliability in challenging real-world settings.
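To make "diffusion in the discrete label space" concrete, here is a minimal NumPy sketch of a forward corruption process over class labels and of the probability ratios that a score model could be trained to approximate. It is an illustration under stated assumptions, not the authors' implementation: the uniform transition kernel, the accumulated-noise parameter `sigma`, the ratio form of the score, and every function name (`forward_marginal`, `score_ratios`, `sample_noisy_label`) are hypothetical choices made for this example.

```python
import numpy as np


def forward_marginal(p0, sigma):
    """Label distribution after uniform corruption (assumed kernel).

    With rate matrix Q = (1/K) * ones - I and accumulated noise sigma >= 0,
    exp(sigma * Q) applied to p0 gives a mixture of p0 and the uniform
    distribution over the K classes.
    """
    p0 = np.asarray(p0, dtype=float)
    num_classes = p0.shape[-1]
    keep = np.exp(-sigma)  # weight left on the original distribution
    return keep * p0 + (1.0 - keep) / num_classes


def score_ratios(p_t, c):
    """Ratios p_t(c') / p_t(c) for every class c' (assumed score form).

    The entry at index c is 1 by construction.
    """
    p_t = np.asarray(p_t, dtype=float)
    return p_t / p_t[c]


def sample_noisy_label(c0, sigma, num_classes, rng):
    """Draw a corrupted label c_t given the clean label c0."""
    p0 = np.eye(num_classes)[c0]  # one-hot distribution on the clean label
    return rng.choice(num_classes, p=forward_marginal(p0, sigma))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_t = forward_marginal(np.eye(4)[2], sigma=1.0)
    print("marginal:", p_t)
    print("ratios relative to class 2:", score_ratios(p_t, 2))
    print("sampled noisy label:", sample_noisy_label(2, 1.0, 4, rng))
```

At `sigma = 0` the label is untouched, and as `sigma` grows the marginal approaches the uniform distribution. A reverse process would start from that uniform end and use a learned, image-conditioned estimate of such ratios to move back toward a confident label over a few steps.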
Abstract
Image classification is a well-studied task in computer vision, yet it remains challenging under high-uncertainty conditions, such as when input images are corrupted or training data are limited. Conventional approaches train models to predict class labels directly from input images, which can lead to suboptimal performance in such scenarios. To address this issue, we propose Discrete Diffusion Classification Modeling (DiDiCM), a novel framework that uses a diffusion-based procedure to model the posterior distribution of class labels conditioned on the input image. DiDiCM supports diffusion-based predictions either on class probabilities or on discrete class labels, providing flexibility in the trade-off between computation and memory. We conduct a comprehensive empirical study demonstrating the superior performance of DiDiCM over standard classifiers, showing that a few diffusion iterations achieve higher classification accuracy on the ImageNet dataset than the baselines, with accuracy gains increasing as the task becomes more challenging. We release our code at https://github.com/omerb01/didicm.
