$\bf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
TL;DR
The paper tackles the rising challenge of detecting autoregressive-generated images by leveraging the distinctive discrete latent-space patterns of AR models. It introduces D$^3$QE, a pipeline that combines quantization-error features with a Discrete Distribution Discrepancy-Aware Transformer (D$^3$AT) and CLIP-based semantic embeddings to discriminate real from AR-generated images. A new ARForensics dataset with 7 AR models and balanced real/generated samples enables robust evaluation, where D$^3$QE outperforms state-of-the-art baselines and shows strong cross-paradigm generalization as well as resilience to perturbations. The approach offers a principled way to exploit codebook frequency statistics and quantization residuals for forensic detection, with practical impact for safeguarding authenticity in digital media.
Abstract
The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike earlier GAN- or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose to leverage Discrete Distribution Discrepancy-aware Quantization Error (D$^3$QE) for autoregressive-generated image detection, exploiting the distinctive patterns and the codebook frequency distribution bias that distinguish real from fake images. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features with the quantization-error latent. To evaluate our method, we construct a comprehensive dataset, termed ARForensics, covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D$^3$QE across different AR models, with robustness to real-world perturbations. Code is available at https://github.com/Zhangyr2022/D3QE.
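To make the two quantities named in the abstract concrete, the following is a minimal sketch of a quantization error and a codebook frequency histogram under nearest-neighbour vector quantization. It is an illustration only and not the authors' implementation: the function names `quantize` and `codebook_frequencies`, the random toy codebook, and the array shapes are all assumptions made for this example, and the real pipeline (a learned VQ encoder, D$^3$AT, and CLIP features) is not reproduced here.

```python
import numpy as np


def quantize(latents, codebook):
    """Nearest-neighbour vector quantization (hypothetical helper).

    latents:  (N, D) continuous latent vectors
    codebook: (K, D) codebook entries
    Returns the chosen indices (N,), the quantized vectors (N, D),
    and the quantization error latents - quantized (N, D).
    """
    # Squared Euclidean distance from every latent to every entry: (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    quantized = codebook[indices]
    error = latents - quantized
    return indices, quantized, error


def codebook_frequencies(indices, codebook_size):
    """Normalized usage frequency of each codebook entry (hypothetical helper)."""
    counts = np.bincount(indices, minlength=codebook_size)
    return counts / counts.sum()


# Toy data standing in for encoder outputs and a learned codebook (assumed shapes).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))
latents = rng.normal(size=(64, 4))

indices, quantized, error = quantize(latents, codebook)
freqs = codebook_frequencies(indices, len(codebook))
```

In this sketch, `error` plays the role of the quantization-error latent and `freqs` the codebook frequency statistics; comparing such histograms between real and generated image sets is one simple way to look at the distribution discrepancy the abstract refers to.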
