DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

Jialiang Shen, Jiyang Zheng, Yunqi Xue, Huajie Chen, Yu Yao, Hui Kang, Ruiqi Liu, Helin Gong, Yang Yang, Dadong Wang, Tongliang Liu

TL;DR

This work tackles the brittleness of AI-generated image detectors under motion blur by introducing DINO-Detect, a blur-robust detector built on a frozen DINOv3 teacher and a sharp-to-blur distillation framework. The method aligns sharp and blurred representations through a combination of feature, logit, and ordinal contrastive losses, enabled by a carefully designed blur model and paired sampling. A new AIGI-Blur Benchmark evaluates robustness to realistic blur, and extensive experiments show state-of-the-art performance on both blurred and clean data, plus strong generalization to unseen generators and real-world degradations. The approach offers a practical route toward reliable media forensics in real-world scenarios, with broad implications for authentication and safety in digital content ecosystems.

Abstract

With growing concerns over image authenticity and digital safety, the field of AI-generated image (AIGI) detection has progressed rapidly. Yet, most AIGI detectors still struggle under real-world degradations, particularly motion blur, which frequently occurs in handheld photography, fast motion, and compressed video. Such blur distorts fine textures and suppresses high-frequency artifacts, causing severe performance drops in real-world settings. We address this limitation with a blur-robust AIGI detection framework based on teacher-student knowledge distillation. A high-capacity teacher (DINOv3), trained on clean (i.e., sharp) images, provides stable and semantically rich representations that serve as a reference for learning. By freezing the teacher to maintain its generalization ability, we distill its feature and logit responses from sharp images to a student trained on blurred counterparts, enabling the student to produce consistent representations under motion degradation. Extensive experiments on benchmarks show that our method achieves state-of-the-art performance under both motion-blurred and clean conditions, demonstrating improved generalization and real-world applicability. Source code will be released at: https://github.com/JiaLiangShen/Dino-Detect-for-blur-robust-AIGC-Detection.
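
The sharp-to-blur distillation described in the abstract can be sketched as a per-sample loss that pulls the student's output on a blurred image toward the frozen teacher's output on the sharp version of the same image. The sketch below is illustrative only: the abstract does not give the exact loss terms, so the cosine distance for features, the temperature-scaled KL divergence for logits, and every name and default (`distillation_loss`, `temperature`, `w_feat`, `w_logit`) are assumptions, not the authors' implementation. The contrastive term mentioned in the TL;DR is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + 1e-12)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_feat, teacher_logits, student_feat, student_logits,
                      temperature=2.0, w_feat=1.0, w_logit=1.0):
    """Hypothetical sharp-to-blur distillation loss for one image pair.

    teacher_* come from the frozen teacher on the SHARP image and are treated
    as constants; student_* come from the student on the BLURRED image.
    """
    # Feature alignment: the student's blurred feature should point the same way.
    feat_term = cosine_distance(teacher_feat, student_feat)
    # Logit alignment: match the softened class distributions (real vs. fake).
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_term = temperature ** 2 * kl_divergence(p_teacher, p_student)
    return w_feat * feat_term + w_logit * logit_term


if __name__ == "__main__":
    # Toy values: a 4-d feature and 2 logits (real, fake) per image.
    teacher_feat, teacher_logits = [0.9, 0.1, 0.3, 0.2], [2.0, -1.0]
    student_feat, student_logits = [0.5, 0.4, 0.1, 0.6], [0.3, 0.2]
    print(distillation_loss(teacher_feat, teacher_logits, student_feat, student_logits))
```

In this sketch the loss is close to zero when the student reproduces the teacher's outputs and grows as the blurred representation drifts away from the sharp one. Because only the student's outputs would receive gradients in a real training loop, the teacher's representation stays fixed as the reference.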

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Motion blur suppresses discriminative high-frequency artifacts, leading to detector failure. In sharp images, AI-generated content contains excess mid–high-frequency energy from upsampling discontinuities, enabling artifact-based detectors to distinguish real from fake. When motion blur acts as a strong low-pass filter, these spectral cues vanish, collapsing the decision boundary and causing misclassification of both blurred reals and fakes.
  • Figure 2: Illustration of our proposed DINO-Detect framework (left) and the AIGI-Blur evaluation benchmark (right).
  • Figure 3: Impact of motion blur on model attention patterns. The plots show the average similarity between attention maps of clean and motion-blurred images across varying blur kernel sizes.
  • Figure 4: Patch-Level Structural Consistency under Motion Blur. We visualize the patch-wise similarity matrices of blurred fake (top row) and real (bottom row) images extracted by three models: CLIP-ViT (Radford et al., 2021), UnivFD (Ojha et al., 2023), and our DINO-Detect.
  • Figure :