DiffYOLO: Object Detection for Anti-Noise via YOLO and Diffusion Models

Yichen Liu, Huajian Zhang, Daqing Gao

TL;DR

This work tackles robust object detection under degraded image quality by augmenting a YOLOv5 detector with diffusion-model features (DiffYOLO). By extracting multi-level representations from a pretrained denoising diffusion probabilistic model and injecting them into the YOLO neck, the approach enables fine-tuning instead of full retraining and improves noise resilience on low-quality inputs while preserving performance on clean data. Experiments on the DeepPCB dataset show DiffYOLO outperforming the baseline under Gaussian, Salt & Pepper, and Poisson noise, with added benefits on high-quality images. The results suggest diffusion-inspired feature augmentation as a practical, resource-conscious path to robust industrial detection, though computational overhead and data drift remain important considerations for deployment.
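To make the three corruption types named above concrete, the following is a minimal sketch of how Gaussian, Salt & Pepper, and Poisson noise might be applied to a clean test image. It assumes float images scaled to [0, 1]; the function names and the parameters `sigma`, `amount`, and `peak` are illustrative choices, not the settings used in the paper.

```python
import numpy as np


def add_gaussian_noise(img, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise to a float image in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)


def add_salt_and_pepper_noise(img, amount=0.05, rng=None):
    """Set roughly `amount` of the pixels to 0 (pepper) or 1 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0.0          # pepper
    noisy[mask > 1.0 - amount / 2] = 1.0    # salt
    return noisy


def add_poisson_noise(img, peak=30.0, rng=None):
    """Simulate photon (shot) noise; a lower `peak` gives a noisier image."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = rng.poisson(img * peak) / peak
    return np.clip(noisy, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((64, 64), 0.5)  # hypothetical stand-in for a test image
    for corrupt in (add_gaussian_noise, add_salt_and_pepper_noise, add_poisson_noise):
        noisy = corrupt(clean, rng=rng)
        print(corrupt.__name__, float(np.abs(noisy - clean).mean()))
```

Under this setup, a detector would be fine-tuned on clean images only and evaluated on the corrupted copies, which matches the train-clean, test-noisy protocol the summary describes.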

Abstract

Object detection models such as the YOLO series are widely used and achieve strong results on high-quality datasets, but not all working conditions are ideal. To locate targets in low-quality data, existing methods either train a new object detection network or require a large collection of low-quality data for training. In this paper we instead propose DiffYOLO, a framework that we apply to YOLO models. Specifically, we extract feature maps from a denoising diffusion probabilistic model to enhance well-trained detectors, which allows us to fine-tune YOLO on high-quality datasets and test on low-quality datasets. The results show that this framework improves performance not only on noisy datasets but also on high-quality test datasets. We will supplement more experiments later (with various datasets and network architectures).
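The abstract does not say how the diffusion feature maps are combined with the detector's features. As one possible illustration only, the sketch below resizes a diffusion feature map to the spatial size of a neck feature map, concatenates the two along the channel axis, and projects back to the original channel count with a 1x1 convolution. The fusion operator, the function names, and all shapes are assumptions made for this example and should not be read as the authors' design.

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]


def fuse_features(neck_feat, diff_feat, proj_weight):
    """Concatenate along channels, then apply a 1x1 convolution.

    neck_feat:   (C_neck, H, W) feature map from the detector neck.
    diff_feat:   (C_diff, h, w) feature map taken from a diffusion model.
    proj_weight: (C_neck, C_neck + C_diff) weights of a hypothetical 1x1 conv.
    """
    _, h, w = neck_feat.shape
    diff_resized = resize_nearest(diff_feat, h, w)
    stacked = np.concatenate([neck_feat, diff_resized], axis=0)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", proj_weight, stacked)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neck = rng.normal(size=(4, 8, 8))   # hypothetical neck features
    diff = rng.normal(size=(2, 4, 4))   # hypothetical diffusion features
    # Identity on the neck channels and zeros on the diffusion channels:
    # the fused output starts out equal to the original neck features.
    weight = np.concatenate([np.eye(4), np.zeros((4, 2))], axis=1)
    fused = fuse_features(neck, diff, weight)
    print(fused.shape, bool(np.allclose(fused, neck)))
```

Initialising the projection so that the diffusion channels contribute nothing at first is one way to keep a well-trained detector's behaviour intact when fine-tuning begins; whether DiffYOLO does this is not stated in the text above.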

Paper Structure (10 sections, 2 equations, 2 figures, 4 tables)

This paper contains 10 sections, 2 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: (a) Defect detection results by YOLOv5 on a noisy image; (b) defect detection results by DiffYOLO on a noisy image.
  • Figure 2: The overall framework of DiffYOLO.