DiffYOLO: Object Detection for Anti-Noise via YOLO and Diffusion Models
Yichen Liu, Huajian Zhang, Daqing Gao
TL;DR
This work tackles robust object detection under degraded image quality by augmenting a YOLOv5 detector with diffusion-model features (DiffYOLO). By extracting multi-level representations from a pretrained denoising diffusion probabilistic model and injecting them into the YOLO neck, the approach enables fine-tuning instead of full retraining and improves noise resilience on low-quality inputs while preserving performance on clean data. Experiments on the DeepPCB dataset show DiffYOLO outperforming the baseline under Gaussian, Salt & Pepper, and Poisson noise, with added benefits on high-quality images. The results suggest diffusion-inspired feature augmentation as a practical, resource-conscious path to robust industrial detection, though computational overhead and data drift remain important considerations for deployment.
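The sketch below makes the three corruption types concrete by applying Gaussian, salt-and-pepper, and Poisson noise to a uint8 image with NumPy. The function names (`add_gaussian_noise`, `add_salt_pepper_noise`, `add_poisson_noise`) and default strengths (`sigma=25`, `amount=0.05`, `peak=30`) are hypothetical illustrations. The TL;DR does not state which noise parameters were used on DeepPCB.

```python
import numpy as np


def add_gaussian_noise(img, sigma=25.0, rng=None):
    """Add zero-mean Gaussian noise with std `sigma` (0-255 units); hypothetical default."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def add_salt_pepper_noise(img, amount=0.05, rng=None):
    """Set each pixel (all channels) to 0 with prob amount/2 and to 255 with prob amount/2."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    u = rng.random(img.shape[:2])  # one uniform draw per pixel location
    noisy[u < amount / 2] = 0  # pepper
    noisy[(u >= amount / 2) & (u < amount)] = 255  # salt
    return noisy


def add_poisson_noise(img, peak=30.0, rng=None):
    """Shot noise: scale intensities to a photon-count `peak`, sample Poisson, rescale."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = img.astype(np.float64) / 255.0 * peak
    noisy = rng.poisson(scaled) / peak * 255.0
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((64, 64, 3), 128, dtype=np.uint8)  # stand-in for a clean test image
    for name, corrupt in [("gaussian", add_gaussian_noise),
                          ("salt_pepper", add_salt_pepper_noise),
                          ("poisson", add_poisson_noise)]:
        noisy = corrupt(clean, rng=rng)
        print(name, noisy.shape, noisy.dtype, round(float(noisy.mean()), 1))
```

A lower `peak` in the Poisson model means fewer simulated photons and therefore stronger relative noise.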
Abstract
Object detection models such as the YOLO series are widely used and achieve strong results on high-quality datasets, but real working conditions are not always ideal. To locate targets in low-quality images, existing methods either train a new object detection network or require a large collection of low-quality data for training. In this paper, we instead propose a framework, applied to YOLO models, called DiffYOLO. Specifically, we extract feature maps from a denoising diffusion probabilistic model (DDPM) to enhance already well-trained detectors. This allows us to fine-tune YOLO on high-quality datasets and test it on low-quality datasets. The results show that this framework not only improves performance on noisy datasets but also improves detection results on high-quality test datasets. We will add more experiments later, with additional datasets and network architectures.
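The abstract does not say how the DDPM feature maps are combined with the YOLO neck. The NumPy sketch below shows one plausible reading, using hypothetical function names (`resize_nearest`, `conv1x1`, `identity_init`, `fuse`). It resizes a diffusion feature map to the resolution of one neck level, concatenates the two along the channel axis, and projects back to the neck channel count with a 1x1 convolution. The projection is assumed to start as an identity on the neck channels and zero on the diffusion channels, so the fused output initially equals the original neck feature and fine-tuning starts from the pretrained detector's behaviour. This is an assumed design, not the authors' confirmed method. Which DDPM layers and timesteps supply `diff_feat` is likewise unspecified.

```python
import numpy as np


def resize_nearest(feat, out_hw):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def identity_init(c_neck, c_diff):
    """Hypothetical init: identity on neck channels, zeros on diffusion channels,
    so fuse() initially returns the neck feature unchanged."""
    weight = np.concatenate([np.eye(c_neck), np.zeros((c_neck, c_diff))], axis=1)
    return weight, np.zeros(c_neck)


def fuse(neck_feat, diff_feat, weight, bias):
    """Hypothetical fusion of one neck level with one diffusion feature map:
    resize to the neck resolution, concatenate channels, project back."""
    diff_resized = resize_nearest(diff_feat, neck_feat.shape[1:])
    stacked = np.concatenate([neck_feat, diff_resized], axis=0)
    return conv1x1(stacked, weight, bias)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative shapes only: three neck levels at strides 8/16/32 of a 64x64 input,
    # each fused with the same 16-channel diffusion feature map.
    levels = [(32, 8), (64, 4), (128, 2)]  # (neck channels, spatial size)
    diff = rng.standard_normal((16, 16, 16))
    for c_neck, size in levels:
        neck = rng.standard_normal((c_neck, size, size))
        w, b = identity_init(c_neck, 16)
        out = fuse(neck, diff, w, b)
        print(c_neck, out.shape, np.allclose(out, neck))
```

With the identity start, the fused model reproduces the pretrained YOLO outputs exactly before the first fine-tuning step. Training then decides how much of the diffusion signal to mix in.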
