Semantic-Deviation-Anchored Multi-Branch Fusion for Unsupervised Anomaly Detection and Localization in Unstructured Conveyor-Belt Coal Scenes
Wenping Jin, Yuyang Tang, Li Zhu
TL;DR
This work tackles unsupervised foreign-object anomaly detection and pixel-level localization in unstructured coal-stream scenes, where low contrast and background variability challenge existing methods. It introduces CoalAD, a benchmark for this setting, and a three-branch fusion framework that combines object-level semantic composition, semantic attribution of global deviation, and fine-grained texture matching to produce robust image-level detection and precise pixel-level localization. In experiments on CoalAD, the approach outperforms widely used baselines on the evaluated image-level and pixel-level metrics, and ablations support the contribution of each branch. Because training requires no anomaly labels, the method is practical for safe, automated mining operations.
Abstract
Reliable foreign-object anomaly detection and pixel-level localization in conveyor-belt coal scenes are essential for safe and intelligent mining operations. This task is particularly challenging because the environment is highly unstructured: coal and gangue are randomly piled, backgrounds are complex and variable, and foreign objects often exhibit low contrast, deformation, and occlusion, which makes them difficult to separate from their surroundings. These characteristics weaken the stability and regularity assumptions that many anomaly detection methods rely on in structured industrial settings, leading to notable performance degradation. To support evaluation and comparison in this setting, we construct CoalAD, a benchmark for unsupervised foreign-object anomaly detection with pixel-level localization in coal-stream scenes. We further propose a complementary-cue collaborative perception framework that extracts and fuses anomaly evidence from three perspectives: object-level semantic composition modeling, semantic-attribution-based global deviation analysis, and fine-grained texture matching. The fused outputs provide robust image-level anomaly scoring and accurate pixel-level localization. Experiments on CoalAD demonstrate that our method outperforms widely used baselines across the evaluated image-level and pixel-level metrics, and ablation studies validate the contribution of each component. The code is available at https://github.com/xjpp2016/USAD.
