Semantic-Deviation-Anchored Multi-Branch Fusion for Unsupervised Anomaly Detection and Localization in Unstructured Conveyor-Belt Coal Scenes

Wenping Jin; Yuyang Tang; Li Zhu

TL;DR

This work tackles unsupervised foreign-object anomaly detection and pixel-level localization in unstructured coal-stream scenes, where low contrast and background variability degrade existing methods. It introduces the CoalAD benchmark and a three-branch fusion framework that combines object-level semantic composition, semantic attribution of global deviation, and fine-grained texture matching to produce robust image-level detection and precise localization. In experiments on CoalAD, the approach outperforms widely used baselines on the evaluated image-level and pixel-level metrics, and ablation studies confirm the contribution of each branch. Because training requires no anomaly labels, the method is practical for safe, automated mining operations.

Abstract

Reliable foreign-object anomaly detection and pixel-level localization in conveyor-belt coal scenes are essential for safe and intelligent mining operations. This task is particularly challenging because the environment is highly unstructured: coal and gangue are randomly piled, backgrounds are complex and variable, and foreign objects often exhibit low contrast, deformation, and occlusion, which couple them tightly with their surroundings. These characteristics weaken the stability and regularity assumptions that many anomaly detection methods rely on in structured industrial settings, leading to notable performance degradation. To support evaluation and comparison in this setting, we construct CoalAD, a benchmark for unsupervised foreign-object anomaly detection with pixel-level localization in coal-stream scenes. We further propose a complementary-cue collaborative perception framework that extracts and fuses anomaly evidence from three perspectives: object-level semantic composition modeling, semantic-attribution-based global deviation analysis, and fine-grained texture matching. The fused outputs provide robust image-level anomaly scoring and accurate pixel-level localization. Experiments on CoalAD demonstrate that our method outperforms widely used baselines across the evaluated image-level and pixel-level metrics, and ablation studies validate the contribution of each component. The code is available at https://github.com/xjpp2016/USAD.

Paper Structure (44 sections, 16 equations, 13 figures, 3 tables)

Introduction
Related Works
Industrial Visual Anomaly Detection and Localization Datasets
Metric-based Anomaly Detection and Localization in Feature Space
Reconstruction/Generation-based and Distillation-based Methods
Dataset
Data Source and Curation
Task Setting and Characteristics
Annotation and Ground-Truth Preparation
Data Splits and Statistics
Method
Overview
Object-level Branch
Branch Objective and Design Rationale
Feature Extractor: Frozen DINOv2-ViT
...and 29 more sections

Figures (13)

Figure 1: Unstructured characteristics and low-contrast anomaly examples in conveyor-belt coal-stream scenes. Normal samples are shown on the left (a-d) and anomalous samples on the right (e-h). Normal scenes exhibit randomly piled and intermixed coal and gangue with irregular sizes, shapes, and spatial distributions; meanwhile, belt wear patterns and coal dust introduce complex backgrounds. In anomalous scenes, foreign objects (e.g., wood, nets and/or ropes, and bags) are tightly coupled with the coal stream and thus hard to distinguish due to low contrast, occlusion, and discoloration, often yielding blurred boundaries.
Figure 2: Example foreign objects in the CoalAD dataset. The left six samples show wooden objects, while the right six depict other types (e.g., nets and metal parts). Notably, even within the same material category (wood), the appearances vary substantially due to compression, staining, and surface contamination.
Figure 3: Pipeline of the object-level branch. From normal data, frozen DINOv2 features are used to learn a global CLS distribution and two semantic anchors for the normal foreground (coal/gangue) and background (conveyor belt). At test time, global semantic deviation yields an image-level anomaly score, and per-patch deviation to these normal anchors yields pixel-level anomaly scores for localization.
Figure 4: Pipeline of the semantic attribution branch. The branch constructs a normal semantic baseline from mean-pooled patch features. At inference, it attributes the global semantic deviation of a test image to individual patches via closed-form ablation, producing an attribution-based anomaly score map for localization.
Figure 5: Pipeline of the texture branch. Based on the PatchCore framework, we build a memory bank of normal local features that capture fine-grained texture/structure patterns. At inference, nearest-neighbor distances to the normal feature bank provide patch-wise anomaly scores, which are aggregated into an image-level score and a pixel-level anomaly score map.
...and 8 more figures
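
The per-patch scoring described in the captions of Figures 3 to 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine and Euclidean distances, the reading of "closed-form ablation" as recomputing the mean-pooled feature with one patch left out, and the max-pooling into an image-level score are all assumptions made for illustration. Plain Python lists stand in for frozen DINOv2 patch features so that the example runs without dependencies.

```python
import math


def _dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_deviation(patches, anchors):
    """Figure 3 (assumed form): cosine distance of each patch to its
    closest normal anchor (e.g. coal/gangue and conveyor belt)."""
    return [1.0 - max(_cosine(p, a) for a in anchors) for p in patches]


def ablation_attribution(patches, baseline):
    """Figure 4 (assumed form): how much the global deviation drops when
    one patch is removed from the mean-pooled feature. No extra forward
    pass is needed, because the leave-one-out mean has a closed form."""
    n = len(patches)
    dim = len(baseline)
    pooled = [sum(p[k] for p in patches) / n for k in range(dim)]
    full_dev = _dist(pooled, baseline)
    scores = []
    for p in patches:
        # Mean of the remaining n - 1 patches: (n * mean - p) / (n - 1).
        pooled_without = [(n * pooled[k] - p[k]) / (n - 1) for k in range(dim)]
        scores.append(full_dev - _dist(pooled_without, baseline))
    return scores


def memory_bank_scores(patches, bank):
    """Figure 5 (assumed form): distance of each patch to its nearest
    neighbor in the normal memory bank, plus a max-pooled image score."""
    patch_scores = [min(_dist(p, m) for m in bank) for p in patches]
    return patch_scores, max(patch_scores)


if __name__ == "__main__":
    # Toy 2-D "features": three normal patches and one outlier (last row).
    patches = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 4.0]]
    print(anchor_deviation(patches, anchors=[[1.0, 0.0], [0.0, 1.0]]))
    print(ablation_attribution(patches, baseline=[1.0, 0.0]))
    print(memory_bank_scores(patches, bank=[[1.0, 0.0]]))
```

In this toy setup the last patch receives the highest score from all three functions. A positive attribution value means that removing the patch brings the pooled feature closer to the normal baseline, so that patch is a candidate anomaly location. How the three score maps are normalized and fused is not specified in the text above, so the sketch leaves it out.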
