Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization
Hanqiu Deng, Xingyu Li
TL;DR
This work identifies cross-class interference as a key failure mode of teacher-student distillation when it is applied to multi-class anomaly detection. It introduces Structural Teacher-Student Normality Learning (SNL), which combines structural distillation (spatial-channel alignment and affinity-based constraints) with a Central Residual Aggregation Module (CRAM) to learn compact normal representations. On the MVTecAD and VisA benchmarks, SNL substantially improves both anomaly detection and localization over baseline FD/RD methods and state-of-the-art unified models. The approach generalizes to existing teacher-student networks and provides interpretable structural cues through affinity learning and residual normality learning, which enables robust coverage of multiple object classes in practical settings.
Abstract
Visual anomaly detection is a challenging open-set task that aims to identify unknown anomalous patterns while modeling normal data. The knowledge distillation paradigm has shown remarkable performance in one-class anomaly detection by comparing the features of a teacher network with those of a student network. However, extending this paradigm to multi-class anomaly detection introduces new scalability challenges. In this study, we address the significant performance degradation that previous teacher-student models exhibit in multi-class anomaly detection, which we attribute to cross-class interference. To tackle this issue, we introduce Structural Teacher-Student Normality Learning (SNL): (1) we propose spatial-channel distillation and intra- and inter-affinity distillation techniques to measure the structural distance between the teacher and student networks; (2) we introduce a central residual aggregation module (CRAM) to encapsulate the normal representation space of the student network. We evaluate the proposed approach on two anomaly detection datasets, MVTecAD and VisA. In the multi-class setting, our method surpasses state-of-the-art distillation-based algorithms by 3.9% (detection) and 1.5% (localization) on MVTecAD, and by 1.2% (detection) and 2.5% (localization) on VisA. Furthermore, our algorithm outperforms the current state-of-the-art unified models on both MVTecAD and VisA.
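The abstract does not give SNL's exact loss formulations, so the following is only a minimal sketch of the two generic ingredients it builds on, under stated assumptions. The function `anomaly_map` shows the common teacher-student scoring rule (one minus the per-location cosine similarity between teacher and student features), and `affinity_distance` is a hypothetical stand-in for a structural, intra-affinity-style distance that compares spatial self-similarity matrices instead of raw features. All function names, the (C, H, W) feature layout, and the mean-squared comparison are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anomaly_map(f_t: np.ndarray, f_s: np.ndarray) -> np.ndarray:
    """Per-location anomaly score: 1 - cosine(teacher, student).

    f_t, f_s: feature maps of shape (C, H, W); returns an (H, W) map.
    Assumption: a single feature level, with no upsampling or multi-scale fusion.
    """
    t = l2_normalize(f_t, axis=0)
    s = l2_normalize(f_s, axis=0)
    return 1.0 - np.sum(t * s, axis=0)


def image_score(f_t: np.ndarray, f_s: np.ndarray) -> float:
    """Image-level score taken as the maximum of the anomaly map (a common choice)."""
    return float(anomaly_map(f_t, f_s).max())


def spatial_affinity(f: np.ndarray) -> np.ndarray:
    """(HW, HW) cosine affinity between every pair of spatial positions."""
    c = f.shape[0]
    v = l2_normalize(f.reshape(c, -1), axis=0)  # columns are unit-norm position vectors
    return v.T @ v


def affinity_distance(f_t: np.ndarray, f_s: np.ndarray) -> float:
    """Hypothetical structural distance: mean squared gap between the
    teacher's and the student's spatial affinity matrices."""
    diff = spatial_affinity(f_t) - spatial_affinity(f_s)
    return float(np.mean(diff ** 2))
```

For example, with random teacher features `f_t` of shape (64, 8, 8), a student that copies them with small noise yields a near-zero map, while corrupting the student's feature vector at one location makes that location the arg-max of `anomaly_map` and increases `affinity_distance`. Comparing affinity matrices penalizes changes in how positions relate to one another even when per-location features shift uniformly, which is the intuition behind structural rather than point-wise distillation.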
