ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories
Hai Ling, Jia Guo, Zhulin Tao, Yunkang Cao, Donglin Di, Hongyan Xu, Xiu Su, Yang Song, Lei Fan
TL;DR
ADNet is a large-scale, cross-domain anomaly-detection benchmark with 380 categories across five real-world domains and 196,294 RGB images, standardized to MVTec-style annotations and enriched with textual defect descriptions. The work reveals a substantial scalability gap for existing AD methods: I-AUROC drops from 90.6% when one model is trained per category to 78.5% when a single model covers all 380 categories. It introduces Dinomaly-m, a context-guided mixture-of-experts baseline that improves I-AUROC to 83.2% and P-AUROC to 93.1% on ADNet. Zero-shot cross-domain transfer remains challenging, underscoring the need for more transferable and context-aware anomaly representations, while few-shot experiments indicate that multi-class models can learn from limited data. Overall, ADNet establishes a standardized, extensible platform for work toward scalable anomaly-detection foundation models across diverse domains.
Abstract
Anomaly detection (AD) aims to identify defects using normal-only training data. Existing AD benchmarks (e.g., MVTec-AD with 15 categories) cover only a narrow range of categories, limiting the evaluation of cross-context generalization and scalability. We introduce ADNet, a large-scale, multi-domain benchmark comprising 380 categories aggregated from 49 publicly available datasets across the Electronics, Industry, Agrifood, Infrastructure, and Medical domains. The benchmark contains 196,294 RGB images: 116,192 normal samples for training and 80,102 test images, of which 60,311 are anomalous. All images are standardized with MVTec-style pixel-level annotations and structured text descriptions spanning both spatial and visual attributes, enabling multimodal anomaly detection tasks. Extensive experiments reveal a clear scalability challenge: existing state-of-the-art methods achieve 90.6% image-level AUROC (I-AUROC) in one-for-one settings (one model per category) but drop to 78.5% when a single multi-class model is scaled to all 380 categories. To address this, we propose Dinomaly-m, a context-guided Mixture-of-Experts extension of Dinomaly that expands decoder capacity without increasing inference cost. It achieves 83.2% I-AUROC and 93.1% pixel-level AUROC (P-AUROC), outperforming existing approaches. ADNet is designed as a standardized and extensible benchmark, supporting the community in expanding anomaly detection datasets across diverse domains and providing a scalable foundation for future anomaly detection foundation models.
Dataset: https://grainnet.github.io/ADNet
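The I-AUROC figures quoted above are image-level areas under the ROC curve. As a minimal sketch of how such a score can be computed, the Python below uses the pairwise (Mann-Whitney) formulation: the fraction of (anomalous, normal) test-image pairs in which the anomalous image receives the higher anomaly score. The function names, the toy scores, and the per-category averaging are illustrative assumptions, not ADNet's evaluation code; the abstract does not state how scores are aggregated across the 380 categories.

```python
from typing import Dict, Sequence, Tuple


def image_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random anomalous image (label 1)
    gets a higher anomaly score than a random normal image (label 0).
    Ties count as half a win. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(per_category: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Unweighted mean of per-category AUROCs.
    Assumption: macro-averaging over categories; not confirmed by the abstract."""
    values = [image_auroc(s, y) for s, y in per_category.values()]
    return sum(values) / len(values)


# Toy data with hypothetical category names (not taken from ADNet).
toy = {
    "category_a": ([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]),  # 3 of 4 pairs ranked correctly
    "category_b": ([0.20, 0.30, 0.90], [0, 0, 1]),           # perfectly separated
}
print(mean_auroc(toy))
```

P-AUROC can be obtained from the same function by flattening each predicted anomaly map and its ground-truth mask into per-pixel scores and labels, although a sort-based implementation would be preferable at that scale.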
