Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma, Jiaqi Liu, Zhichao Lu
TL;DR
The paper investigates how multimodal fusion topology affects 3D anomaly detection (3D-AD), showing that intra- and inter-module fusion designs significantly influence performance. It introduces 3D-ADNAS, a two-level NAS framework that jointly searches modality-specific modules and fusion strategies for 3D-AD, achieving improvements in accuracy, speed, and memory usage on Eyecandies and MVTec 3D-AD, and showing potential for few-shot tasks. The approach combines theoretical DST-based insights with empirical NAS-driven architecture search to reveal a practical path toward architecture-aware 3D-AD. This work highlights the practical impact of fusion architecture design on industrial 3D anomaly detection and offers a scalable methodology for deploying efficient multimodal systems.
Abstract
Existing efforts to improve multimodal fusion for 3D anomaly detection (3D-AD) primarily concentrate on devising more effective multimodal fusion strategies. However, little attention has been devoted to analyzing how the design of the multimodal fusion architecture (topology) contributes to 3D-AD. In this paper, we aim to bridge this gap and present a systematic study of the impact of multimodal fusion architecture design on 3D-AD. This work considers multimodal fusion architecture design at two levels: the intra-module fusion level, i.e., independent modality-specific modules that combine early, middle, or late multimodal features with specific fusion operations, and the inter-module fusion level, i.e., the strategies used to fuse those modules. In both cases, we first derive insights by exploring, theoretically and experimentally, how architectural designs influence 3D-AD. Then, we extend the state-of-the-art (SOTA) neural architecture search (NAS) paradigm and propose 3D-ADNAS, which, for the first time, simultaneously searches across multimodal fusion strategies and modality-specific modules. Extensive experiments show that 3D-ADNAS obtains consistent improvements in 3D-AD across various model capacities in terms of accuracy, frame rate, and memory usage, and that it exhibits great potential for few-shot 3D-AD tasks.
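To make the two levels described in the abstract concrete, the following is a minimal toy sketch of what a two-level fusion search space could look like. It is not the 3D-ADNAS implementation: the option names (`STAGES`, `INTRA_OPS`, `INTER_OPS`), the helper functions, and the use of plain Python lists as features are all assumptions made for illustration, and the actual search space and search procedure of the paper may differ. Each intra-module choice picks a feature stage (early, middle, or late) and an element-wise fusion operation for the RGB and 3D features; the inter-module choice decides how the module outputs are combined.

```python
from itertools import product

# Hypothetical option names for illustration; the paper's search space may differ.
STAGES = ("early", "middle", "late")   # which feature stage a module reads
INTRA_OPS = ("sum", "max")             # element-wise fusion inside a module
INTER_OPS = ("sum", "concat")          # how module outputs are combined


def fuse(a, b, op):
    """Fuse two feature vectors given as plain lists of floats."""
    if op == "concat":
        return list(a) + list(b)
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs equal-length features")
    if op == "sum":
        return [x + y for x, y in zip(a, b)]
    if op == "max":
        return [max(x, y) for x, y in zip(a, b)]
    raise ValueError(f"unknown fusion op: {op}")


def enumerate_architectures(num_modules=2):
    """Yield every (modules, inter_op) pair in the toy search space.

    `modules` is a tuple of (stage, intra_op) pairs, one per module.
    """
    intra_choices = list(product(STAGES, INTRA_OPS))
    for modules in product(intra_choices, repeat=num_modules):
        for inter_op in INTER_OPS:
            yield modules, inter_op


def run_architecture(arch, rgb_feats, pc_feats):
    """Apply one candidate architecture to per-stage RGB and 3D features."""
    modules, inter_op = arch
    # Intra-module level: each module fuses the two modalities at its stage.
    outputs = [fuse(rgb_feats[stage], pc_feats[stage], op) for stage, op in modules]
    # Inter-module level: fold the module outputs together.
    fused = outputs[0]
    for out in outputs[1:]:
        fused = fuse(fused, out, inter_op)
    return fused


# Toy per-stage features for the two modalities (made-up numbers).
rgb = {"early": [1.0, 2.0], "middle": [3.0, 4.0], "late": [5.0, 6.0]}
pc = {"early": [0.5, 0.5], "middle": [1.0, 1.0], "late": [2.0, 9.0]}

candidate = ((("early", "sum"), ("late", "max")), "concat")
fused_features = run_architecture(candidate, rgb, pc)
num_candidates = sum(1 for _ in enumerate_architectures())
```

Even in this toy setting, two modules with three stages and two operations each, combined by one of two inter-module operations, give 72 candidate topologies, which hints at why an automated search is attractive compared with hand-designing the fusion architecture. A real NAS method would score candidates on an anomaly detection objective rather than enumerate them exhaustively.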
