Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection

Radia Daci; Vito Renò; Cosimo Patruno; Angelo Cardellicchio; Abdelmalik Taleb-Ahmed; Marco Leo; Cosimo Distante

TL;DR

This work introduces CMDR-IAD, a lightweight and modality-flexible unsupervised framework for reliable anomaly detection in 2D+3D multimodal as well as single-modality (2D-only or 3D-only) settings, and demonstrates its effectiveness under practical industrial conditions.

Abstract

Multimodal industrial anomaly detection benefits from integrating RGB appearance with 3D surface geometry, yet existing unsupervised approaches commonly rely on memory banks, teacher-student architectures, or fragile fusion schemes, which limits their robustness under noisy depth, weak texture, or missing modalities. This paper introduces CMDR-IAD, a lightweight and modality-flexible unsupervised framework for reliable anomaly detection in 2D+3D multimodal as well as single-modality (2D-only or 3D-only) settings. CMDR-IAD combines bidirectional 2D↔3D cross-modal mapping, which models appearance-geometry consistency, with dual-branch reconstruction, which independently captures normal texture and geometric structure. A two-part fusion strategy integrates these cues: a reliability-gated mapping anomaly highlights spatially consistent texture-geometry discrepancies, while a confidence-weighted reconstruction anomaly adaptively balances appearance and geometric deviations, yielding stable and precise anomaly localization even in depth-sparse or low-texture regions. On the MVTec 3D-AD benchmark, CMDR-IAD achieves state-of-the-art performance without memory banks, reaching 97.3% image-level AUROC (I-AUROC), 99.6% pixel-level AUROC (P-AUROC), and 97.6% AUPRO. On a real-world polyurethane cutting dataset, the 3D-only variant attains 92.6% I-AUROC and 92.5% P-AUROC, demonstrating its effectiveness under practical industrial conditions. These results highlight the framework's robustness, its modality flexibility, and the effectiveness of the proposed fusion strategies for industrial visual inspection. Our source code is available at https://github.com/ECGAI-Research/CMDR-IAD/
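The two-part fusion described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: per-pixel feature maps of shape (H, W, C), cosine distance as the mapping discrepancy, a per-pixel reliability map in [0, 1] (for example, a depth-validity mask) as the gate, reconstruction errors z-scored with hypothetical normal-data statistics `rec_stats`, and min-max normalization before summing both parts into the final map Ψ, with the maximum pixel used as the image score. The function `fuse_anomaly_maps` and its helpers are hypothetical and are not the authors' implementation.

```python
import numpy as np

EPS = 1e-8


def cosine_distance(a, b):
    """Per-pixel 1 - cosine similarity between feature maps of shape (H, W, C)."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + EPS
    return 1.0 - num / den


def minmax(x):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)


def fuse_anomaly_maps(f2d, f3d, f2d_from_3d, f3d_from_2d, f2d_rec, f3d_rec,
                      reliability, rec_stats):
    """Hypothetical two-part fusion. Returns (psi, image_score).

    f2d, f2d_from_3d, f2d_rec: (H, W, C2) observed, mapped, reconstructed RGB features
    f3d, f3d_from_2d, f3d_rec: (H, W, C3) observed, mapped, reconstructed 3D features
    reliability: (H, W) float values in [0, 1], e.g. a depth-validity mask (assumed)
    rec_stats: ((mu_2d, sd_2d), (mu_3d, sd_3d)) from normal data (assumed)
    """
    # Part 1: reliability-gated mapping anomaly. The product is large only
    # where BOTH mapping directions disagree with the observed features.
    d_2d = cosine_distance(f2d, f2d_from_3d)  # error of the 3D -> 2D mapping
    d_3d = cosine_distance(f3d, f3d_from_2d)  # error of the 2D -> 3D mapping
    a_map = reliability * d_2d * d_3d

    # Part 2: confidence-weighted reconstruction anomaly on z-scored errors.
    (mu2, sd2), (mu3, sd3) = rec_stats
    z_2d = (np.linalg.norm(f2d - f2d_rec, axis=-1) - mu2) / (sd2 + EPS)
    z_3d = (np.linalg.norm(f3d - f3d_rec, axis=-1) - mu3) / (sd3 + EPS)
    c_2d = np.ones_like(reliability)  # appearance confidence, constant here
    c_3d = reliability                # geometry trusted less where depth is sparse
    a_rec = (c_2d * z_2d + c_3d * z_3d) / (c_2d + c_3d + EPS)

    # Final map: sum of the two normalized parts; image score = max pixel.
    psi = minmax(a_map) + minmax(a_rec)
    return psi, float(psi.max())


# Toy usage: pixel (1, 2) is inconsistent in every branch.
rng = np.random.default_rng(0)
f2d, f3d = rng.normal(size=(4, 4, 8)), rng.normal(size=(4, 4, 6))
f2d_map, f3d_map = f2d.copy(), f3d.copy()
f2d_map[1, 2], f3d_map[1, 2] = -f2d[1, 2], -f3d[1, 2]
f2d_rec, f3d_rec = f2d.copy(), f3d.copy()
f2d_rec[1, 2] += 3.0
f3d_rec[1, 2] += 3.0
psi, score = fuse_anomaly_maps(f2d, f3d, f2d_map, f3d_map, f2d_rec, f3d_rec,
                               np.ones((4, 4)), ((0.0, 1.0), (0.0, 1.0)))
```

Multiplying the two mapping discrepancies is one way to favor spatially consistent evidence: a pixel scores high only when both mapping directions fail there, and the reliability map can switch that evidence off where depth is missing.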

Paper Structure

This paper contains 33 sections, 38 equations, 10 figures, and 10 tables.

Introduction
Related Work
Unsupervised 2D Industrial Anomaly Detection
3D-Only Industrial Anomaly Detection
Multimodal RGB-3D Industrial Anomaly Detection
Methodology
Multimodal Feature Extractors
Cross-Modal Mapping Networks
Dual-Branch Reconstruction Modules
Training Objective
Anomaly Scoring and Reliability-Aware Multimodal Fusion
3D-Only Inference Mode for Polyurethane Cuts
Datasets and Preprocessing
MVTec 3D-AD Setup
Polyurethane 3D Dataset and Preprocessing Pipeline
...and 18 more sections

Figures (10)

Figure 1: Performance, speed, and memory occupancy of multimodal anomaly detection methods. The chart reports defect segmentation performance (AUPRO@30%) versus inference speed.
Figure 2: Overview of the proposed CMDR-IAD framework. The method integrates RGB images and point clouds through multimodal feature extraction, cross-modal mapping, and dual-branch 2D-3D reconstruction. Cross-modal discrepancy maps and reconstruction errors are fused into the final anomaly map Ψ, enabling robust detection of both appearance- and geometry-related defects.
Figure 3: Overview of the 2D Reconstruction Branch. Projected features are enhanced using a sparse-attention block and an MLP refinement module with residual connections, then decoded through ConvTranspose2D layers to generate the reconstructed 2D feature map.
Figure 4: Overview of the 3D Reconstruction Branch. After projection and ConvTranspose1D upsampling, a gating block refines the features through sequential 1D convolutions and a sigmoid mask, which is multiplied with a residual pathway to produce the final 3D reconstruction.
Figure 5: Qualitative anomaly localization results on the MVTec 3D-AD dataset using 2D-only features. From top to bottom, the rows show the RGB image, point cloud, ground truth, AST (2D), M3DM (2D), MTSJM (2D), and the proposed CMDR-IAD (2D). Warmer colors indicate higher anomaly scores, while cooler colors denote normal regions.
...and 5 more figures
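Figure 4 describes the 3D reconstruction branch only at block level. The NumPy sketch below illustrates one way such a gated decoder could work. The kernel sizes (2 with stride 2 for the ConvTranspose1D step, 3 for the gating convolutions), the ReLU between the two gating convolutions, the channel counts, the random weights, and the use of the upsampled features h as the residual pathway are all assumptions; the function names are hypothetical, and this is not the authors' implementation. Weight layouts follow PyTorch's conventions for ConvTranspose1d (in, out, kernel) and Conv1d (out, in, kernel).

```python
import numpy as np


def conv_transpose1d_k2s2(x, w, b):
    """Transposed 1D convolution, kernel 2, stride 2 (doubles the length).

    x: (C_in, L), w: (C_in, C_out, 2), b: (C_out,). Returns (C_out, 2 * L).
    With kernel == stride the output windows do not overlap, so input
    position i writes only to output positions 2i and 2i + 1.
    """
    out = np.zeros((w.shape[1], 2 * x.shape[1]))
    for k in range(2):
        out[:, k::2] = w[:, :, k].T @ x
    return out + b[:, None]


def conv1d_same(x, w, b):
    """1D convolution (cross-correlation), kernel 3, zero padding 1.

    x: (C_in, L), w: (C_out, C_in, 3), b: (C_out,). Returns (C_out, L).
    """
    length = x.shape[1]
    xp = np.pad(x, ((0, 0), (1, 1)))
    out = sum(w[:, :, k] @ xp[:, k:k + length] for k in range(3))
    return out + b[:, None]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_3d_decoder(z, p):
    """Hypothetical gated 3D reconstruction head (Figure 4, simplified)."""
    h = conv_transpose1d_k2s2(z, p["up_w"], p["up_b"])         # upsample
    g = np.maximum(conv1d_same(h, p["g1_w"], p["g1_b"]), 0.0)  # conv + ReLU
    mask = sigmoid(conv1d_same(g, p["g2_w"], p["g2_b"]))       # gate in (0, 1)
    return h * mask  # sigmoid mask multiplied with the residual pathway h


def init_params(c_in, c_out, rng):
    """Random weights with illustrative shapes; a trained model would learn these."""
    return {
        "up_w": rng.normal(scale=0.1, size=(c_in, c_out, 2)),
        "up_b": np.zeros(c_out),
        "g1_w": rng.normal(scale=0.1, size=(c_out, c_out, 3)),
        "g1_b": np.zeros(c_out),
        "g2_w": rng.normal(scale=0.1, size=(c_out, c_out, 3)),
        "g2_b": np.zeros(c_out),
    }


rng = np.random.default_rng(0)
params = init_params(c_in=16, c_out=8, rng=rng)
z = rng.normal(size=(16, 32))        # 16-channel latent over 32 positions
recon = gated_3d_decoder(z, params)  # shape (8, 64)
```

Because the sigmoid mask lies in (0, 1), the gate in this sketch can only attenuate the upsampled features, position by position and channel by channel, which is one plausible reading of how such a block suppresses unreliable geometric responses.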
