Looking 3D: Anomaly Detection with 2D-3D Alignment

Ankan Bhunia; Changjian Li; Hakan Bilen

TL;DR

A novel transformer-based approach is proposed that explicitly learns the correspondence between the query image and reference 3D shape via feature alignment and leverages a customized attention mechanism for anomaly detection.

Abstract

Automatic anomaly detection based on visual cues holds practical significance in various domains, such as manufacturing and product quality assessment. This paper introduces a new conditional anomaly detection problem, which involves identifying anomalies in a query image by comparing it to a reference shape. To address this challenge, we create a large dataset, BrokenChairs-180K, consisting of around 180K images with diverse anomalies, geometries, and textures, paired with 8,143 reference 3D shapes. To tackle this task, we propose a novel transformer-based approach that explicitly learns the correspondence between the query image and the reference 3D shape via feature alignment and leverages a customized attention mechanism for anomaly detection. We evaluate our approach through comprehensive experiments, and it serves as a benchmark for future research in this domain.

Paper Structure (14 sections, 8 equations, 8 figures, 3 tables)

Introduction
Related Work
Building BrokenChairs-180K Dataset
Creating Anomaly from 3D Objects
Photo-realistic Rendering of 3D objects
Proposed Method
Overview
Correspondence Matching Transformer
3D Positional Encoding (3DPE)
Correspondence-Guided Attention (CGA)
View-Agnostic Local Feature Alignment
Experiments
Results
Conclusion

Figures (8)

Figure 1: We propose a new conditional AD task that aims to identify and localize anomalies in a query image by comparing it to a reference shape. The anomalous region is shown in a yellow bounding box. For instance, the right leg of the blue sofa is rectangular unlike the cylindrical one in its reference shape.
Figure 2: Example anomaly instances from our BrokenChairs-180K dataset, which contains around 100K anomaly images. The top row shows example anomaly instances; the bottom row shows the corresponding ground-truth bounding boxes and segmentation masks. Red masks mark the anomalous parts, green contour lines outline the same regions before the anomaly was applied, and blue rectangles show the bounding boxes. (Figure best viewed zoomed in.)
Figure 3: Overall architecture of our proposed CMT framework for the conditional AD task. Our CMT takes the following inputs: the query image $\bm{q}$ and the rendered multi-view images $\{\bm{v}_{n}\}_{n=1}^N$. We extract query features $\bm{f}^q$ and multi-view features $\bm{F}^v$ using the encoder $\varphi$. Additionally, we use 3D positional encoding (3DPE) to obtain 3D positional features $\bm{P}^v$ for the multi-view images. Next, $\bm{F}^v$ and $\bm{P}^v$ are concatenated and fed to the correspondence-guided attention (CGA) network, denoted as $\phi$, along with the query features $\bm{f}^q$. The CGA network selectively conditions the final prediction on a small subset of the most related patches from the multi-view images through a top-$k$ sparse cross-attention (TKCA) mechanism. The view-agnostic local feature alignment (VLFA) aligns the encoder output features to achieve a view-agnostic representation through semi-supervised learning.
Figure 4: Our proposed correspondence-guided attention (CGA). The CGA comprises $B$ transformer-based blocks, each consisting of a standard self-attention module followed by a top-$k$ sparse cross-attention (TKCA) module.
Figure 5: Top-$k$ sparse attention-span visualization. For the query point (yellow), similarity heatmaps (first row) and top-$k$ attention-span (second row) across multiple views are shown.
...and 3 more figures
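
The captions of Figures 3 and 4 describe a top-$k$ sparse cross-attention (TKCA) step in which each query patch attends only to a small subset of the most related multi-view patches. The NumPy sketch below illustrates that general idea and is not the authors' implementation. The function name, the single-head formulation, the scaled dot-product similarity, and the masking strategy are all assumptions made for illustration; the page does not specify them.

```python
import numpy as np

def topk_sparse_cross_attention(query, keys, values, k):
    """Hypothetical single-head top-k sparse cross-attention.

    query:  (Lq, d)  query-image patch features
    keys:   (Lk, d)  multi-view patch features (all views flattened)
    values: (Lk, dv) features to aggregate
    k:      number of key patches each query patch may attend to (1 <= k <= Lk)
    """
    d = query.shape[-1]
    # Assumed similarity: scaled dot product between query and key patches.
    scores = query @ keys.T / np.sqrt(d)                      # (Lq, Lk)

    # Indices of the k highest-scoring keys for every query row.
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]   # (Lq, k)

    # Additive mask: 0 for the selected keys, -inf for everything else.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk_idx, 0.0, axis=-1)

    # Numerically stable softmax restricted to the selected keys.
    masked = scores + mask
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ values, weights

# Toy usage: 4 query patches, 3 views x 5 patches = 15 multi-view patches.
rng = np.random.default_rng(0)
q_feats = rng.normal(size=(4, 16))
view_feats = rng.normal(size=(15, 16))
out, attn = topk_sparse_cross_attention(q_feats, view_feats, view_feats, k=3)
```

In this sketch every row of `attn` has exactly `k` non-zero entries, so each output feature is a weighted average of only the `k` most similar multi-view patches. A full model in the style of Figure 4 would presumably add learned projections, multiple heads, and a preceding self-attention module, none of which are shown here.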
