View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis

Subin Varghese, Vedhus Hoskere

TL;DR

Scene AD addresses unsupervised pixel-level anomaly localization under unconstrained, multi-view, multi-object conditions. The authors propose OmniAD, a refined Reverse Distillation with a ResNeXt backbone and ERF-expanding student attention, augmented by NeRF-based view synthesis strategies (INV and QANV) to improve generalization across viewpoints. They introduce ToyCity, a real-image multi-object multi-view benchmark, and demonstrate that OmniAD with NVS augmentations achieves a substantial improvement over baselines (e.g., a 64.33% relative gain in pixel-wise $F_1$ over RD without augmentation) and generalizes to MAD-Real and fixed-view datasets like MVTec-AD. The work provides the Scene AD task definition, the ToyCity benchmark, view-synthesis augmentation methods, and the OmniAD model as a robust baseline for view-invariant anomaly detection in real-world scenes.

Abstract

The built environment, encompassing critical infrastructure such as bridges and buildings, requires diligent monitoring for unexpected anomalies or deviations from a normal state in captured imagery. Anomaly detection methods could aid in automating this task; however, deploying anomaly detection effectively in such environments presents significant challenges that have not been evaluated before. These challenges include camera viewpoints that vary, the presence of multiple objects within a scene, and the absence of labeled anomaly data for training. To address these comprehensively, we introduce and formalize Scene Anomaly Detection (Scene AD) as the task of unsupervised, pixel-wise anomaly localization under these specific real-world conditions. Evaluating progress in Scene AD required the development of ToyCity, the first multi-object, multi-view real-image dataset for unsupervised anomaly detection. Our initial evaluations using ToyCity revealed that established anomaly detection baselines struggle to achieve robust pixel-level localization. To address this, two data augmentation strategies were created to generate additional synthetic images of non-anomalous regions to enhance generalizability. However, the addition of these synthetic images alone provided only minor improvements. Thus, OmniAD, a refinement of the Reverse Distillation methodology, was created to establish a stronger baseline. Our experiments demonstrate that OmniAD, when used with augmented views, yields a 64.33% increase in pixel-wise $F_1$ score over Reverse Distillation with no augmentation. Collectively, this work offers the Scene AD task definition, the ToyCity benchmark, the view synthesis augmentation approaches, and the OmniAD method. Project Page: https://drags99.github.io/OmniAD/

Paper Structure

This paper contains 26 sections, 4 equations, 13 figures, 13 tables.

Figures (13)

  • Figure 1: (a) Typical images used in the built environment. (b) Images available for benchmarking unsupervised anomaly detection [bergmann2019mvtec, zou2022spot, bergmann2022beyond]. The top row for (a) and (b) consists of non-anomalous images, while the bottom is anomalous.
  • Figure 2: In Scene AD, we observe variations in camera positions between non-anomalous views (green) and query views that may contain anomalies (red). In contrast, traditional AD methods typically feature aligned views between the non-anomalous and anomalous images.
  • Figure 3: An overview of our methodology. We utilize the non-anomalous views from a Scene AD dataset to generate a non-anomalous SfM model. A novel view selection strategy utilizes anomaly views and the SfM model to generate novel views that augment the non-anomalous dataset. OmniAD is then trained on the augmented non-anomalous dataset for anomaly detection.
  • Figure 4: Overview of our novel view selection strategy. Utilizing the non-anomalous dataset, we generate a structure from motion (SfM) 3D model and accompanying camera pose for each image. Leveraging these non-anomalous camera poses, we design a trajectory that connects them, allowing for interpolation. Subsequently, we utilize a NeRF for Novel View Synthesis (NVS) to generate images that present unique perspectives along the interpolated path. Query images can also be used to determine more novel views.
  • Figure 5: The provided sequence illustrates camera interpolation, where pose transitions from the starting image to the ending image are used to synthesize intermediate views. These interpolated views, not present in the original dataset, bridge the gap between the start and end poses, introducing new perspectives to the dataset.
  • ...and 8 more figures
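The camera interpolation described in Figures 4 and 5 can be sketched in a few lines: translations are interpolated linearly between two SfM camera poses, while rotations (here as unit quaternions) are interpolated spherically, and a NeRF would then be queried at each intermediate pose to synthesize a novel view. This is a minimal illustration, not the paper's implementation; the function names and quaternion convention are assumptions, and the NeRF rendering step is omitted.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                      # take the shorter arc on the 4-sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_poses(t0, q0, t1, q1, n_steps):
    """Generate n_steps camera poses (translation, rotation) along the path
    from pose (t0, q0) to pose (t1, q1); each pose would then be rendered
    with a NeRF to produce a novel non-anomalous view."""
    return [
        ((1 - s) * t0 + s * t1,        # translation: linear interpolation
         slerp(q0, q1, s))             # rotation: spherical interpolation
        for s in np.linspace(0.0, 1.0, n_steps)
    ]
```

In practice, a library such as SciPy's `scipy.spatial.transform.Slerp` provides the same rotation interpolation; the explicit version above is shown only to make the geometry of the trajectory in Figure 5 concrete.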