SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection

Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn

TL;DR

This work tackles pose-variant 3D anomaly detection by encoding multi-view objects as a 3D Gaussian splat cloud and refining pose via differentiable SE(3) transformations. The approach enables rendering defect-free views at arbitrary poses and detects anomalies through cross-view feature comparisons, achieving state-of-the-art speed and accuracy on the MAD benchmark. By significantly reducing training and inference costs compared to NeRF-based and OmniAD baselines, SplatPose demonstrates strong data efficiency, including robust performance with sparse training data. The method’s practical impact lies in enabling fast, pose-robust 3D anomaly detection suitable for industrial deployment and real-time QA workflows.

Abstract

Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set.

Paper Structure

This paper contains 24 sections, 8 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Example of SplatPose. A cloud representation is built from multi-view training images. During inference, query images with unknown poses are aligned and an anomaly map localizes their defects within the 3D object, irrespective of pose.
  • Figure 2: Overview of our pipeline. Multi-view training images are represented as a 3D point cloud of Gaussians. The unknown camera pose of a query image is first coarsely estimated and then iteratively refined by applying a pose transformation to the 3D point cloud before it is passed to the differentiable renderer. The final anomaly-free rendering is then compared pixel-wise to the original test image for anomaly detection.
  • Figure 3: Influence of the number of pose estimation steps $k$, varied from $25$ to $300$, on all detection metrics and inference speed. Performance saturates around $k = 175$. Since OmniAD's inference times are orders of magnitude larger than those of SplatPose, we do not include it in this experiment.
  • Figure 4: Examples of pose estimation using SplatPose. Starting from OmniAD's coarse pose [madsim], we refine it to match the ground truth. Examples are from MAD and the NeRF synthetic data.
  • Figure 5: Quantitative comparison of performance for both SplatPose and OmniAD when using between $20\%$ and $100\%$ of the available training data. We show the anomaly maps achieved by feature matching for both methods. Best viewed in color.
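
The refinement loop described in the Figure 2 caption (start from a coarse pose, take gradient steps on a differentiable "rendering", then compare the aligned result to the query) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation. It assumes a 2D rigid pose (rotation angle plus translation) instead of SE(3), a handful of points with known correspondences instead of a rendered Gaussian splat image, and a plain mean squared error with hand-derived gradients. All function names, the learning rate, and the toy data are hypothetical; only the default of 175 steps is borrowed from the saturation point reported in Figure 3.

```python
import math


def transform(points, pose):
    """Apply a 2D rigid pose (theta, tx, ty) to a list of (x, y) points."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def loss_and_grad(model, observed, pose):
    """Mean squared residual and its analytic gradient w.r.t. (theta, tx, ty)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    n = len(model)
    loss = g_theta = g_tx = g_ty = 0.0
    for (x, y), (ox, oy) in zip(model, observed):
        rx = c * x - s * y + tx - ox  # residual in x
        ry = s * x + c * y + ty - oy  # residual in y
        loss += rx * rx + ry * ry
        # Derivative of the transformed point w.r.t. theta is (-s*x - c*y, c*x - s*y).
        g_theta += 2.0 * (rx * (-s * x - c * y) + ry * (c * x - s * y))
        g_tx += 2.0 * rx
        g_ty += 2.0 * ry
    return loss / n, (g_theta / n, g_tx / n, g_ty / n)


def refine_pose(model, observed, coarse_pose, k=175, lr=0.1):
    """Run k gradient-descent steps starting from a coarse pose estimate."""
    pose = list(coarse_pose)
    for _ in range(k):
        _, grad = loss_and_grad(model, observed, pose)
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return tuple(pose)


def anomaly_scores(model, observed, pose):
    """Per-point distance between the aligned defect-free model and the query."""
    aligned = transform(model, pose)
    return [math.hypot(px - ox, py - oy) for (px, py), (ox, oy) in zip(aligned, observed)]


# Toy data (hypothetical): a defect-free model and a query seen from an unknown pose.
model = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]
true_pose = (0.5, 0.3, -0.2)
coarse_pose = (0.3, 0.0, 0.0)

clean_query = transform(model, true_pose)
refined_clean = refine_pose(model, clean_query, coarse_pose)

# Simulate a defect by displacing one observed point, then align and score.
defect_query = list(clean_query)
defect_query[4] = (defect_query[4][0] + 0.3, defect_query[4][1])
refined_defect = refine_pose(model, defect_query, coarse_pose)
scores = anomaly_scores(model, defect_query, refined_defect)

print("refined pose (clean query):", refined_clean)
print("anomaly scores (defect query):", scores)
```

In this toy setting the displaced point keeps the largest residual after alignment, which mirrors the idea of comparing a pose-aligned, defect-free reference against the query. A real pipeline would replace `transform` with a differentiable renderer and would typically obtain gradients through automatic differentiation rather than by hand.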