PiCo: Active Manifold Canonicalization for Robust Robotic Visual Anomaly Detection

Teng Yan; Binkai Liu; Shuai Liu; Yue Yu; Bingzhuo Zhong

Abstract

Industrial deployment of robotic visual anomaly detection (VAD) is fundamentally constrained by passive perception under diverse 6-DoF pose configurations and unstable operating conditions such as illumination changes and shadows, where intrinsic semantic anomalies and physical disturbances coexist and interact. To overcome these limitations, a paradigm shift from passive feature learning to Active Canonicalization is proposed. PiCo (Pose-in-Condition Canonicalization) is introduced as a unified framework that actively projects observations onto a condition-invariant canonical manifold. PiCo operates through a cascaded mechanism. The first stage, Active Physical Canonicalization, enables a robotic agent to reorient objects in order to reduce geometric uncertainty at its source. The second stage, Neural Latent Canonicalization, adopts a three-stage denoising hierarchy consisting of photometric processing at the input level, latent refinement at the feature level, and contextual reasoning at the semantic level, progressively eliminating nuisance factors across representational scales. Extensive evaluations on the large-scale M2AD benchmark demonstrate the superiority of this paradigm. PiCo achieves a state-of-the-art 93.7% O-AUROC, representing a 3.7% improvement over prior methods in static settings, and attains 98.5% accuracy in active closed-loop scenarios. These results demonstrate that active manifold canonicalization is critical for robust embodied perception.

Paper Structure (18 sections, 8 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Unsupervised Visual Anomaly Detection
Invariance Learning and Disentanglement
Active Perception in Robotics
Methodology
Preliminary Knowledge
Stage I: Environmental Normalization via Illumination-preprocessing
Stage II: Latent Canonicalization via Spectral Filtering
Stage III: Contextual Canonicalization via Spatial-Aware Decoder
Active Canonicalization Policy
Experiments and Experience
Experimental Setup
Datasets
Comparative Analysis on $M^2AD$ (Addressing Q1)
...and 3 more sections

Figures (5)

Figure 1: Passive Learning vs Active Canonicalization: PiCo Framework for VAD. Conventional visual anomaly detection relies on passive feature learning under fixed viewpoints and illumination, resulting in limited robustness to pose variations, uneven lighting, and specular artifacts. In contrast, PiCo integrates Active Physical Canonicalization and Neural Latent Canonicalization to actively reduce condition-induced variability, achieving 98.5% accuracy under closed-loop operation.
Figure 2: Multi-stage Canonicalization Pipeline of the PiCo Framework. This cascaded architecture progressively disentangles semantics from physical nuisances across three neural levels: photometric (Stage I), latent spectral (Stage II), and global contextual (Stage III). The resulting invariant representations ultimately drive the active physical canonicalization policy.
Figure 3: Modular Architecture of PiCo's Canonicalization Mechanism. The framework integrates a photometric preprocessing module for lighting estimation, a dual-path bottleneck MLP for latent feature refinement, and contextual canonicalization. Ultimately, reconstruction uncertainty serves as a real-time feedback signal, driving the robotic agent to actively seek a geometrically optimal canonical view.
Figure 5: Qualitative anomaly detection results on the $\mathbf{M^2AD}$ benchmark. The rows show a normal image, an anomalous image, and PiCo's predicted anomaly map masked by the ground truth. Evaluated under highly fluctuating illumination and viewpoints, PiCo precisely localizes semantic defects while suppressing environmental nuisances.
Figure 6: Quantitative evaluation of active canonicalization on 50 hard cases. (a) Uncertainty collapse: Active robotic re-orientation forces widely dispersed, high-uncertainty observations to converge into a tight, low-uncertainty manifold. (b) Accuracy comparison: While passive SOTA baselines fail under severe physical occlusions (peaking at 68.2%), PiCo's active pipeline breaks this physical ceiling, achieving 98.5% accuracy at the canonical view.
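
The captions for Figures 3 and 6 describe reconstruction uncertainty acting as a feedback signal that drives the robot toward a canonical view. The sketch below illustrates that closed-loop idea only in spirit: it is a greedy local search, not the paper's Active Canonicalization Policy, whose details are not given in this overview. The function name `seek_canonical_view`, the callable `uncertainty_at`, the 2-D pose parameterization, the step set, and the threshold are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence, Tuple

import numpy as np


def seek_canonical_view(
    uncertainty_at: Callable[[np.ndarray], float],
    start_pose: np.ndarray,
    step_candidates: Sequence[np.ndarray],
    threshold: float = 0.05,
    max_steps: int = 20,
) -> Tuple[np.ndarray, float, int]:
    """Greedily re-orient until the uncertainty signal drops below `threshold`.

    `uncertainty_at(pose)` stands in for "observe the object at this pose and
    return the model's reconstruction uncertainty" (an assumed interface).
    """
    pose = np.asarray(start_pose, dtype=float)
    u = float(uncertainty_at(pose))
    steps_taken = 0
    while steps_taken < max_steps and u > threshold:
        # Evaluate every candidate re-orientation from the current pose.
        trials = [(float(uncertainty_at(pose + d)), pose + d) for d in step_candidates]
        best_u, best_pose = min(trials, key=lambda t: t[0])
        if best_u >= u:
            break  # No candidate reduces uncertainty: stop at a local minimum.
        pose, u = best_pose, best_u
        steps_taken += 1
    return pose, u, steps_taken


# Toy stand-in for the perception model: uncertainty grows with the squared
# angular distance (in degrees) from an assumed canonical pose at the origin.
def toy_uncertainty(pose: np.ndarray) -> float:
    return float(np.sum(pose ** 2) / 1000.0)


# Candidate moves: +/-10 degrees along each of two hypothetical rotation axes.
steps = [np.array(s, dtype=float) for s in ([10, 0], [-10, 0], [0, 10], [0, -10])]
final_pose, final_u, n_steps = seek_canonical_view(
    toy_uncertainty, np.array([30.0, -20.0]), steps
)
print(final_pose, final_u, n_steps)
```

In this toy setting the loop walks from the starting pose to the assumed canonical pose in five moves, with the uncertainty decreasing at every step. On a real system, each call to `uncertainty_at` would correspond to a physical re-orientation and a new observation, so a practical policy would likely need to limit the number of candidate evaluations per step.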
