PiCo: Active Manifold Canonicalization for Robust Robotic Visual Anomaly Detection

Teng Yan, Binkai Liu, Shuai Liu, Yue Yu, Bingzhuo Zhong

Abstract

Industrial deployment of robotic visual anomaly detection (VAD) is fundamentally constrained by passive perception under diverse 6-DoF pose configurations and unstable operating conditions such as illumination changes and shadows, where intrinsic semantic anomalies and physical disturbances coexist and interact. To overcome these limitations, a paradigm shift from passive feature learning to Active Canonicalization is proposed. PiCo (Pose-in-Condition Canonicalization) is introduced as a unified framework that actively projects observations onto a condition-invariant canonical manifold. PiCo operates through a cascaded mechanism. The first stage, Active Physical Canonicalization, enables a robotic agent to reorient objects in order to reduce geometric uncertainty at its source. The second stage, Neural Latent Canonicalization, adopts a three-stage denoising hierarchy consisting of photometric processing at the input level, latent refinement at the feature level, and contextual reasoning at the semantic level, progressively eliminating nuisance factors across representational scales. Extensive evaluations on the large-scale M2AD benchmark demonstrate the superiority of this paradigm. PiCo achieves a state-of-the-art 93.7% O-AUROC, representing a 3.7% improvement over prior methods in static settings, and attains 98.5% accuracy in active closed-loop scenarios. These results demonstrate that active manifold canonicalization is critical for robust embodied perception.
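To make the headline metric concrete, the sketch below computes an AUROC from per-object anomaly scores. It assumes that O-AUROC denotes an AUROC taken over one score per object, which the abstract does not spell out. The function name `auroc` and the toy scores are illustrative and do not come from the paper.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample
    receives a higher score than a random normal one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # anomalous objects
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal objects
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    # Count pairwise wins of anomalous over normal scores.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): two normal and two anomalous objects.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auroc = auroc(toy_scores, toy_labels)
```

Under this pairwise reading, a value of 0.937 would mean that a randomly chosen anomalous object outscores a randomly chosen normal one about 93.7% of the time.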

Paper Structure

This paper contains 18 sections, 8 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Passive Learning vs Active Canonicalization: PiCo Framework for VAD. Conventional visual anomaly detection relies on passive feature learning under fixed viewpoints and illumination, resulting in limited robustness to pose variations, uneven lighting, and specular artifacts. In contrast, PiCo integrates Active Physical Canonicalization and Neural Latent Canonicalization to actively reduce condition-induced variability, achieving 98.5% accuracy under closed-loop operation.
  • Figure 2: Multi-stage Canonicalization Pipeline of the PiCo Framework. This cascaded architecture progressively disentangles semantics from physical nuisances across three neural levels: photometric (Stage I), latent spectral (Stage II), and global contextual (Stage III). The resulting invariant representations ultimately drive the active physical canonicalization policy.
  • Figure 3: Modular Architecture of PiCo's Canonicalization Mechanism. The framework integrates a photometric preprocessing module for lighting estimation, a dual-path bottleneck MLP for latent feature refinement, and contextual canonicalization. Ultimately, reconstruction uncertainty serves as a real-time feedback signal, driving the robotic agent to actively seek a geometrically optimal canonical view.
  • Figure 5: Qualitative anomaly detection results on the $\mathbf{M^2AD}$ benchmark. The rows show the normal image, the anomalous image, and PiCo's predicted anomaly map masked by the ground truth. Evaluated under highly fluctuating illumination and viewpoints, PiCo precisely localizes semantic defects while suppressing environmental nuisances.
  • Figure 6: Quantitative evaluation of active canonicalization on 50 hard cases. (a) Uncertainty collapse: Active robotic re-orientation forces widely dispersed, high-uncertainty observations to converge onto a tight, low-uncertainty manifold. (b) Accuracy comparison: While passive SOTA baselines fail under severe physical occlusions (peaking at 68.2%), PiCo's active pipeline breaks this physical ceiling, achieving 98.5% accuracy at the canonical view.
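
The closed-loop idea in the captions above, where reconstruction uncertainty acts as a feedback signal that drives the agent toward a canonical view, can be illustrated with a toy sketch. This is not the paper's policy. It assumes a one-dimensional ring of candidate viewing angles, a greedy neighbor search, and a hypothetical `toy_uncertainty` function standing in for the model's reconstruction uncertainty. All names and values are illustrative.

```python
import math


def toy_uncertainty(angle_deg, canonical_deg=90.0):
    """Hypothetical stand-in for reconstruction uncertainty:
    zero at the canonical view, growing with angular offset."""
    offset = math.radians(angle_deg - canonical_deg)
    return 1.0 - math.cos(offset)


def seek_canonical_view(views, uncertainty, start=0, max_steps=50):
    """Greedy closed loop: move to the adjacent view with the lowest
    uncertainty and stop once no neighbor improves on the current view."""
    current = start
    history = [(views[current], uncertainty(views[current]))]
    for _ in range(max_steps):
        # Neighbors on a ring of candidate views (wraps around at the ends).
        neighbors = [(current - 1) % len(views), (current + 1) % len(views)]
        best = min(neighbors, key=lambda i: uncertainty(views[i]))
        if uncertainty(views[best]) >= uncertainty(views[current]):
            break  # local minimum reached: treat this as the canonical view
        current = best
        history.append((views[current], uncertainty(views[current])))
    return views[current], history


# Candidate viewing angles every 15 degrees; start from a poor view (300 degrees).
views = [float(a) for a in range(0, 360, 15)]
best_view, history = seek_canonical_view(
    views, toy_uncertainty, start=views.index(300.0)
)
```

In this toy setting the uncertainty decreases at every step until the loop settles at the 90-degree view. A real system would replace `toy_uncertainty` with a learned signal and the ring of angles with 6-DoF robot motions.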