Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects
Manuel Gomes, Bogdan Raducanu, Miguel Oliveira
TL;DR
This work tackles 4D panoptic segmentation for articulated objects by introducing Artic4D, a synthetic but realistic benchmark with 4D sensor data and rich annotations. It then proposes CanonSeg4D, a segmentation framework that learns a canonical representation for each movable part, enabling articulation-invariant, temporally consistent part clustering via a PST-Transformer backbone, a semantic head, and a canonical module with offset-based losses. Extensive experiments on Artic4D show CanonSeg4D achieving superior $LSTQ$ scores, especially in highly articulated scenarios, outperforming state-of-the-art methods by leveraging temporal context and canonical alignment. The results demonstrate the strength of temporal modeling and canonical-space representations for dynamic object understanding, with implications for robotic manipulation and real-world perception pipelines.
Abstract
Articulated object perception presents significant challenges in computer vision, particularly because most existing methods ignore temporal dynamics despite the inherently dynamic nature of such objects. The use of 4D temporal data has not been thoroughly explored in articulated object perception and remains unexamined for panoptic segmentation. The lack of a benchmark dataset further hinders progress in this field. To this end, we introduce Artic4D, a new dataset derived from PartNet Mobility and augmented with synthetic sensor data, featuring 4D panoptic annotations and articulation parameters. Building on this dataset, we propose CanonSeg4D, a novel 4D panoptic segmentation framework. This approach explicitly estimates per-frame offsets that map observed object parts to a learned canonical space, thereby enhancing part-level segmentation. The framework uses this canonical representation to keep object parts consistently aligned across sequential frames. Comprehensive experiments on Artic4D demonstrate that CanonSeg4D outperforms state-of-the-art approaches in panoptic segmentation accuracy in more complex scenarios. These findings highlight the effectiveness of temporal modeling and canonical alignment in dynamic object understanding, and pave the way for future advances in 4D articulated object perception.
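
To make the canonical-space idea concrete, the following is a minimal sketch, not the CanonSeg4D implementation. It assumes a single rigid part rotating about the z-axis and uses ideal offsets computed from a known rest pose as a stand-in for the offsets a network would predict; the names `to_canonical` and `rotate_z` are hypothetical. It illustrates only that adding per-point offsets brings observations of the same part from different frames into agreement, which is what makes clustering across frames articulation-invariant.

```python
import numpy as np

def to_canonical(points, offsets):
    """Shift observed points by per-point offsets into canonical space."""
    return points + offsets

def rotate_z(points, angle):
    """Rotate an (N, 3) point array about the z-axis (toy articulation)."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rotation.T

# Hypothetical movable part sampled in its rest (canonical) pose.
rng = np.random.default_rng(0)
canonical_part = rng.uniform(0.0, 1.0, size=(50, 3))

# Two frames observe the same part at different joint angles (radians).
frames = [rotate_z(canonical_part, angle) for angle in (0.3, 1.2)]

# Assumption: ideal offsets stand in for the per-frame network predictions.
offsets = [canonical_part - observed for observed in frames]

mapped = [to_canonical(obs, off) for obs, off in zip(frames, offsets)]

# Largest per-coordinate disagreement between the two frames,
# before and after mapping to canonical space.
observed_gap = np.abs(frames[0] - frames[1]).max()
canonical_gap = np.abs(mapped[0] - mapped[1]).max()
print(f"observed gap: {observed_gap:.3f}, canonical gap: {canonical_gap:.2e}")
```

In this toy setting the canonical gap is zero up to floating-point error because the offsets are exact; with learned offsets the gap would reflect prediction error.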
