PIRATR: Parametric Object Inference for Robotic Applications with Transformers in 3D Point Clouds
Michael Schwingshackl, Fabio F. Oberweger, Mario Niedermeyer, Johannes Huemer, Markus Murschitz
TL;DR
PIRATR tackles the challenge of robotic perception in outdoor environments by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes from occluded LiDAR point clouds. It extends the PI3DETR framework with class-specific prediction heads, a geometry-aware matching mechanism, and a Chamfer-based loss to handle parametric objects under occlusion. Trained entirely on synthetic data with realistic LiDAR simulation, PIRATR demonstrates strong synthetic-to-real transfer on real-world forklift data, achieving a real-world mAP of 0.919 and robust pose estimation across grippers, loading platforms, and pallets. The work bridges geometric reasoning and actionable world models, enabling scalable, simulation-trained perception for dynamic robotic applications and outlining clear directions for broader class support and temporal integration.
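To make the idea of a Chamfer-based loss concrete, the following is a minimal sketch of one common symmetric Chamfer distance between two 3D point sets. It is an illustration only: the function name `chamfer_distance`, the use of unsquared Euclidean distances, and the averaging scheme are assumptions, and the exact loss used in PIRATR may differ.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed variant: mean nearest-neighbour Euclidean distance,
    summed over both directions (A -> B and B -> A).
    """
    def one_sided(src, dst):
        # For every source point, find the closest destination point,
        # then average those nearest-neighbour distances.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


# Hypothetical sample points, e.g. drawn from a predicted and a target shape.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(predicted, target))
```

Because the distance compares point sets rather than parameter vectors, it does not depend on how the points are ordered, and it is zero when the two sets coincide.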
Abstract
We present PIRATR, an end-to-end 3D object detection framework for robotic use cases in point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties of parametric objects, such as a gripper's opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without redesigning the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a new paradigm of pose-aware, parameterized perception that bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems that can be deployed in dynamic robotic environments. Code is available at https://github.com/swingaxe/piratr.
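As an illustration of what a parametric object with a pose might look like, the sketch below places two jaw-tip keypoints of a gripper in the world frame from a single opening parameter, a rotation, and a translation. Everything here is hypothetical: the names `gripper_keypoints` and `yaw_matrix`, the two-keypoint model, and the rule "jaws sit symmetrically at half the opening along the local x-axis" are assumptions made for this example, not the parameterization used in PIRATR.

```python
import math


def yaw_matrix(angle):
    """3x3 rotation about the z-axis (one simple special case of a 3D rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def gripper_keypoints(opening, rotation, translation):
    """Return two hypothetical jaw-tip keypoints in the world frame.

    Assumed rule: the jaws sit symmetrically at +/- opening/2 along the
    object's local x-axis; the pose (rotation, translation) then maps
    these local points into world coordinates.
    """
    half = opening / 2.0
    local_points = [(-half, 0.0, 0.0), (half, 0.0, 0.0)]
    world_points = []
    for p in local_points:
        # Apply the rotation matrix, then add the translation.
        rotated = [sum(rotation[i][j] * p[j] for j in range(3)) for i in range(3)]
        world_points.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return world_points


tips = gripper_keypoints(
    opening=0.8,
    rotation=yaw_matrix(math.pi / 2),
    translation=(1.0, 2.0, 0.5),
)
print(tips)
```

In this toy model, changing `opening` moves the keypoints apart without touching the pose, which mirrors the separation between pose and class-specific parameters described above.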
