PIRATR: Parametric Object Inference for Robotic Applications with Transformers in 3D Point Clouds

Michael Schwingshackl, Fabio F. Oberweger, Mario Niedermeyer, Johannes Huemer, Markus Murschitz

TL;DR

PIRATR tackles the challenge of robotic perception in outdoor environments by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes from occluded LiDAR point clouds. It extends the PI3DETR framework with class-specific prediction heads, a geometry-aware matching mechanism, and a Chamfer-based loss to handle parametric objects under occlusion. Trained entirely on synthetic data with realistic LiDAR simulation, PIRATR demonstrates strong synthetic-to-real transfer on real-world forklift data, achieving a real-world mAP of 0.919 and robust pose estimation across grippers, loading platforms, and pallets. The work bridges geometric reasoning and actionable world models, enabling scalable, simulation-trained perception for dynamic robotic applications and outlining clear directions for broader class support and temporal integration.
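To make the Chamfer-based loss mentioned above concrete, the following is a minimal NumPy sketch of a symmetric Chamfer distance between two point sets. It is an illustration only: the function name `chamfer_distance`, the use of squared distances, and the equal weighting of both directions are assumptions, since this summary does not state the exact formulation used in PIRATR.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: squared Euclidean distances, both directions weighted equally.
    """
    # Pairwise squared distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # Mean nearest-neighbour distance from a to b, plus from b to a.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


# Toy example: points sampled from a predicted and a target shape.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])

print(chamfer_distance(pred, pred))    # identical sets give 0.0
print(chamfer_distance(pred, target))  # 1.0 for this toy example
```

Because the distance is computed between sampled points rather than between parameter vectors, a loss of this kind compares geometry directly, which is one plausible reason it suits objects whose shape depends on class-specific parameters.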

Abstract

We present PIRATR, an end-to-end 3D object detection framework for robotic use cases in point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties for parametric objects, such as a gripper's opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without re-designing the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a new paradigm of pose-aware, parameterized perception. This bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems that can be deployed in dynamic robotic environments. Code available at https://github.com/swingaxe/piratr.
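The idea of adjusting a 3D model "according to simple, predefined rules" can be sketched as follows. This is a hypothetical example, not the authors' implementation: the rule (each jaw shifts by half the opening along the local x-axis), the Euler-angle convention, and the names `rotation_zyx` and `place_gripper` are all assumptions chosen for illustration.

```python
import numpy as np


def rotation_zyx(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def place_gripper(left_jaw, right_jaw, opening, rotation, translation):
    """Apply a parametric opening, then a 6-DoF pose, to template jaw points.

    Hypothetical rule: each jaw moves opening / 2 along the local x-axis.
    """
    offset = np.array([opening / 2.0, 0.0, 0.0])
    local = np.vstack([left_jaw - offset, right_jaw + offset])
    # Row vectors: v @ R.T equals (R @ v) for each point v.
    return local @ rotation.T + translation


# Template: one point per jaw at the local origin (toy data).
left = np.zeros((1, 3))
right = np.zeros((1, 3))
world = place_gripper(
    left, right,
    opening=1.0,
    rotation=rotation_zyx(np.pi / 2, 0.0, 0.0),
    translation=np.array([10.0, 0.0, 0.0]),
)
print(world)  # approximately [[10, -0.5, 0], [10, 0.5, 0]]
```

In this sketch a prediction consists of a pose (rotation and translation) plus one scalar attribute, and the final geometry is obtained by applying both to a fixed template. Other classes would use their own rule and their own set of attributes.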

Paper Structure (20 sections, 8 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: PIRATR is an end-to-end trainable model that takes a point cloud as input, applies farthest point sampling to generate point queries, and encodes them into output embeddings using 3DETR (Misra et al., 2021). Class-specific heads predict parametric objects corresponding to the gripper, loading platform, and pallet. During training, a geometry-aware matcher (Oberweger et al., 2025) learns to associate embeddings with the correct classes, and at inference, the model directly outputs class-specific configurations, which are applied to predefined meshes to generate the final predictions.
  • Figure 2: The autonomous forklift operating in the outdoor environment where our parametric object detection method is tested and deployed. The main detection targets (gripper, loading platform, and pallets) are visible, illustrating the perception task and its context.
  • Figure 3: Left: reference image from the forklift-mounted camera. Center: input point cloud captured with a Livox Mid-70 LiDAR. Right: 3D annotations of the supervision targets: gripper, loading platform, and pallets.
  • Figure 4: (a) CAD models and supervision targets. (b) Synthetic training scene and corresponding simulated point cloud.
  • Figure 5: Qualitative synthetic-to-real prediction of PIRATR, which is trained solely on synthetic data and evaluated on real scans. Predicted classes: gripper (yellow), loading platforms (cyan), and pallets (magenta).
  • ...and 5 more figures
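
The caption of Figure 1 mentions farthest point sampling as the step that generates point queries. A minimal NumPy sketch of the standard greedy algorithm is shown here for orientation. The function name `farthest_point_sampling`, the fixed start index, and the toy data are assumptions, and the actual model may use a different or GPU-based implementation.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedily pick k indices so that each new point is farthest from those chosen.

    points: (N, 3) array with N >= k. Returns an array of k indices.
    """
    n = points.shape[0]
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Squared distance from every point to its nearest selected point so far.
    min_d2 = np.full(n, np.inf)
    for i in range(1, k):
        last = points[selected[i - 1]]
        d2 = np.sum((points - last) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
        selected[i] = int(np.argmax(min_d2))
    return selected


# Toy cloud: five points on the x-axis.
cloud = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
idx = farthest_point_sampling(cloud, k=3)
print(idx)  # indices 0, 4, 3 for this toy cloud
```

Sampling this way spreads the queries across the scene, so sparse or distant regions of the scan still receive queries.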