Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics
Philipp Quentin, Dino Knoll, Daniel Goehring
TL;DR
This work evaluates a representative, end-to-end 6D pose estimation pipeline for automotive internal logistics. The pipeline combines LabelFusion-based real-world data, NVISII-based synthetic data, and RGB- and RGB-D-based estimators (GDR-Net and DenseFusion), and the evaluation assesses its feasibility for industrial deployment. The study finds that the data-generation pipelines scale well and the trained estimators can reach near-optimal accuracy. However, the estimators fall short on robustness because their uncertainty estimates are unreliable, which in practice can lead to crashes or missed placements. RGB-based methods tend to be more robust than RGB-D-based methods when depth data is noisy or missing, but all approaches suffer, to different degrees, from the domain gap introduced by synthetic data. The results show that reliable uncertainty quantification, the key area for future work, and depth-noise-aware training are needed before such pipelines can support practical, scalable automation. They also illustrate both the potential and the current limits of translating 6D pose estimation research into industry-ready robotic grasping pipelines.
Abstract
Despite advances in robotics, a large proportion of the parts-handling tasks in the automotive industry's internal logistics are not automated but are still performed by humans. A key component for automating these processes competitively is 6D pose estimation that can handle a large number of different parts, can be adapted to new parts with little manual effort, and is sufficiently accurate and robust with respect to industry requirements. This raises the question of how the current state of the art performs against these criteria. To address it, we built a representative 6D pose estimation pipeline from state-of-the-art components, ranging from economically scalable real and synthetic data generation to pose estimators, and evaluated it on automotive parts in a realistic sequencing process. We found that, with these data generation approaches, the trained 6D pose estimators perform promisingly but do not meet industry requirements. We show that the reason is the estimators' inability to provide reliable uncertainties for their poses, rather than an inability to provide sufficiently accurate poses. We further analyze how RGB- and RGB-D-based approaches compare in this respect and show that they differ in their vulnerability to the domain gap induced by synthetic data.
