Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation
Alan Li, Angela P. Schoellig
TL;DR
The paper tackles robust 6D pose estimation in bin-picking by diagnosing where pose estimation errors concentrate in pose space and occlusion space, then synthesizing targeted hard samples online. It introduces a dual error-modeling framework, deriving $P(\theta,\phi)$ for pose space and $L(k)$ for occlusion space, and uses these to generate realistic, difficult training cases within a bin environment. With continuous online training, the method improves correct detection rate by up to 20% on ROBI and also improves results on bin-picking scenes in T-LESS, while remaining compatible with multiple pose estimators such as PVNet and GDRNPP. The approach improves reliability and data efficiency for industrial 6D pose estimation, particularly under occlusion and symmetry, with practical impact for robotic manipulation in cluttered environments.
Abstract
6D object pose estimation is a fundamental component in robotics, enabling efficient interaction with the environment. It is particularly challenging in bin-picking applications, where objects may be textureless and in difficult poses, and occlusion between objects of the same type may cause confusion even in well-trained models. We propose a novel, model-agnostic method of hard example synthesis that uses existing simulators and models pose error in both the camera-to-object viewsphere and occlusion space. By evaluating model performance with respect to the distribution of object poses and occlusions, we discover regions of high error and generate realistic training samples that specifically target these regions. With our training approach, we demonstrate an improvement in correct detection rate of up to 20% across several ROBI-dataset objects using state-of-the-art pose estimation models.
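The idea of finding high-error regions of the viewsphere and sampling new training poses from them can be illustrated with a minimal sketch. This is not the authors' implementation: the binned mean-error map below is only a simple stand-in for an error model over $(\theta,\phi)$, and the function names `estimate_error_map` and `sample_hard_views`, the bin counts, and the angle conventions (polar angle $\theta \in [0,\pi]$, azimuth $\phi \in [-\pi,\pi]$) are all assumptions made for illustration. Occlusion-space modeling is omitted.

```python
import numpy as np


def estimate_error_map(theta, phi, err, n_theta=6, n_phi=12):
    """Mean pose error per (theta, phi) bin of the viewsphere.

    Hypothetical helper: a coarse histogram stands in for a proper
    error model over the camera-to-object viewsphere.
    """
    t_edges = np.linspace(0.0, np.pi, n_theta + 1)
    p_edges = np.linspace(-np.pi, np.pi, n_phi + 1)
    # Sum of errors and number of evaluated samples in each bin.
    err_sum, _, _ = np.histogram2d(theta, phi, bins=[t_edges, p_edges], weights=err)
    count, _, _ = np.histogram2d(theta, phi, bins=[t_edges, p_edges])
    # Bins without any evaluated sample get zero error (an assumption).
    mean_err = np.divide(err_sum, count, out=np.zeros_like(err_sum), where=count > 0)
    return mean_err, t_edges, p_edges


def sample_hard_views(mean_err, t_edges, p_edges, n, rng):
    """Draw n viewpoints, choosing bins with probability proportional to error."""
    total = mean_err.sum()
    if total > 0:
        prob = mean_err.ravel() / total
    else:
        # No error observed anywhere: fall back to uniform over bins.
        prob = np.full(mean_err.size, 1.0 / mean_err.size)
    idx = rng.choice(mean_err.size, size=n, p=prob)
    ti, pi = np.unravel_index(idx, mean_err.shape)
    # Uniform jitter inside the chosen bin (not area-uniform on the sphere).
    theta = rng.uniform(t_edges[ti], t_edges[ti + 1])
    phi = rng.uniform(p_edges[pi], p_edges[pi + 1])
    return theta, phi
```

In an online training loop, the sampled $(\theta,\phi)$ pairs would be handed to a simulator to render new scenes, and the error map would be re-estimated after each round of evaluation; how the paper performs those steps is not specified in this summary.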
