Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects
Petri Mäkinen, Pauli Mustalahti, Tuomo Kivelä, Jouni Mattila
TL;DR
The paper tackles precise tool center point (TCP) positioning for non-rigid heavy-duty, long-reach (HDLR) manipulators by combining eye-in-hand visual 6D pose estimation with motion-based camera-to-robot calibration. It presents an end-to-end pipeline that detects objects of interest (OOIs), estimates their 6D poses with networks trained only on synthetic data, and combines orientation alignment, visual odometry (VO)/SLAM-based calibration, and image-based position alignment to drive IPC positioning. In real-world tests on a manipulator with a 5 m reach, the approach achieves an average image-based tool positioning error below 2 mm along the non-depth axes, demonstrating a practical way to increase the task flexibility and automation level of HDLR manipulators without relying on external fiducials. Although limited by non-real-time pose updates, the method offers a viable route toward higher technology readiness levels (TRLs) and more robust, autonomous operation in challenging industrial environments.
Abstract
Recent advances in visual 6D pose estimation of objects using deep neural networks have enabled novel ways of vision-based control for heavy-duty robotic applications. In this study, we present a pipeline for the precise tool positioning of heavy-duty, long-reach (HDLR) manipulators using advanced machine vision. A camera is utilized in the so-called eye-in-hand configuration to directly estimate the poses of a tool and a target object of interest (OOI). Based on the pose error between the tool and the target, along with motion-based calibration between the camera and the robot, precise tool positioning can be reliably achieved using conventional robotic modeling and control methods prevalent in the industry. The proposed methodology comprises orientation and position alignment based on the visually estimated OOI poses, whereas camera-to-robot calibration is conducted based on motion utilizing visual SLAM. By relying on image-based algorithms, the methods seek to avoid the inaccuracies that arise when rigid-body-based kinematics are applied to structurally flexible HDLR manipulators. To train deep neural networks for OOI pose estimation, only synthetic data are utilized. The methods are validated in a real-world setting using an HDLR manipulator with a 5 m reach. The experimental results demonstrate that an image-based average tool positioning error of less than 2 mm along the non-depth axes is achieved, which offers a new way to increase the task flexibility and automation level of non-rigid HDLR manipulators.
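The pose-error idea in the abstract can be illustrated with a minimal sketch. It assumes that the tool and target poses are available as 4x4 homogeneous transforms in the camera frame and that the camera-to-robot calibration reduces to a known rotation from the camera frame to the robot base frame. All function names, frame conventions, and numeric values below are hypothetical and chosen for illustration only; they are not taken from the paper, and the authors' actual alignment and control algorithms are not reproduced here.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def pose_error(T_cam_tool, T_cam_target):
    """Target pose expressed in the tool frame (hypothetical error definition)."""
    return np.linalg.inv(T_cam_tool) @ T_cam_target


def rotation_angle(R):
    """Magnitude (radians) of the rotation encoded by a 3x3 rotation matrix."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))


def position_error_in_base(T_cam_tool, T_cam_target, R_base_cam):
    """Tool-to-target offset, rotated from the camera frame into the base frame."""
    d_cam = T_cam_target[:3, 3] - T_cam_tool[:3, 3]
    return R_base_cam @ d_cam


# Illustrative values only: both objects seen by the camera (units: meters).
T_cam_tool = make_pose(np.eye(3), [0.10, 0.00, 1.00])
T_cam_target = make_pose(rot_z(np.deg2rad(10.0)), [0.30, 0.05, 1.20])
R_base_cam = rot_z(np.pi / 2.0)  # assumed result of camera-to-robot calibration

T_err = pose_error(T_cam_tool, T_cam_target)
angle_err_deg = np.rad2deg(rotation_angle(T_err[:3, :3]))
d_base = position_error_in_base(T_cam_tool, T_cam_target, R_base_cam)

print("orientation error [deg]:", angle_err_deg)
print("position error in base frame [m]:", d_base)
```

In a sketch like this, the orientation error would be driven toward zero first and the remaining offset `d_base` handed to a conventional Cartesian position controller. Because both poses are estimated in the same camera frame, the tool-to-target offset does not depend on the rigid-body forward kinematics of the flexible manipulator, which is the motivation the abstract gives for the image-based approach.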
