Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects

Petri Mäkinen, Pauli Mustalahti, Tuomo Kivelä, Jouni Mattila

TL;DR

The paper tackles precise tool center point (TCP) positioning for non-rigid, heavy-duty, long-reach (HDLR) manipulators by combining eye-in-hand visual 6D pose estimation with motion-based camera-to-robot calibration. It presents an end-to-end pipeline that detects objects of interest (OOIs), estimates their 6D poses with networks trained on synthetic data, and uses orientation alignment, VO/SLAM-based calibration, and image-based position alignment to drive TCP positioning. In real-world tests on a manipulator with a 5 m reach, the approach achieves an average image-based positioning error of less than 2 mm along the non-depth axes, without reliance on external fiducials. While limited by non-real-time pose updates, the method offers a viable route toward higher technology readiness levels (TRLs) and more robust, autonomous operation in challenging industrial environments.

Abstract

Recent advances in visual 6D pose estimation of objects using deep neural networks have enabled novel ways of vision-based control for heavy-duty robotic applications. In this study, we present a pipeline for the precise tool positioning of heavy-duty, long-reach (HDLR) manipulators using advanced machine vision. A camera is utilized in the so-called eye-in-hand configuration to directly estimate the poses of a tool and a target object of interest (OOI). Based on the pose error between the tool and the target, along with motion-based calibration between the camera and the robot, precise tool positioning can be reliably achieved using conventional robotic modeling and control methods prevalent in the industry. The proposed methodology comprises orientation and position alignment based on the visually estimated OOI poses, whereas camera-to-robot calibration is conducted based on motion utilizing visual SLAM. The methods use image-based algorithms to avoid the inaccuracies that arise when rigid-body-based kinematics are applied to structurally flexible HDLR manipulators. To train deep neural networks for OOI pose estimation, only synthetic data are utilized. The methods are validated in a real-world setting using an HDLR manipulator with a 5 m reach. The experimental results demonstrate that an image-based average tool positioning error of less than 2 mm along the non-depth axes is achieved, which facilitates a new way to increase the task flexibility and automation level of non-rigid HDLR manipulators.
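
The pose error between the tool and the target mentioned in the abstract can be illustrated with a minimal sketch. It assumes both OOI poses are available as 4x4 homogeneous transforms in the camera frame, and that a camera-to-base rotation has already been obtained from calibration. The function names, the variable `R_base_cam`, and all numeric values are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(T):
    """Invert a rigid transform using the transpose of its rotation part."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


def pose_error(T_cam_tool, T_cam_target):
    """Return the target pose expressed in the tool frame, plus its
    translation part and the rotation angle (radians) of its rotation part."""
    T_err = invert_pose(T_cam_tool) @ T_cam_target
    cos_angle = (np.trace(T_err[:3, :3]) - 1.0) / 2.0
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return T_err, T_err[:3, 3], angle


def position_error_in_base(R_base_cam, T_cam_tool, T_cam_target):
    """Rotate the camera-frame position difference into the robot base frame.
    R_base_cam is assumed to come from a separate calibration step."""
    return R_base_cam @ (T_cam_target[:3, 3] - T_cam_tool[:3, 3])


# Hypothetical pose estimates in the camera frame (metres, radians).
T_cam_tool = make_pose(rot_z(0.1), [0.0, 0.1, 0.5])
T_cam_target = make_pose(rot_z(0.3), [0.2, 0.1, 1.5])

T_err, t_err, angle_err = pose_error(T_cam_tool, T_cam_target)

# Hypothetical calibration result: camera axes aligned with the base axes.
R_base_cam = np.eye(3)
d_base = position_error_in_base(R_base_cam, T_cam_tool, T_cam_target)
```

Because both poses are estimated from the same image, the relative pose `T_err` does not depend on the manipulator's kinematic model. The base-frame vector `d_base` additionally requires the calibrated camera orientation, which is the role the abstract assigns to motion-based calibration.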

Paper Structure

This paper contains 16 sections, 21 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: The two OOIs used in this work. The mock-up tool attached to the manipulator is highlighted with the red bounding box, whereas the larger target OOI is highlighted with the blue bounding box.
  • Figure 2: Example color images of the synthetic dataset generated with BlenderProc4BOP.
  • Figure 3: Examples of the real-world dataset used for fine-tuning the visual object detector.
  • Figure 4: The experimental setup comprising an HDLR manipulator with an eye-in-hand camera and two OOIs. The control objective was to position the tool OOI to one of the holes of the target OOI. The static mapping from each hole to the target's base frame was based on the known geometry.
  • Figure 5: The overall methodology for the precise TCP positioning of HDLR manipulators in OOI-focused applications. The advanced machine vision system comprises machine learning-related methods for visual OOI detection and pose estimation, along with VO/SLAM for motion-based calibration. The manipulator's real-time control system consists of the lower-level joint control guided by a given reference TCP pose.
  • ...and 5 more figures