Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation

Heecheol Kim; Yoshiyuki Ohmura; Yasuo Kuniyoshi

TL;DR

This work addresses high-precision manipulation of deformable objects (e.g., needle threading) with a gaze-guided dual-resolution scheme that separates fast reaching motions from slow, precise targeting. A low-resolution peripheral view drives the rapid approach, and a high-resolution foveated view around the gaze drives the precise interaction. Slow- and fast-action policies are learned via deep imitation learning and are augmented with a gaze predictor and a recovery action module. Key contributions include a dual-vision architecture, a Gaussian-mixture-based threshold for separating the two action types, explicit recovery mechanisms, and evidence that foveated vision combined with stereo input substantially improves both task performance and computational efficiency relative to processing the full high-resolution image. The method performs well on needle threading and generalizes to bolt picking, with practical implications for dexterous manipulation of deformable objects on real robots and possible extensions to active vision systems.

Abstract

A high-precision manipulation task, such as needle threading, is challenging. Physiological studies have proposed connecting low-resolution peripheral vision and fast movement to transport the hand into the vicinity of an object, and using high-resolution foveated vision to achieve the accurate homing of the hand to the object. The results of this study demonstrate that a deep imitation learning based method, inspired by the gaze-based dual resolution visuomotor control system in humans, can solve the needle threading task. First, we recorded the gaze movements of a human operator who was teleoperating a robot. Then, we used only a high-resolution image around the gaze to precisely control the thread position when it was close to the target. We used a low-resolution peripheral image to reach the vicinity of the target. The experimental results obtained in this study demonstrate that the proposed method enables precise manipulation tasks using a general-purpose robot manipulator and improves computational efficiency. Data from this and related works are available at: https://sites.google.com/view/multi-task-fine.
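The split between a high-resolution image around the gaze and a low-resolution peripheral image can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dual_resolution_views`, the crop size, and the downsampling stride are hypothetical values chosen for illustration, and the gaze position is assumed to be given in pixel coordinates.

```python
import numpy as np

def dual_resolution_views(image, gaze_xy, fovea_size=64, peripheral_stride=4):
    """Return (foveated_crop, peripheral_view) for one camera frame.

    image: H x W x C array; gaze_xy: (x, y) gaze position in pixels.
    fovea_size (assumed even) and peripheral_stride are illustrative
    values, not the settings used in the paper.
    """
    h, w = image.shape[:2]
    half = fovea_size // 2
    # Clamp the crop centre so the window stays fully inside the frame.
    cx = int(np.clip(round(gaze_xy[0]), half, w - half))
    cy = int(np.clip(round(gaze_xy[1]), half, h - half))
    # Full-resolution patch around the gaze, for precise control.
    fovea = image[cy - half:cy + half, cx - half:cx + half]
    # Coarse whole-frame view by striding, for reaching; a real pipeline
    # would likely low-pass filter before subsampling.
    peripheral = image[::peripheral_stride, ::peripheral_stride]
    return fovea, peripheral

frame = np.zeros((256, 320, 3), dtype=np.uint8)
fovea, peripheral = dual_resolution_views(frame, gaze_xy=(310.0, 5.0))
```

With these assumed sizes, the two views together hold far fewer pixels than the full frame, which is the intuition behind the efficiency claim; the actual savings depend on the resolutions and networks used in the paper.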
