Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation

Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

TL;DR

This work tackles high-precision manipulation of deformable objects (e.g., needle threading) by viewing control through a gaze-guided dual-resolution lens that separates fast-reaching motions from slow, precise targeting. By processing a peripheral, low-resolution view for rapid approach and a foveated, high-resolution view around gaze for precise interaction, the approach learns slow- and fast-action policies via deep imitation learning, augmented with a gaze predictor and a recovery action module. Key contributions include a robust dual-vision architecture, a Gaussian-mixture-based action-separation threshold, explicit recovery mechanisms, and evidence that foveated vision plus stereo input substantially boosts performance and computational efficiency over full high-resolution processing. The method demonstrates strong task performance on needle threading and generalizes to bolt picking, with practical implications for dexterous manipulation of deformable objects in real-world robotics and potential extensions to active vision systems.

Abstract

A high-precision manipulation task, such as needle threading, is challenging. Physiological studies have proposed connecting low-resolution peripheral vision and fast movement to transport the hand into the vicinity of an object, and using high-resolution foveated vision to achieve the accurate homing of the hand to the object. The results of this study demonstrate that a deep imitation learning based method, inspired by the gaze-based dual resolution visuomotor control system in humans, can solve the needle threading task. First, we recorded the gaze movements of a human operator who was teleoperating a robot. Then, we used only a high-resolution image around the gaze to precisely control the thread position when it was close to the target. We used a low-resolution peripheral image to reach the vicinity of the target. The experimental results obtained in this study demonstrate that the proposed method enables precise manipulation tasks using a general-purpose robot manipulator and improves computational efficiency. Data from this and related works are available at: https://sites.google.com/view/multi-task-fine.
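The two visual inputs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `foveated_crop` and `peripheral_view`, the crop size, the downsampling factor, and the block-averaging scheme are all assumptions chosen for illustration, and the image is a plain nested list of grayscale values so that the example needs no external libraries.

```python
def foveated_crop(image, gaze_xy, size):
    """Return a size x size full-resolution patch centered on the gaze.

    `image` is a list of rows (hypothetical grayscale frame), `gaze_xy` is
    the (x, y) gaze position in pixels. The window is shifted inward when
    the gaze is near a border so the patch always has the requested size.
    Assumes size <= image width and height.
    """
    height, width = len(image), len(image[0])
    gx, gy = gaze_xy
    half = size // 2
    left = min(max(gx - half, 0), width - size)
    top = min(max(gy - half, 0), height - size)
    return [row[left:left + size] for row in image[top:top + size]]


def peripheral_view(image, factor):
    """Return a low-resolution copy of the whole frame.

    Each output pixel is the mean of a non-overlapping factor x factor
    block (an assumed downsampling scheme, chosen for simplicity).
    """
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        out_row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Usage on a synthetic 8 x 8 frame whose pixel value encodes its position.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
fovea = foveated_crop(frame, gaze_xy=(7, 0), size=4)   # gaze in the top-right corner
periphery = peripheral_view(frame, factor=4)           # 2 x 2 overview of the frame
```

In this sketch the foveated patch keeps the original pixel density over a small window, while the peripheral view covers the full frame with far fewer pixels. That difference in pixel count is the source of the computational saving the abstract refers to.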

Paper Structure

This paper contains 22 sections, 1 equation, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: The proposed method can efficiently calculate a precise policy with both global features from peripheral vision (\ref{fig:small_left_gaze}) and detailed visual information for the needle and thread from foveated vision (\ref{fig:fovea_left}).
  • Figure 2: Difference between thread grasps. The proposed method can adjust its policy with respect to the posture of the grasped thread.
  • Figure 3: Histogram of action speed and fitted GMM of needle threading. The intersection point of the two Gaussian distributions is defined as the threshold between slow-action and fast-action.
  • Figure 4: Proposed architecture.
  • Figure 5: Example of task failure caused by the human operator. The operator failed to thread the needle (\ref{fig:50}), recovered from the failure (\ref{fig:59}), retried, and finally succeeded (\ref{fig:69}).
  • ...and 6 more figures
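
The threshold described in the Figure 3 caption, the point where the two fitted Gaussian components intersect, can be sketched as follows. This is a hedged illustration rather than the paper's code: it assumes a two-component GMM has already been fitted to the action speeds, and the function names and the mixture parameters in the usage example are hypothetical. Setting the two weighted densities equal and taking logarithms gives a quadratic in the speed, which the sketch solves directly.

```python
import math


def weighted_pdf(x, weight, mean, std):
    """Density of one mixture component, scaled by its mixture weight."""
    z = (x - mean) / std
    return weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_intersection(w1, m1, s1, w2, m2, s2):
    """Return the speed between the two means where the components cross.

    Solves w1 * N(x; m1, s1) = w2 * N(x; m2, s2) as a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2.0 * s2 ** 2) - m1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal variances: the equation is linear in x.
        if abs(b) < 1e-12:
            raise ValueError("components coincide; no unique intersection")
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            raise ValueError("components do not intersect")
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)]
    lo, hi = sorted((m1, m2))
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("no intersection between the component means")
    return between[0]  # first crossing found between the two means


def is_slow_action(speed, threshold):
    """Label an action as slow when its speed is below the threshold."""
    return speed < threshold


# Hypothetical mixture: a narrow slow component and a broad fast component.
threshold = gmm_intersection(0.6, 1.0, 0.5, 0.4, 5.0, 1.5)
```

Actions with speed below `threshold` would be treated as slow actions and the rest as fast actions. The units and the actual fitted parameters depend on the recorded demonstrations and are not given in this summary.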