Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions

Yutaro Ishida; Yuki Noguchi; Takayuki Kanai; Kazuhiro Shintani; Hiroshi Bito

TL;DR

We propose a robust imitation learning method for mobile manipulators that observes multiple viewpoints and focuses on the task-related viewpoints and their spatial regions, yielding optimal viewpoints and a visual embedding that is robust to occlusion and domain shift.

Abstract

We study how to generalize the visuomotor policy of a mobile manipulator from the perspective of visual observations. A mobile manipulator is prone to occlusion by its own body when only a single viewpoint is employed, and to a significant domain shift when deployed in diverse situations. However, to the best of the authors' knowledge, no study has addressed occlusion and domain shift simultaneously and proposed a robust policy. In this paper, we propose a robust imitation learning method for mobile manipulators that focuses on task-related viewpoints and their spatial regions when observing multiple viewpoints. The multiple-viewpoint policy includes an attention mechanism, which is learned with an augmented dataset and yields optimal viewpoints and a visual embedding that is robust to occlusion and domain shift. Comparing our results across different tasks and environments with those of previous studies shows that the proposed method improves the success rate by up to 29.3 points. We also conduct ablation studies on the proposed method. Learning task-related viewpoints from the multiple-viewpoint dataset yields greater robustness to occlusion than using a uniquely defined viewpoint. Focusing on task-related regions contributes up to a 33.3-point improvement in the success rate under domain shift.
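As a rough illustration of attention over multiple viewpoints and their spatial regions, the following sketch weights per-location features from two viewpoints with scores that would be learned in a real policy. Everything here is an assumption made for illustration: the function names `joint_softmax` and `attend`, the toy feature values, and in particular the choice to normalize with a single softmax over all viewpoints and locations. The paper's actual encoder and normalization may differ.

```python
import math

def joint_softmax(logits):
    """Softmax taken over every (viewpoint, row, col) score at once (assumed design)."""
    flat = [x for view in logits for row in view for x in row]
    peak = max(flat)  # subtract the max for numerical stability
    total = sum(math.exp(x - peak) for x in flat)
    return [[[math.exp(x - peak) / total for x in row] for row in view]
            for view in logits]

def attend(features, logits):
    """Return the attention-weighted embedding and the weight mass per viewpoint.

    features[v][i][j] is a list of C channel values at one spatial location;
    logits[v][i][j] is the (hypothetically learned) attention score there.
    """
    weights = joint_softmax(logits)
    channels = len(features[0][0][0])
    embedding = [0.0] * channels
    view_mass = []
    for f_view, w_view in zip(features, weights):
        mass = 0.0
        for f_row, w_row in zip(f_view, w_view):
            for feat, w in zip(f_row, w_row):
                mass += w
                for c in range(channels):
                    embedding[c] += w * feat[c]
        view_mass.append(mass)
    return embedding, view_mass

# Toy input: two viewpoints, a 2x2 feature grid each, two channels.
features = [
    [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]],  # viewpoint 0
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],  # viewpoint 1
]
logits = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 4.0], [0.0, 0.0]],  # one region of viewpoint 1 scores high
]
embedding, view_mass = attend(features, logits)
```

In this toy input, the high score in viewpoint 1 pulls most of the weight mass, and therefore most of the embedding, toward that viewpoint and region. This mirrors the idea of an occluded viewpoint being down-weighted in favor of a task-related one.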

Paper Structure (16 sections, 12 figures, 9 tables)

Introduction
Related Work
Mobile Manipulator
Robot Learning
Preliminaries
Proposed Method
Attention Mechanism for Multiple Viewpoints and Their Spatial Regions for Imitation Learning Policy
Fast and Low Computational Resource Augmentation using Fractal Texture
Experiments
Settings: Tasks and Environments
Settings: Implementations
Overall Results
What is the effect of the multiple viewpoints?
How can MM focus on task-related viewpoint?
How can MM focus on task-related regions?
...and 1 more sections

Figures (12)

Figure 1: Example of occlusion of internal viewpoints on a mobile manipulator (MM). Left: the first-person viewpoint is occluded by the body of the MM in the pick task. Right: the in-hand viewpoint is occluded by the grasped object in the place task.
Figure 2: Example of domain shift in visual observations. Left: environment used for training the policy. Middle: distractor objects cause a minor change. Right: unknown furniture causes a major change.
Figure 3: Attention mechanism for multiple viewpoints and their spatial regions. By weighting the features with spatial attention, the image encoders extract information on task-related viewpoints and their spatial regions from multiple visual observations. Since the spatial attention is a learnable parameter, our method can learn task-related viewpoints from the dataset instead of having them uniquely defined by hand.
Figure 4: Processing steps of the fast, low-computational-resource augmentation using fractal textures. Task-related regions are detected and tracked, and the non-task-related regions are augmented with fractal textures. The augmentation facilitates learning an attention mechanism that focuses strongly on task-related regions, which change little, rather than on non-task-related regions, which are changed more by the fractal textures.
Figure 5: Overview of the pick-bottle-from-shelf task. Images are arranged in time-step order from left to right. Left: the MM started with $o_{h}$ and $o_{f}$ facing the bottle placed on the shelf. Middle: the MM moved the mobile base and arm to reach the bottle. Right: the MM picked up the bottle from the shelf.
...and 7 more figures
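
The augmentation idea in the Figure 4 caption can be sketched as follows, under several assumptions: the task-related region is given as a fixed bounding box (in the paper it comes from detection and tracking, which is not reproduced here), the image is a small grayscale grid, and a Sierpinski-style bit pattern stands in for the fractal textures, whose actual generator is not specified in this excerpt. The names `sierpinski_texture` and `augment_outside_box` and the blend factor `alpha` are hypothetical.

```python
def sierpinski_texture(height, width):
    """Stand-in fractal pattern: bright where x and y share no set bits."""
    return [[255 if (x & y) == 0 else 0 for x in range(width)]
            for y in range(height)]

def augment_outside_box(image, box, texture, alpha=0.5):
    """Blend the texture into pixels outside the box; keep the box unchanged.

    box = (top, left, bottom, right), with bottom and right exclusive.
    alpha is an assumed blend factor, not a value from the paper.
    """
    top, left, bottom, right = box
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, pixel in enumerate(row):
            inside = top <= y < bottom and left <= x < right
            if inside:
                new_row.append(pixel)  # task-related region stays as observed
            else:
                blended = (1 - alpha) * pixel + alpha * texture[y][x]
                new_row.append(round(blended))
        out.append(new_row)
    return out

# Toy 8x8 grayscale image with a uniform value and an assumed task-related box.
image = [[100] * 8 for _ in range(8)]
box = (2, 2, 5, 5)
augmented = augment_outside_box(image, box, sierpinski_texture(8, 8), alpha=0.5)
```

Because only the pixels outside the box vary between augmented copies, a model trained on such data has an incentive to rely on the stable region inside it, which is the effect the caption attributes to the augmentation.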
