Active-Perceptive Motion Generation for Mobile Manipulation

Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki

TL;DR

This work introduces an active perception pipeline that lets mobile manipulators generate motions that are informative toward manipulation tasks, such as grasping in unknown, cluttered scenes. It also demonstrates the transfer of the mobile grasping strategy to the real world, indicating a promising direction for active-perceptive MoMa.

Abstract

Mobile Manipulation (MoMa) systems combine the benefits of mobility and dexterity, thanks to the enlarged space in which they can move and interact with their environment. However, even when equipped with onboard sensors, e.g., an embodied camera, extracting task-relevant visual information in unstructured and cluttered environments, such as households, remains challenging. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks, such as grasping in unknown, cluttered scenes. Our proposed approach, ActPerMoMa, generates robot paths in a receding horizon fashion by sampling paths and computing path-wise utilities. These utilities trade off maximizing the visual Information Gain (IG) for scene reconstruction against the task-oriented objective, e.g., grasp success, which is pursued by maximizing grasp reachability. We show the efficacy of our method in simulated experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes with obstacles. We empirically analyze the contribution of various utilities and parameters, and compare against representative baselines both with and without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, indicating a promising direction for active-perceptive MoMa.
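
The receding-horizon selection described in the abstract can be illustrated with a minimal sketch. It assumes that each sampled path already carries a precomputed information-gain score and a grasp-reachability score, and that the two are combined as a weighted sum. The `CandidatePath` class, the `path_utility` and `select_next_step` functions, and the weights `w_ig` and `w_reach` are hypothetical names chosen for illustration; the paper's actual utility may be defined differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, yaw) base pose in SE(2)


@dataclass
class CandidatePath:
    """Hypothetical container for one sampled path and its two scores."""
    base_poses: List[Pose2D]   # base poses along the path, first element is the next step
    info_gain: float           # assumed: IG accumulated over the camera views on the path
    reachability: float        # assumed: reachability of detected grasps from the final base pose


def path_utility(path: CandidatePath, w_ig: float = 1.0, w_reach: float = 1.0) -> float:
    # Assumption: a plain weighted sum trades off perception against the grasping objective.
    return w_ig * path.info_gain + w_reach * path.reachability


def select_next_step(paths: List[CandidatePath], w_ig: float = 1.0,
                     w_reach: float = 1.0) -> Tuple[CandidatePath, Pose2D]:
    """Pick the highest-utility path and return only its first pose for execution."""
    best = max(paths, key=lambda p: path_utility(p, w_ig, w_reach))
    return best, best.base_poses[0]


# Two illustrative candidates: one favors reachability, the other information gain.
candidates = [
    CandidatePath([(0.1, 0.0, 0.0), (0.2, 0.0, 0.0)], info_gain=0.2, reachability=0.9),
    CandidatePath([(0.0, 0.1, 0.5)], info_gain=0.8, reachability=0.1),
]
best_path, next_pose = select_next_step(candidates)
```

Only the first pose of the selected path is executed; in a receding-horizon loop the candidates would be resampled and rescored at the next timestep with the updated scene reconstruction.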

Paper Structure

This paper contains 14 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: ActPerMoMa pipeline. Using rough initial knowledge about the target area or target object position $\widetilde{\mathbf{p}}_{target}$, we continuously plan and execute informative motions for the mobile grasping task. At every timestep $t$, the RGBD information from the head-mounted embodied camera is integrated into a scene TSDF for both grasp detection and information gain computation. Using the currently known free space for movement of the robot base, we sample candidate robot paths $\mathcal{T}$, including both base and camera poses, towards the target. For each candidate path $\tau_j \in \mathcal{T}$, we compute the information gained from camera views $\mathbf{p}_{j,cam}^{i}$ in the path, and the reachability of stable detected grasps from the final base poses $\mathbf{p}_{j,base}^{goal}$ in the path. We trade off these objectives with a receding horizon cost $J_{\tau}$ and take a step of the optimal path $\tau^*$ for execution at every timestep.
  • Figure 2: Example scene with sampled candidate paths (blue) for the robot pose towards the target object (red box). The paths consist of SE(2) poses for the base and SE(3) poses for the head-mounted camera (visualized from the robot to the base goals). The current optimal path is highlighted in green.
  • Figure 3: Left: Example rear-side Information Gain (IG) for a candidate view. Pink voxels denote observed TSDF voxels. Blue voxels are on the rear side of the observed TSDF, which could be revealed by a candidate view. Views are colored red to green, denoting lower to higher IG. Right: Reachability map of the robot's left arm, reduced from 6 dimensions (SE(3)) to 3 for visualization. Red and green points denote lower and higher reachability. Current detected 6D grasps are visualized in green on a target object.
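
The rear-side information gain in Figure 3 can be approximated with a simple ray-marching sketch: for each ray cast from a candidate view, voxels whose state is still unknown are counted once the ray has passed an observed surface voxel. This is a simplified illustration rather than the authors' implementation. The set-based voxel representation, the fixed step size, and the function names `rear_side_voxels` and `view_information_gain` are assumptions made for this example.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


def rear_side_voxels(surface: Set[Voxel], unknown: Set[Voxel], origin: Vec3,
                     direction: Vec3, max_steps: int = 50, step: float = 0.5) -> Set[Voxel]:
    """Return unknown voxels that one ray crosses after passing a surface voxel."""
    hidden: Set[Voxel] = set()
    passed_surface = False
    x, y, z = origin
    dx, dy, dz = direction
    for _ in range(max_steps):
        # Advance along the ray by a fixed step (assumed unit-size voxels).
        x, y, z = x + step * dx, y + step * dy, z + step * dz
        voxel = (math.floor(x), math.floor(y), math.floor(z))
        if voxel in surface:
            passed_surface = True
        elif passed_surface and voxel in unknown:
            hidden.add(voxel)
    return hidden


def view_information_gain(surface: Set[Voxel], unknown: Set[Voxel], origin: Vec3,
                          directions: Iterable[Vec3]) -> int:
    """Count distinct rear-side voxels over all rays of a candidate view."""
    hidden: Set[Voxel] = set()
    for direction in directions:
        hidden |= rear_side_voxels(surface, unknown, origin, direction)
    return len(hidden)


# Toy scene: one observed surface voxel with two unknown voxels behind it along +x.
surface_voxels = {(2, 0, 0)}
unknown_voxels = {(3, 0, 0), (4, 0, 0), (1, 1, 0)}
gain = view_information_gain(surface_voxels, unknown_voxels, (0.5, 0.5, 0.5),
                             [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

Collecting the voxels in a set avoids counting the same hidden voxel twice when several rays of one view cross it. A score of this kind could feed the per-path information gain that is traded off against grasp reachability.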