Table of Contents
Fetching ...

Predictive Reachability for Embodiment Selection in Mobile Manipulation Behaviors

Xiaoxu Feng, Takato Horii, Takayuki Nagai

TL;DR

Predictive reachability is introduced, a new approach that decides reachability based on predicted trajectories that outperforms previous model-based approaches in both sample efficiency and performance, while also enabling more reasonable embodiment selection based on predictive reachability.

Abstract

Mobile manipulators require coordinated control between navigation and manipulation to accomplish tasks. Typically, coordinated mobile manipulation behaviors have base navigation to approach the goal followed by arm manipulation to reach the desired pose. Selecting the embodiment between the base and arm can be determined based on reachability. Previous methods evaluate reachability by computing inverse kinematics and activate arm motions once solutions are identified. In this study, we introduce a new approach called predictive reachability that decides reachability based on predicted arm motions. Our model utilizes a hierarchical policy framework built upon a world model. The world model allows the prediction of future trajectories and the evaluation of reachability. The hierarchical policy selects the embodiment based on the predicted reachability and plans accordingly. Unlike methods that require prior knowledge about robots and environments for inverse kinematics, our method only relies on image-based observations. We evaluate our approach through basic reaching tasks across various environments. The results demonstrate that our method outperforms previous model-based approaches in both sample efficiency and performance, while enabling more reasonable embodiment selection based on predictive reachability.

Predictive Reachability for Embodiment Selection in Mobile Manipulation Behaviors

TL;DR

Predictive reachability is introduced, a new approach that decides reachability based on predicted trajectories that outperforms previous model-based approaches in both sample efficiency and performance, while also enabling more reasonable embodiment selection based on predictive reachability.

Abstract

Mobile manipulators require coordinated control between navigation and manipulation to accomplish tasks. Typically, coordinated mobile manipulation behaviors have base navigation to approach the goal followed by arm manipulation to reach the desired pose. Selecting the embodiment between the base and arm can be determined based on reachability. Previous methods evaluate reachability by computing inverse kinematics and activate arm motions once solutions are identified. In this study, we introduce a new approach called predictive reachability that decides reachability based on predicted arm motions. Our model utilizes a hierarchical policy framework built upon a world model. The world model allows the prediction of future trajectories and the evaluation of reachability. The hierarchical policy selects the embodiment based on the predicted reachability and plans accordingly. Unlike methods that require prior knowledge about robots and environments for inverse kinematics, our method only relies on image-based observations. We evaluate our approach through basic reaching tasks across various environments. The results demonstrate that our method outperforms previous model-based approaches in both sample efficiency and performance, while enabling more reasonable embodiment selection based on predictive reachability.

Paper Structure

This paper contains 24 sections, 8 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Embodiment selection based on predictive reachability. It prefers the arm when goal-reaching is predicted within imagined rollouts. Otherwise, the base is selected to move.
  • Figure 2: Demonstrations of coordinated mobile manipulation behaviors. Each has two stages where arm motions are activated after the base has approached the goal.
  • Figure 3: Overview of MMDirector. MMDirector comprises a world model and a three-level hierarchical policy. The world model compresses observations into latent states and predicts ahead in that latent space. It is trained through observation reconstruction and learns to predict task rewards and collisions, which are used for reachability evaluations. The selector takes the current state and two stage goals from demonstrations as input to determine embodiment selection. The manager, conditioned on the final goal and embodiment selection, produces discrete codes that are decoded into subgoals within the latent space. The worker generates actions conditioned on subgoals, which are masked by the embodiment selection. All three levels of the policy are trained concurrently using actor-critic algorithms to maximize their respective reward functions.
  • Figure 4: Evaluation of predictive reachability. From one latent state, a rollout is predicted. Task rewards and collisions are then predicted given the rollout. Predictive reachability is identified if the sequence of task rewards and collisions satisfies the conditions.
  • Figure 5: Evaluation environments. We prepare four environments to evaluate models. In all environments, the robot is to reach its end-effector to the goal marked by a black sphere. (a) Empty: No obstacle and no collision are considered. (b) Obstacle for base: A fixed block is placed between the robot and the goal, which impedes base motions but has no effect on arm motions near the goal. (c) Obstacle for arm: A block is placed below the goal. A stick is placed diagonally in front of the goal, hindering arm motions from one side. (d) Realistic room: A room space. The goal is located at the corner of one table.
  • ...and 5 more figures