BaSeNet: A Learning-based Mobile Manipulator Base Pose Sequence Planning for Pickup Tasks

Lakshadeep Naik, Sinan Kalkan, Sune L. Sørensen, Mikkel B. Kjærgaard, Norbert Krüger

TL;DR

BaSeNet addresses the challenge of planning a sequence of mobile base poses for picking up multiple objects in minimal total time. It formulates the problem as reinforcement learning and solves it with Layered Learning and graph-based representations. The task is decomposed into a grasp-sequence policy and a base-pose policy: the base-pose policy is trained with Soft Actor-Critic, and the grasp-sequence policy is trained with REINFORCE on top of a Graph Attention Network encoder that handles variable object counts via Graph Node Regression. The approach yields solutions close in quality to those of exact and approximate baselines with significantly reduced planning time, and simulation experiments indicate potential for fast re-planning in dynamic scenes. The work highlights practical gains for mobile manipulators but acknowledges current limitations around pose uncertainty and self-localization, pointing to future work on uncertainty-aware planning.
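To make the layered decomposition concrete, the loop below sketches how the two policies could interact at inference time: the grasp-sequence policy picks the next object, the base-pose policy proposes a base pose for it, and the navigation plus grasping time accumulates. This is a minimal sketch under stated assumptions, not the authors' implementation: both learned policies are replaced by hypothetical hand-written stand-ins (`seq_policy_stub`, `base_pose_policy_stub`), and `REACH`, `SPEED` and `GRASP_TIME` are illustrative placeholder values.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]

REACH = 0.6        # hypothetical arm reach in metres
SPEED = 0.5        # hypothetical base speed in m/s
GRASP_TIME = 5.0   # hypothetical time per grasp in seconds


def seq_policy_stub(base: Point, objects: Dict[str, Point],
                    remaining: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the grasp-sequence policy:
    a softmax over negative distances to the remaining objects."""
    scores = {o: -math.dist(base, objects[o]) for o in remaining}
    top = max(scores.values())
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    z = sum(exps.values())
    return {o: e / z for o, e in exps.items()}


def base_pose_policy_stub(base: Point, obj: Point) -> Point:
    """Hypothetical stand-in for the base-pose policy: drive along the
    straight line towards the object and stop once it is within REACH."""
    dx, dy = obj[0] - base[0], obj[1] - base[1]
    d = math.hypot(dx, dy)
    if d <= REACH:
        return base  # already graspable from the current pose
    scale = (d - REACH) / d
    return (base[0] + dx * scale, base[1] + dy * scale)


def layered_rollout(start: Point, objects: Dict[str, Point],
                    pi_seq: Callable, pi_bp: Callable
                    ) -> Tuple[List[Tuple[str, Point]], float]:
    """Alternate the two policies until every object has been grasped."""
    base, remaining = start, list(objects)
    plan: List[Tuple[str, Point]] = []
    total_time = 0.0
    while remaining:
        probs = pi_seq(base, objects, remaining)
        target = max(probs, key=probs.get)   # greedy choice of next object
        pose = pi_bp(base, objects[target])  # base pose for that object
        total_time += math.dist(base, pose) / SPEED + GRASP_TIME
        plan.append((target, pose))
        base = pose
        remaining.remove(target)
    return plan, total_time


plan, t = layered_rollout((0.0, 0.0), {"a": (1.0, 0.0), "b": (3.0, 0.0)},
                          seq_policy_stub, base_pose_policy_stub)
print([name for name, _ in plan], round(t, 2))  # prints: ['a', 'b'] 14.8
```

Keeping the two policies behind plain callables mirrors the layered idea: a trained base-pose policy can be frozen and reused while only the sequencing policy is swapped or retrained.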

Abstract

In many applications, a mobile manipulator robot is required to grasp a set of objects distributed in space. This may not be feasible from a single base pose, so the robot must plan a sequence of base poses for grasping all objects that minimizes the total navigation and grasping time. This is a Combinatorial Optimization problem that can be solved using exact methods, which provide optimal solutions but are computationally expensive, or approximate methods, which are computationally efficient but yield sub-optimal solutions. Recent studies have shown that learning-based methods can solve Combinatorial Optimization problems, providing near-optimal solutions efficiently. In this work, we present BaSeNet, a learning-based approach to plan the sequence of base poses for the robot to grasp all the objects in the scene. We propose a Reinforcement Learning-based solution that uses Layered Learning to learn both the base poses for grasping individual objects and the sequence in which the objects should be grasped, minimizing the total navigation and grasping costs. As the number of states and actions varies across problem instances, we represent states and actions as a graph and use Graph Neural Networks for learning. We show that the proposed method can produce solutions comparable to those of exact and approximate methods with significantly less computation time.
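The cost of the exact-method side of this trade-off can be illustrated with a brute-force search. The sketch below assumes a hypothetical discretization in which each object has a short list of candidate base poses from which it is graspable; it enumerates every grasp order and every choice of candidate pose and returns the minimum total time. Consecutive objects assigned the same pose incur no navigation cost, which is how one base pose can serve several objects. This toy is not the baseline used in the paper; it only shows why the search space (n! orderings times the product of the candidate counts) makes exact methods expensive. `SPEED` and `GRASP_TIME` are placeholder values.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

SPEED = 0.5        # hypothetical base speed in m/s
GRASP_TIME = 5.0   # hypothetical time per grasp in seconds


def plan_time(start: Point, poses: Tuple[Point, ...]) -> float:
    """Total navigation + grasping time for visiting the poses in order."""
    total, base = 0.0, start
    for pose in poses:
        total += math.dist(base, pose) / SPEED + GRASP_TIME
        base = pose
    return total


def exact_plan(start: Point, candidates: Dict[str, List[Point]]):
    """Exhaustively search all grasp orders and candidate base poses."""
    best_time = math.inf
    best_plan: Optional[List[Tuple[str, Point]]] = None
    for order in itertools.permutations(candidates):
        for poses in itertools.product(*(candidates[o] for o in order)):
            t = plan_time(start, poses)
            if t < best_time:
                best_time, best_plan = t, list(zip(order, poses))
    return best_time, best_plan


# Hypothetical candidate poses: (2, 1) can serve both objects.
candidates = {"a": [(0.0, 1.0), (2.0, 1.0)], "b": [(2.0, 1.0), (4.0, 0.0)]}
best_time, best_plan = exact_plan((0.0, 0.0), candidates)
print(round(best_time, 3), best_plan)
# prints: 14.472 [('a', (2.0, 1.0)), ('b', (2.0, 1.0))]
```

Even this tiny instance evaluates 2! x 2 x 2 = 8 plans; the count grows factorially with the number of objects, which is the computational cost that learned policies aim to avoid.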

Paper Structure

This paper contains 17 sections, 9 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: BaSeNet. Top row: The robot represents the scene as a graph and determines the next object to grasp using the grasp sequence policy. Bottom row: The robot predicts the base pose for grasping the selected object using the base pose policy and performs the action.
  • Figure 2: Proposed Layered Approach (BaSeNet): (a) Layer 1: Learning object grasping sequences (grasp sequence policy $\pi_{\text{seq}}$), using the already learned $\pi_{\text{bp}}$ to determine the base pose for grasping the selected object. (b) Layer 2: Learning base poses for grasping individual objects (base pose policy $\pi_{\text{bp}}$). In the example, object $o_1$ is selected for grasping in Layer 1 because it receives the highest probability; the base pose policy is then used in Layer 2 to determine the base pose for grasping $o_1$.
  • Figure 3: Attention-Based Graph Encoder.
  • Figure 4: Decoder: The decoder takes the object context embeddings provided by the encoder as input and outputs the grasp probability for each object. Already grasped objects are masked during subsequent time steps. The example demonstrates the construction of a grasp sequence for a scene with four objects.
  • Figure 5: Sequence of base poses planned by the baselines and our method for a random scene. Red highlights unacceptable success rates (objects grasped), and hence the blue italic values are irrelevant. Predicted base poses are the centers of the mobile base. The manipulator is positioned at the right rear corner of the mobile base, as can be seen in Fig. 1.
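Figures 3 and 4 describe an attention-based graph encoder and a decoder that masks already grasped objects. The sketch below is a deliberately simplified, pure-Python illustration of those two ideas under stated assumptions. The encoder is a single-head scaled dot-product self-attention layer over a fully connected object graph with identity projections and a residual connection; the paper's encoder is a Graph Attention Network with learned weights. The decoder's context vector (the mean embedding at the first step, then the embedding of the last grasped object) is a hypothetical choice. Masking sets the scores of grasped objects to negative infinity before the softmax, so they receive zero grasp probability at every later step.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores map to probability 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode(nodes: List[Vec]) -> List[Vec]:
    """One single-head self-attention layer over a fully connected graph
    of objects (identity projections, residual connection)."""
    d = len(nodes[0])
    out = []
    for q in nodes:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in nodes])
        agg = [sum(w * v[j] for w, v in zip(weights, nodes)) for j in range(d)]
        out.append([q[j] + agg[j] for j in range(d)])
    return out


def decode(embeddings: List[Vec]) -> List[int]:
    """Greedily build a grasp sequence, masking already grasped objects."""
    n, d = len(embeddings), len(embeddings[0])
    grasped = [False] * n
    context = [sum(e[j] for e in embeddings) / n for j in range(d)]
    order = []
    for _ in range(n):
        scores = [-math.inf if grasped[i] else dot(context, e) / math.sqrt(d)
                  for i, e in enumerate(embeddings)]
        probs = softmax(scores)                  # grasp probability per object
        nxt = max(range(n), key=probs.__getitem__)
        order.append(nxt)
        grasped[nxt] = True
        context = embeddings[nxt]                # hypothetical context update
    return order


# Four objects, as in the Figure 4 example; features are made-up values.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
sequence = decode(encode(objects))
print(sequence)
```

Because attention weights are computed over however many nodes are present, the same functions accept scenes with any number of objects, which is the property the graph representation is meant to provide.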