BaSeNet: A Learning-based Mobile Manipulator Base Pose Sequence Planning for Pickup Tasks
Lakshadeep Naik, Sinan Kalkan, Sune L. Sørensen, Mikkel B. Kjærgaard, Norbert Krüger
TL;DR
BaSeNet addresses the problem of planning a sequence of mobile base poses from which a mobile manipulator can pick up multiple objects in minimal total time. It formulates the problem as a reinforcement learning problem and solves it using Layered Learning and graph-based representations. The task is decomposed into a grasp-sequence policy and a base-pose policy: the base-pose policy is trained with Soft Actor-Critic and the grasp-sequence policy with REINFORCE, using a Graph Attention Network encoder that handles variable object counts via Graph Node Regression. The approach yields solutions close to those of exact and approximate baselines with significantly less planning time, and simulation experiments suggest it can support fast re-planning in dynamic scenes. The work highlights practical gains for mobile manipulators but acknowledges current limitations around pose uncertainty and self-localization, pointing to uncertainty-aware planning as future work.
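The two-level decomposition can be illustrated as a rollout loop in which the grasp-sequence policy picks which object to grasp next and the base-pose policy picks where to park for it. This is a minimal sketch under heavily simplified assumptions: a 2-D world of point objects, one base pose per grasp, hypothetical constants, and hand-written stand-ins (the hypothetical functions `grasp_sequence_policy` and `base_pose_policy`) in place of the learned REINFORCE and Soft Actor-Critic policies and the Graph Attention Network encoder. It only shows how the layers interact and what reward a learner would maximize; it is not the authors' implementation.

```python
import math
import random

# Hypothetical constants for illustration only (not from the paper).
GRASP_TIME = 2.0  # seconds spent grasping one object
BASE_SPEED = 0.5  # base speed in m/s
REACH = 0.6       # standoff distance between base pose and object, in m


def base_pose_policy(robot_xy, obj_xy):
    """Stand-in for the learned base-pose policy: park REACH metres from the
    object on the side facing the robot, or stay put if already in reach."""
    dx, dy = robot_xy[0] - obj_xy[0], robot_xy[1] - obj_xy[1]
    d = math.hypot(dx, dy)
    if d <= REACH:
        return robot_xy
    return (obj_xy[0] + REACH * dx / d, obj_xy[1] + REACH * dy / d)


def grasp_sequence_policy(robot_xy, remaining, temperature=0.5, rng=random):
    """Stand-in for the learned grasp-sequence policy: sample the next object
    from a softmax over the *remaining* objects, so the action set can have
    any size. Logits here are negative distances, not learned node scores."""
    keys = list(remaining)
    logits = [-math.dist(robot_xy, remaining[k]) / temperature for k in keys]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    return rng.choices(keys, weights=weights, k=1)[0]


def rollout(objects, start_xy=(0.0, 0.0), rng=random):
    """One episode: the sequence policy picks an object, the base-pose policy
    picks where to park for it; the reward is the negative total time."""
    robot = start_xy
    remaining = dict(enumerate(objects))
    total_time, order = 0.0, []
    while remaining:
        k = grasp_sequence_policy(robot, remaining, rng=rng)
        pose = base_pose_policy(robot, remaining.pop(k))
        total_time += math.dist(robot, pose) / BASE_SPEED + GRASP_TIME
        robot = pose
        order.append(k)
    return order, -total_time


order, reward = rollout([(1.0, 0.0), (3.0, 0.0), (5.0, 0.0)], rng=random.Random(0))
print(order, reward)
```

In a design of this kind, a learned sequencing policy would replace the distance-based logits with per-object scores from a graph encoder; taking the softmax over whichever objects remain is what lets a single network serve scenes with different object counts.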
Abstract
In many applications, a mobile manipulator robot is required to grasp a set of objects distributed in space. This may not be feasible from a single base pose, in which case the robot must plan a sequence of base poses for grasping all objects that minimizes the total navigation and grasping time. This is a Combinatorial Optimization problem that can be solved using exact methods, which provide optimal solutions but are computationally expensive, or approximate methods, which are computationally efficient but yield sub-optimal solutions. Recent studies have shown that learning-based methods can solve Combinatorial Optimization problems, providing near-optimal and computationally efficient solutions. In this work, we present BaSeNet, a learning-based approach to plan the sequence of base poses for the robot to grasp all the objects in the scene. We propose a Reinforcement Learning based solution that uses Layered Learning to learn both the base poses for grasping individual objects and the sequence in which the objects should be grasped, so as to minimize the total navigation and grasping costs. As the number of states and actions varies with the number of objects in the scene, we represent states and actions as a graph and use Graph Neural Networks for learning. We show that the proposed method produces solutions comparable to those of exact and approximate methods in significantly less computation time.
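To make the trade-off between exact and approximate methods concrete, here is a minimal sketch under strong simplifying assumptions: each object has one precomputed base pose, travel time is straight-line distance divided by a constant speed, and grasping takes a fixed time. Exhaustive enumeration stands in for an exact method and a greedy nearest-neighbour heuristic for an approximate one; neither is a solver used in the paper, and the helper names (`sequence_cost`, `exact_plan`, `greedy_plan`) and constants are hypothetical.

```python
import itertools
import math

# Hypothetical constants for illustration only (not from the paper).
GRASP_TIME = 2.0  # seconds spent grasping one object
BASE_SPEED = 0.5  # base speed in m/s


def sequence_cost(start, poses, order):
    """Total time: drive to each object's base pose in `order`, then grasp it."""
    t, cur = 0.0, start
    for k in order:
        t += math.dist(cur, poses[k]) / BASE_SPEED + GRASP_TIME
        cur = poses[k]
    return t


def exact_plan(start, poses):
    """Exact: enumerate all n! orders (optimal, but intractable for large n)."""
    return min(itertools.permutations(range(len(poses))),
               key=lambda order: sequence_cost(start, poses, order))


def greedy_plan(start, poses):
    """Approximate: always drive to the nearest unvisited base pose."""
    remaining, cur, order = set(range(len(poses))), start, []
    while remaining:
        k = min(remaining, key=lambda i: math.dist(cur, poses[i]))
        remaining.remove(k)
        order.append(k)
        cur = poses[k]
    return tuple(order)


start = (0.0, 0.0)
poses = [(1.0, 0.0), (-1.5, 0.0), (5.0, 0.0)]
best, fast = exact_plan(start, poses), greedy_plan(start, poses)
print(best, sequence_cost(start, poses, best))  # (1, 0, 2) 22.0
print(fast, sequence_cost(start, poses, fast))  # (0, 1, 2) 26.0
```

Here enumeration finds the order (1, 0, 2) at 22 s, while the greedy heuristic commits to the nearest object first and ends at 26 s. The factorial growth of the enumeration is what makes exact methods expensive as objects are added.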
