A Reactive Framework for Whole-Body Motion Planning of Mobile Manipulators Combining Reinforcement Learning and SDF-Constrained Quadratic Programmi

Chenyu Zhang, Shiying Sun, Kuan Liu, Chuanbao Zhou, Xiaoguang Zhao, Min Tan, Yanlong Huang

TL;DR

The paper tackles the challenge of efficient, safe whole-body motion planning for mobile manipulators with redundant DOFs in cluttered environments. It presents a hybrid framework that combines Bayes-DSAC, a distributional off-policy RL method with Bayesian controller fusion, for task-space velocity planning, and a robot-centric SDF-constrained QP for joint-space control, translating high-level commands into collision-free joint velocities. The framework defines a soft return distribution $Z_{\pi}(s_t,a_t)$ and a Bayesian-fused hybrid distribution $Z^{hyb}$ to improve value estimation and convergence, while enforcing obstacle avoidance through $A_{avoi}$-based joint-space constraints and SDF queries. Experimental results show faster learning, higher planning efficiency, and improved safety across multiple cluttered scenarios, outperforming several strong baselines. The work advances reactive whole-body planning by integrating perception, learning, and optimization in a coherent, real-time capable pipeline with practical impact for autonomous service and industrial robotics.

Abstract

As an important branch of embodied artificial intelligence, mobile manipulators are increasingly applied in intelligent services, but their redundant degrees of freedom also limit efficient motion planning in cluttered environments. To address this issue, this paper proposes a hybrid learning and optimization framework for reactive whole-body motion planning of mobile manipulators. We develop the Bayesian distributional soft actor-critic (Bayes-DSAC) algorithm to improve the quality of value estimation and the convergence of learning. Additionally, we introduce a quadratic programming method constrained by the signed distance field to enhance the safety of obstacle avoidance. We conduct experiments and compare against standard benchmarks. The experimental results verify that our proposed framework significantly improves the efficiency of reactive whole-body motion planning, reduces planning time, and improves the success rate of motion planning. Additionally, the proposed reinforcement learning method ensures a rapid learning process in the whole-body planning task. The novel framework allows mobile manipulators to adapt to complex environments more safely and efficiently.
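
To make the idea of a signed-distance-constrained quadratic program concrete, the following is a minimal sketch and not the paper's formulation. It assumes a single obstacle constraint of the velocity-damper form, built from an SDF value `d` and its gradient `grad_d` at one control point with Jacobian `jac`, and a cost that only tracks a desired joint velocity. Under those assumptions the QP reduces to a projection onto a half-space, which has a closed form. The names `avoidance_row`, `project_velocity`, `d_safe`, `d_influence`, and `gain` are hypothetical.

```python
from typing import List, Sequence, Tuple


def avoidance_row(
    grad_d: Sequence[float],
    jac: Sequence[Sequence[float]],
    d: float,
    d_safe: float,
    d_influence: float,
    gain: float,
) -> Tuple[List[float], float]:
    """Build one constraint row a . q_dot <= b (assumed velocity-damper form).

    The distance rate is d_dot = grad_d^T J q_dot. Requiring
    d_dot >= -gain * (d - d_safe) / (d_influence - d_safe)
    gives a = -(grad_d^T J) and b = gain * (d - d_safe) / (d_influence - d_safe).
    """
    n_joints = len(jac[0])
    a = [
        -sum(grad_d[k] * jac[k][j] for k in range(len(grad_d)))
        for j in range(n_joints)
    ]
    b = gain * (d - d_safe) / (d_influence - d_safe)
    return a, b


def project_velocity(
    q_des: Sequence[float], a: Sequence[float], b: float
) -> List[float]:
    """Solve min ||q_dot - q_des||^2 subject to a . q_dot <= b.

    With a single linear inequality the minimizer is the Euclidean
    projection of q_des onto the half-space, so no QP solver is needed.
    """
    violation = sum(ai * qi for ai, qi in zip(a, q_des)) - b
    norm_sq = sum(ai * ai for ai in a)
    if violation <= 0.0 or norm_sq == 0.0:
        return list(q_des)  # desired velocity is already feasible
    scale = violation / norm_sq
    return [qi - scale * ai for ai, qi in zip(a, q_des)]


# Toy case: a 2-D control point driven directly by two joints (identity
# Jacobian), with the distance to the obstacle growing along +x.
a_row, b_row = avoidance_row(
    grad_d=[1.0, 0.0],
    jac=[[1.0, 0.0], [0.0, 1.0]],
    d=0.15,
    d_safe=0.1,
    d_influence=0.3,
    gain=1.0,
)
safe_velocity = project_velocity([-1.0, 0.5], a_row, b_row)
print(safe_velocity)
```

In this toy case the desired velocity moves the control point toward the obstacle, so the component along the SDF gradient is clipped to the allowed approach rate while the other joint velocity is left unchanged. A full whole-body controller would stack many such rows and add task-space and joint-limit terms, which requires a general QP solver.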

Paper Structure

This paper contains 12 sections, 15 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of the hybrid learning and optimization framework for reactive whole-body motion planning in mobile manipulators.
  • Figure 2: The result of estimating the ensemble of critics by BCF. The composite Q distribution avoids underestimation and mitigates overestimation of the value.
  • Figure 3: Training curves on benchmarks. The solid line represents the average of five runs, and the shaded area indicates the 95% confidence interval. Bayes-DSAC achieves the highest average return and converges the fastest.
  • Figure 4: Comparison of training with different replay ratios. The numbers following the type names represent the adopted replay ratios. Unlike DSAC-T, which experiences training collapse at high replay ratios, Bayes-DSAC is able to train with relatively high replay ratios.
  • Figure 5: Four cluttered scenarios (a-d) with the results planned by our hybrid whole-body motion planning framework. The red sphere shows the target EE position. The blue trajectory is the mobile base path, and the green trajectory is the end-effector path.
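
As an illustration of the kind of fusion the Figure 2 caption refers to, the sketch below combines two Gaussian return estimates by a precision-weighted product of Gaussians. This is an assumption made for illustration: the page does not state the paper's exact fusion rule for the critic ensemble, and the function name `fuse_gaussians` is hypothetical.

```python
from typing import Tuple


def fuse_gaussians(
    mu1: float, var1: float, mu2: float, var2: float
) -> Tuple[float, float]:
    """Fuse two Gaussian estimates N(mu1, var1) and N(mu2, var2).

    Assumed rule: normalized product of the two densities, which is again
    Gaussian with precision equal to the sum of the input precisions and
    a mean equal to the precision-weighted average of the input means.
    """
    if var1 <= 0.0 or var2 <= 0.0:
        raise ValueError("variances must be positive")
    precision = 1.0 / var1 + 1.0 / var2
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var


# Two critics disagree: one is optimistic but uncertain, the other is
# pessimistic but confident. The fused mean lies between the two and is
# pulled toward the more confident critic.
fused_mu, fused_var = fuse_gaussians(10.0, 4.0, 6.0, 1.0)
print(fused_mu, fused_var)
```

Under this rule the fused variance is never larger than the smaller input variance, and the fused mean always lies between the two input means, so it is neither the minimum nor the maximum of the critic estimates.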