SM$^2$ITH: Safe Mobile Manipulation with Interactive Human Prediction via Task-Hierarchical Bilevel Model Predictive Control

Francesco D'Orazio, Sepehr Samavi, Xintong Du, Siqi Zhou, Giuseppe Oriolo, Angela P. Schoellig

TL;DR

This work tackles safe, prioritized mobile manipulation in human-centered environments by integrating Hierarchical Task Model Predictive Control (HTMPC) with interactive, bilevel human motion prediction. The approach, SM$^2$ITH, couples HTMPC with predictions based on Optimal Reciprocal Collision Avoidance (ORCA) and with safety constraints expressed as discrete-time control barrier functions (DT-CBF). It produces joint robot plans and human trajectories that respect task priorities while remaining collision-free. Experiments on two mobile manipulators across multiple scenarios show that interactive predictions improve safety and efficiency, particularly under crowding and adversarial interactions, outperforming weighted-sum and open-loop baselines. The results demonstrate the practical benefit of tightly coupling human-aware prediction with multitask predictive control for real-time, safe, and efficient robot behavior in shared spaces.
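As a rough illustration of the DT-CBF safety constraint mentioned above, the sketch below checks the standard discrete-time control barrier function condition $h(x_{k+1}) \ge (1-\gamma)\,h(x_k)$ with $0 < \gamma \le 1$. The distance-based barrier, the function names, and all numbers are assumptions chosen for illustration; this summary does not show the exact formulation used in the paper.

```python
import numpy as np

def barrier(robot_xy, human_xy, d_safe):
    """Hypothetical barrier: h >= 0 exactly when the robot keeps at least
    d_safe (metres) from the human. Not taken from the paper."""
    diff = np.asarray(robot_xy, dtype=float) - np.asarray(human_xy, dtype=float)
    return float(diff @ diff - d_safe**2)

def dt_cbf_satisfied(h_now, h_next, gamma):
    """Standard DT-CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k).
    gamma in (0, 1] limits how fast the safety margin may shrink per step."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return h_next >= (1.0 - gamma) * h_now

# Illustrative numbers only: human at (2, 0), safety distance 1 m.
human = [2.0, 0.0]
h_now = barrier([0.0, 0.0], human, d_safe=1.0)
h_slow = barrier([0.2, 0.0], human, d_safe=1.0)  # cautious step toward the human
h_fast = barrier([0.8, 0.0], human, d_safe=1.0)  # aggressive step toward the human

slow_ok = dt_cbf_satisfied(h_now, h_slow, gamma=0.5)
fast_ok = dt_cbf_satisfied(h_now, h_fast, gamma=0.5)
print(slow_ok, fast_ok)
```

In a predictive controller, a condition of this kind would be imposed at every step of the horizon, so the robot is slowed down well before the safety distance is actually reached.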

Abstract

Mobile manipulators are designed to perform complex sequences of navigation and manipulation tasks in human-centered environments. While recent optimization-based methods such as Hierarchical Task Model Predictive Control (HTMPC) enable efficient multitask execution with strict task priorities, they have so far been applied mainly to static or structured scenarios. Extending these approaches to dynamic human-centered environments requires predictive models that capture how humans react to the actions of the robot. This work introduces Safe Mobile Manipulation with Interactive Human Prediction via Task-Hierarchical Bilevel Model Predictive Control (SM$^2$ITH), a unified framework that combines HTMPC with interactive human motion prediction through bilevel optimization that jointly accounts for robot and human dynamics. The framework is validated on two different mobile manipulators, the Stretch 3 and the Ridgeback-UR10, across three experimental settings: (i) delivery tasks with different navigation and manipulation priorities, (ii) sequential pick-and-place tasks with different human motion prediction models, and (iii) interactions involving adversarial human behavior. Our results highlight how interactive prediction enables safe and efficient coordination, outperforming baselines that rely on weighted objectives or open-loop human models.

Paper Structure

This paper contains 14 sections, 6 equations, 7 figures.

Figures (7)

  • Figure 1: Snapshots of pick-and-place tasks in human-centered environments with the Ridgeback-UR10 (top) and Stretch 3 (bottom) platforms, along with a top-down view animation. The illustrated SM$^2$ITH solutions consist of planned robot trajectories coupled with human predictions, allowing the robot to navigate interactively among humans while coordinating multiple prioritized mobile manipulation tasks. Video: http://tiny.cc/sm2ith.
  • Figure 2: Proposed control architecture composed of three main blocks. The motion generation block (orange) computes the control commands using a hierarchical predictive formulation that accounts for both robot tasks and predicted human motion. The planning block (green) organizes the tasks into a priority hierarchy and generates the corresponding references to be tracked. The human estimation block (blue) detects humans in the environment and estimates their motion, providing predictions that feed into the control loop.
  • Figure 3: Box-and-whisker plots summarizing the task prioritization experiments, comparing SM$^2$ITH with the weighted-sum method. We report the average and maximum end-effector tracking error and the navigation efficiency in the Banner tasks (end-effector task prioritized, E$\ge$B) and in the Cup task (base task prioritized, B$\ge$E). The line in each box marks the median, and the $\times$ marks the mean. At higher human densities (3 humans), SM$^2$ITH outperforms the weighted-sum controller on the high-priority task, highlighting the benefits of strict lexicographic prioritization.
  • Figure 4: In both scenarios, the human slowly approaches the robot arm, and the robot has two options to avoid collision: retracting the end effector or moving the base. In the banner transport scenario (top), the end-effector task has higher priority and the base of the robot moves to avoid collision with the human. In the cup transport scenario (bottom), the base task has higher priority and the end effector retracts instead. These examples showcase the potential of a hierarchical formulation to encode different robot behaviors depending on the context.
  • Figure 5: Experimental setup for the pick-and-place experiments. The robot starts from the illustrated position, must pick up the red hat and place it in the target, and ends the run by returning to the start point. Each scenario involves $n_h \in \{1,2\}$ humans, who start at the illustrated positions and cross to the opposite side of the room each time the robot travels between the start, pick, place, and end locations.
  • ...and 2 more figures
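
The contrast between strict lexicographic prioritization and a weighted sum, which Figures 3 and 4 refer to, can be sketched on a toy problem. The example below is only a static linear least-squares analogue, not the paper's HTMPC: the function names, task matrices, and weights are hypothetical. It solves the high-priority task first and lets the low-priority task act only in the null space of the first, then compares the result with a weighted-sum solution of the same two tasks.

```python
import numpy as np

def weighted_sum(A1, b1, A2, b2, w1=1.0, w2=1.0):
    """Minimize w1*||A1 x - b1||^2 + w2*||A2 x - b2||^2 in a single solve."""
    A = np.vstack([np.sqrt(w1) * A1, np.sqrt(w2) * A2])
    b = np.concatenate([np.sqrt(w1) * b1, np.sqrt(w2) * b2])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def lexicographic(A1, b1, A2, b2):
    """Two-level strict priority: task 1 is solved first; task 2 may only
    move x inside the null space of A1, so it cannot degrade task 1."""
    A1_pinv = np.linalg.pinv(A1, rcond=1e-10)
    x1 = A1_pinv @ b1                         # best solution for task 1
    N1 = np.eye(A1.shape[1]) - A1_pinv @ A1   # projector onto null(A1)
    z = np.linalg.pinv(A2 @ N1, rcond=1e-10) @ (b2 - A2 @ x1)
    return x1 + N1 @ z

# Toy conflict: task 1 wants x0 + x1 = 1, task 2 wants x = (2, 0).
A1, b1 = np.array([[1.0, 1.0]]), np.array([1.0])
A2, b2 = np.eye(2), np.array([2.0, 0.0])

x_lex = lexicographic(A1, b1, A2, b2)
x_ws = weighted_sum(A1, b1, A2, b2)

err_lex = float(np.linalg.norm(A1 @ x_lex - b1))  # high-priority task error
err_ws = float(np.linalg.norm(A1 @ x_ws - b1))
print(x_lex, err_lex)
print(x_ws, err_ws)
```

In this toy case the lexicographic solution satisfies the high-priority task exactly, while the weighted sum trades some of it away for the low-priority task. Raising `w1` shrinks that error but never removes it, which is the qualitative difference between strict and weighted prioritization.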