A Modular Framework for Flexible Planning in Human-Robot Collaboration

Valerio Belcamino, Mariya Kilina, Linda Lastrico, Alessandro Carfì, Fulvio Mastrogiovanni

TL;DR

This paper addresses scalable human-robot collaboration in real-world assembly tasks by proposing a modular, HTN-based planning framework coupled with a multisensory perception pipeline. It formalizes the interaction state and a small set of primitive actions, enabling online, explainable planning for one or more agents. The approach is demonstrated on Baxter hardware with two humans assembling four furniture items, showing low planning overhead relative to task duration and promising adaptability. The work highlights practical implications for flexible, interpretable HRC and outlines avenues for enhancing perception, automation of planning, and parallel execution. Overall, the framework provides a scalable blueprint for deploying cooperative robot assistants across diverse application domains.

Abstract

This paper presents a comprehensive framework to enhance Human-Robot Collaboration (HRC) in real-world scenarios. It introduces a formalism that models articulated tasks requiring cooperation between two agents using a small set of primitives. Our implementation leverages Hierarchical Task Network (HTN) planning and a modular multisensory perception pipeline, which includes vision, human activity recognition, and tactile sensing. To showcase the system's scalability, we present an experimental scenario in which two humans alternate in collaborating with a Baxter robot to assemble four pieces of furniture with variable components. This integration highlights promising advancements in HRC, suggesting a scalable approach for complex, cooperative tasks across diverse applications.
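
To illustrate how an HTN planner breaks a cooperative task down into a small set of primitives, the following is a minimal sketch of a total-order HTN decomposition. It is not the authors' implementation: the primitive names (`pick`, `handover`, `attach`), the compound tasks (`deliver`, `assemble`), and the state layout are all hypothetical and chosen only for illustration.

```python
import copy
from typing import Dict, List, Optional, Tuple

Task = Tuple  # (task_name, *arguments)
State = Dict[str, Dict[str, Optional[str]]]

# --- Primitive operators (hypothetical): return a new state, or None if
# --- the preconditions do not hold in the given state.
def pick(state: State, agent: str, obj: str) -> Optional[State]:
    if state["holding"][agent] is not None or state["location"].get(obj) != "table":
        return None
    new = copy.deepcopy(state)
    new["holding"][agent] = obj
    new["location"][obj] = agent
    return new

def handover(state: State, giver: str, receiver: str, obj: str) -> Optional[State]:
    if state["holding"][giver] != obj or state["holding"][receiver] is not None:
        return None
    new = copy.deepcopy(state)
    new["holding"][giver] = None
    new["holding"][receiver] = obj
    new["location"][obj] = receiver
    return new

def attach(state: State, agent: str, obj: str) -> Optional[State]:
    if state["holding"][agent] != obj:
        return None
    new = copy.deepcopy(state)
    new["holding"][agent] = None
    new["location"][obj] = "assembled"
    return new

OPERATORS = {"pick": pick, "handover": handover, "attach": attach}

# --- Methods (hypothetical): decompose a compound task into subtasks.
def deliver(state: State, giver: str, receiver: str, obj: str) -> List[Task]:
    return [("pick", giver, obj), ("handover", giver, receiver, obj)]

def assemble(state: State, robot: str, human: str, parts: Tuple[str, ...]) -> List[Task]:
    subtasks: List[Task] = []
    for part in parts:
        subtasks += [("deliver", robot, human, part), ("attach", human, part)]
    return subtasks

METHODS = {"deliver": [deliver], "assemble": [assemble]}

def plan(state: State, tasks: List[Task]) -> Optional[List[Task]]:
    """Depth-first, total-order HTN planning; returns primitive actions or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in OPERATORS:
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None  # precondition failed
        tail = plan(new_state, rest)
        return None if tail is None else [head] + tail
    for method in METHODS.get(name, []):
        result = plan(state, method(state, *args) + rest)
        if result is not None:
            return result
    return None  # no operator or method applies

initial_state: State = {
    "holding": {"robot": None, "human": None},
    "location": {"leg": "table", "top": "table"},
}
for step in plan(initial_state, [("assemble", "robot", "human", ("leg", "top"))]):
    print(step)
```

Because every compound task bottoms out in the same few operators, the resulting plan is a readable sequence of primitive actions, which is one reason HTN plans are often considered easy to explain.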

Paper Structure

This paper contains 6 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: A top view of the experimental scenario, in which the collaborative robot Baxter waits for the human to assemble the furniture pieces before continuing the interaction. We show in blue the robot workspace and in green the shared workspace. The labels O1 to O6 point to the components needed for the assembly.
  • Figure 2: Architecture diagram of the system. The HTN planner can activate the perception modules to update its state and move the robot using the Joint Trajectory Client. The perception module is composed of cameras, wearable sensors, and tactile sensors. The vision component has three modules: Localize Multiple Markers, which identifies the markers in the scene and estimates their positions; Refine Marker Pose, which improves the estimated position of a single marker before grasping; and Box Handover Detection, which detects the handover of small components such as screws. The wearables are used to detect when the human is idle, and tactile sensing is used to automate the handover of tools based on shear forces.
  • Figure 3: The plots show the fluency metrics expressed as a percentage of the assembly time. From left to right we provide human idle time, robot idle time, functional delay and concurrent action time.
  • Figure 4: Time needed for each action involving perception during the collaborative scenario. The different colours refer to the perception modality associated with each action.
  • Figure 5: Four frames from the experimental scenario, one for each of the four fluency metrics: Robot Idle time (a), Human Idle time (b), Functional Delay (c), and Concurrent Action (d).
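
The fluency metrics named in Figures 3 and 5 are reported as percentages of the assembly time. The sketch below shows one way such percentages could be computed from logged activity intervals. It is an assumption-laden illustration, not the paper's procedure: the interval format, the function names, and in particular the simplification that functional delay equals the time during which neither agent is active are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, within [0, task_duration]

def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total(intervals: List[Interval]) -> float:
    """Total time covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))

def overlap(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which both interval sets are active."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def fluency_metrics(human: List[Interval], robot: List[Interval],
                    task_duration: float) -> Dict[str, float]:
    """Return each metric as a percentage of the total task duration."""
    human_active = total(human)
    robot_active = total(robot)
    both_active = overlap(human, robot)
    either_active = human_active + robot_active - both_active

    def pct(seconds: float) -> float:
        return 100.0 * seconds / task_duration

    return {
        "human_idle": pct(task_duration - human_active),
        "robot_idle": pct(task_duration - robot_active),
        "concurrent_action": pct(both_active),
        # Simplifying assumption: delay = time when neither agent acts.
        "functional_delay": pct(task_duration - either_active),
    }

metrics = fluency_metrics(
    human=[(0.0, 40.0), (60.0, 100.0)],
    robot=[(30.0, 50.0)],
    task_duration=100.0,
)
print(metrics)
```

Normalizing by the task duration makes trials of different lengths comparable, which matches the percentage presentation described in the Figure 3 caption.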