Bayesian Intention for Enhanced Human Robot Collaboration

Vanessa Hernandez-Cruz, Xiaotong Zhang, Kamal Youcef-Toumi

TL;DR

A novel Bayesian Intention (BI) framework predicts human intent from multi-modal information in Human-Robot Collaboration (HRC) scenarios. It captures the complexity of intent prediction by modeling the correlations between human behavior conventions and scene data.

Abstract

Predicting human intent is challenging yet essential to achieving seamless Human-Robot Collaboration (HRC). Many existing approaches fail to fully exploit the inherent relationships between objects, tasks, and the human model. Current methods for predicting human intent, such as Gaussian Mixture Models (GMMs) and Conditional Random Fields (CRFs), often lack interpretability because they do not account for causal relationships between variables. To address these challenges, we develop a novel Bayesian Intention (BI) framework that predicts human intent from multi-modal information in HRC scenarios. The framework captures the complexity of intent prediction by modeling the correlations between human behavior conventions and scene data, and it uses the inferred intent to optimize the robot's response in real time, enabling smoother and more intuitive collaboration. We demonstrate the effectiveness of our approach through an HRC task involving a UR5 robot, highlighting BI's capability for real-time human intent prediction and collision avoidance using a unique dataset we created. Our evaluations show that the multi-modality BI model predicts human intent within 2.69 ms, with a 36% increase in precision, a 60% increase in F1 score, and an 85% increase in accuracy compared to the best baseline method. The results underscore BI's potential to advance real-time human intent prediction and collision avoidance in HRC.

Paper Structure

This paper contains 19 sections, 12 equations, 7 figures, and 1 table.

Figures (7)

  • Figure 1: The framework of Bayesian Intention and its application for enhanced HRC. The offline portion of the framework feeds a learned conditional probability table (CPT) to the online portion, which predicts human intent by modeling causal relationships between the human and the scene; the robot then changes its path and task plan accordingly.
  • Figure 2: The key points used to derive the features of the three modalities of information: (a) head orientation; (b) hand orientation; (c) hand motion. The head shot is from MediaPipe.
  • Figure 3: Illustration of (a) the experimental setup and the initial state of the HRC task; (b) the goal of the HRC task. The objects of interest (the potential targets) are a cereal box, a banana, and a milk jug. The $x$-, $y$-, and $z$-axes are shown with red, green, and blue arrows, respectively.
  • Figure 4: An example of the subject's head orientation and right hand position for the task of making a bowl of cereal. The cyan spheres represent the location of the objects in the scene. The black, pink, and green points represent the three keypoints of the subject’s hand that are being tracked. The red and blue points represent the two endpoints for the vector representing head orientation.
  • Figure 5: Confusion matrix depicting the human intentions when the robot is not involved in the task. The main diagonal highlights BI's impressive ability to predict human intent accurately.
  • ...and 2 more figures
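
The captions above mention a learned conditional probability table (CPT), three modalities (head orientation, hand orientation, hand motion), and three candidate targets (cereal box, banana, milk jug). As a rough illustration of how a CPT could turn such observations into an intent estimate, the sketch below applies Bayes' rule to a discrete model. It is not the authors' implementation: the conditional-independence assumption between modalities, the uniform prior, every probability value, and the names `make_cpt` and `intent_posterior` are hypothetical choices made only for this example.

```python
from typing import Dict, Mapping

# Candidate intents: the three objects of interest named in the Figure 3 caption.
INTENTS = ("cereal_box", "banana", "milk_jug")

Cpt = Dict[str, Dict[str, float]]  # cpt[intent][observed_target] = P(obs | intent)


def make_cpt(p_match: float) -> Cpt:
    """Build a toy CPT: a modality points at the intended object with
    probability p_match and at each other object with equal probability.
    The values are invented for illustration, not learned from data."""
    p_other = (1.0 - p_match) / (len(INTENTS) - 1)
    return {
        intent: {obs: (p_match if obs == intent else p_other) for obs in INTENTS}
        for intent in INTENTS
    }


# One hypothetical CPT per modality from the Figure 2 caption.
CPTS: Dict[str, Cpt] = {
    "head_orientation": make_cpt(0.6),
    "hand_orientation": make_cpt(0.5),
    "hand_motion": make_cpt(0.8),
}


def intent_posterior(
    observations: Mapping[str, str],
    cpts: Mapping[str, Cpt],
    prior: Mapping[str, float],
) -> Dict[str, float]:
    """Return P(intent | observations), assuming the modalities are
    conditionally independent given the intent (an assumption of this sketch)."""
    unnormalized = {}
    for intent, p_intent in prior.items():
        score = p_intent
        for modality, observed in observations.items():
            score *= cpts[modality][intent][observed]
        unnormalized[intent] = score
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observations have zero probability under every intent")
    return {intent: score / total for intent, score in unnormalized.items()}


uniform_prior = {intent: 1.0 / len(INTENTS) for intent in INTENTS}

# Example: head and hand orientation suggest the banana,
# but the hand is moving toward the milk jug.
posterior = intent_posterior(
    {
        "head_orientation": "banana",
        "hand_orientation": "banana",
        "hand_motion": "milk_jug",
    },
    CPTS,
    uniform_prior,
)
predicted_intent = max(posterior, key=posterior.get)
print(predicted_intent, posterior)
```

With these made-up numbers, hand motion is the most reliable modality, so it outweighs the two orientation cues. A table lookup followed by a normalization is also cheap to evaluate, which is compatible with the millisecond-scale prediction time reported in the abstract, although the actual model structure may differ.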