Uncertainty Mitigation and Intent Inference: A Dual-Mode Human-Machine Joint Planning System

Zeyu Fang, Yuxin Lin, Cheng Liu, Beomyeol Yu, Zeyuan Yang, Rongqian Chen, Taeyoung Lee, Mahdi Imani, Tian Lan

TL;DR

A unified human-robot joint planning system designed to tackle dual sources of uncertainty: task-relevant knowledge gaps and latent human intent, validated in both Gazebo simulations and real-world UAV deployments integrated with a Vision-Language Model-based 3D semantic perception pipeline.

Abstract

Effective human-robot collaboration in open-world environments requires joint planning under uncertainty. However, existing approaches often treat humans as passive supervisors, which prevents autonomous agents from becoming human-like teammates that actively model teammate behaviors, reason about knowledge gaps, and resolve uncertainties by querying and eliciting responses through communication. To address these limitations, we propose a unified human-robot joint planning system designed to tackle dual sources of uncertainty: task-relevant knowledge gaps and latent human intent. Our system operates in two complementary modes. First, an uncertainty-mitigation joint planning module enables two-way conversations to resolve semantic ambiguity and object uncertainty. It uses an LLM-assisted active elicitation mechanism and a hypothesis-augmented A* search, then computes an optimal querying policy via dynamic programming to minimize interaction and verification costs. Second, a real-time intent-aware collaboration module maintains a probabilistic belief over the human's latent task intent from spatial and directional cues, enabling dynamic, coordination-aware task selection for agents without explicit communication. We validate the proposed system in both Gazebo simulations and real-world UAV deployments integrated with a Vision-Language Model (VLM)-based 3D semantic perception pipeline. Experimental results demonstrate that, compared to the baselines, the system cuts the interaction cost by 51.9% in uncertainty-mitigation planning and reduces the task execution time by 25.4% in intent-aware cooperation.
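
To make the idea of a probabilistic belief over latent intent concrete, the following is a minimal illustrative sketch of a single Bayesian update driven by one spatial cue (distance to each candidate task) and one directional cue (alignment between the human's heading and the direction to the task). It is not the authors' formulation: the exponential likelihood form, the function name update_intent_belief, and the weights kappa and beta are all assumptions chosen for illustration.

```python
import math


def update_intent_belief(belief, human_pos, human_heading, task_positions,
                         kappa=2.0, beta=0.5):
    """One Bayesian update of the belief over which task the human pursues.

    belief:         dict task_id -> prior probability
    human_pos:      (x, y) position of the human
    human_heading:  unit vector (dx, dy) of the human's motion direction
    task_positions: dict task_id -> (x, y)
    kappa, beta:    hypothetical weights for the directional and spatial cues
    """
    posterior = {}
    for task, prior in belief.items():
        tx, ty = task_positions[task]
        vx, vy = tx - human_pos[0], ty - human_pos[1]
        dist = math.hypot(vx, vy)
        # Cosine between the heading and the direction to the task;
        # treated as fully aligned when the human is already at the task.
        if dist > 1e-9:
            cos = (vx * human_heading[0] + vy * human_heading[1]) / dist
        else:
            cos = 1.0
        # Assumed likelihood: higher when heading toward a nearby task.
        likelihood = math.exp(kappa * cos - beta * dist)
        posterior[task] = prior * likelihood
    total = sum(posterior.values())
    return {task: p / total for task, p in posterior.items()}


# Usage: two candidate tasks, human at the origin walking along +x.
tasks = {"A": (5.0, 0.0), "B": (0.0, 5.0)}
prior = {"A": 0.5, "B": 0.5}
posterior = update_intent_belief(prior, (0.0, 0.0), (1.0, 0.0), tasks)
```

In this example both tasks are equally distant, so only the directional cue separates them and the belief shifts toward task A. A robot teammate could then favor the task the human is less likely to be pursuing, which is one simple reading of coordination-aware task selection.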

Paper Structure (22 sections, 4 equations, 6 figures, 5 tables, 1 algorithm)

This paper contains 22 sections, 4 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of the proposed system, with illustrations of the two modules inside the core planning engine that support the different planning modes. The system starts by analyzing the surrounding environment via perception models. Next, the central interface activates one of the planning modes based on the task type to begin planning. The planning results are presented as either paths with waypoints or high-level goals and are sent to the low-level controller, which then drives the drone to carry out the plan. The details of the two planning modes are further elaborated in the Method section.
  • Figure 2: Semantic Fusion: The top row displays the original RGB inputs. The bottom row illustrates the semantic segmentation masks after the recursive fusion strategy.
  • Figure 3: Voice interface and Gazebo simulation environment.
  • Figure 4: Evaluation setups: (a) shows the joint planning configuration, while (b) details the intent-aware setup.
  • Figure 5: Task completion progress (left) and expected remaining travel distance (right) over time for the proposed method and non-cooperative baseline.
  • ...and 1 more figures