IG-MCTS: Human-in-the-Loop Cooperative Navigation under Incomplete Information

Shenghui Chen, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu

TL;DR

This work addresses human-robot cooperative navigation under incomplete information. It introduces CoNav-Maze, a simulated environment, and IG-MCTS, an online planning framework guided by a Neural Human Perception Model (NHPM). IG-MCTS balances task progress against informative communication by treating information sharing as a plannable action: the learned NHPM predicts how the human's belief will update, and the reward is augmented with an information term. Empirical results show that IG-MCTS reduces communication by up to two orders of magnitude and lowers cognitive load, as indicated by eye-tracking metrics, while maintaining competitive task performance. A continuous-space extension uses Voronoi-graph planning. The approach shows potential for low-bandwidth human-robot collaboration in real-world navigation, rescue, and exploration tasks.

Abstract

Human-robot cooperative navigation is challenging under incomplete information. We introduce CoNav-Maze, a simulated environment where a robot navigates with local perception while a human operator provides guidance based on an inaccurate map. The robot can share its onboard camera views to help the operator refine their understanding of the environment. To enable efficient cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS), an online planning algorithm that jointly optimizes autonomous movement and informative communication. IG-MCTS leverages a learned Neural Human Perception Model (NHPM), trained on a crowdsourced mapping dataset, to predict how the human's internal map evolves as new observations are shared. User studies show that IG-MCTS significantly reduces communication demands and yields eye-tracking metrics indicative of lower cognitive load, while maintaining task performance comparable to teleoperation and instruction-following baselines. Finally, we illustrate generalization beyond discrete mazes through a continuous-space waterway navigation setting, in which NHPM benefits from deeper encoder-decoder architectures and IG-MCTS leverages a dynamically constructed Voronoi-partitioned traversability graph.
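The idea of scoring communication alongside movement can be illustrated with a small sketch. It is not the paper's implementation: the wall-set map representation, the mismatch-count error measure, the `weight` parameter, and all function names are assumptions made for illustration. It shows only how a reward with an added information term could make a "share observations" action competitive with a "move" action inside a planner.

```python
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]


def map_error(human_walls: FrozenSet[Cell], observed_walls: FrozenSet[Cell]) -> int:
    """Count cells where the human's believed walls disagree with the robot's observations."""
    # Symmetric difference: walls the human wrongly believes in, plus walls they are missing.
    return len(human_walls ^ observed_walls)


def information_gain(
    before: FrozenSet[Cell], predicted_after: FrozenSet[Cell], observed_walls: FrozenSet[Cell]
) -> int:
    """Reduction in map error if the human updates from `before` to `predicted_after`."""
    return map_error(before, observed_walls) - map_error(predicted_after, observed_walls)


def augmented_reward(task_reward: float, info_gain: float, weight: float = 0.75) -> float:
    """Task reward plus a weighted information term (the weight is a hypothetical parameter)."""
    return task_reward + weight * info_gain


# Hypothetical scenario: the human's map has one spurious wall and one missing wall.
observed = frozenset({(0, 1), (2, 2)})
human_before = frozenset({(0, 1), (1, 1)})
# Stand-in for a perception model's prediction of the human's map after sharing a view.
human_after = frozenset({(0, 1), (2, 2)})

gain = information_gain(human_before, human_after, observed)
move_score = augmented_reward(task_reward=1.0, info_gain=0)
share_score = augmented_reward(task_reward=0.0, info_gain=gain)
best_action = "share" if share_score > move_score else "move"
```

In this toy case sharing corrects two cells, so the information term outweighs the single step of task progress. With a smaller `weight` or a less informative view, moving would win instead.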

Paper Structure

This paper contains 41 sections, 10 equations, 12 figures, 3 tables, 4 algorithms.

Figures (12)

  • Figure 1: We enable efficient human-robot collaboration in a maze navigation setting. Left: The robot gathers local observations, while the human navigates using an imprecise global map. The robot can transmit images to improve the human’s understanding of the environment, while the human assists by suggesting paths. Right: Main contributions: (1) crowdsourcing a dataset of human perceptual updates, (2) training a neural human perception model, (3) developing Information Gain Monte Carlo Tree Search (IG-MCTS), a planning algorithm that balances task progress with informative communication, and (4) validating the approach through a user study with eye-tracking and task metrics.
  • Figure 2: Neural Human Perception Model. Inputs: The human's current perception, the robot's path since the last transmission, and the captured environment grids are processed into 2D masks. Outputs: Probability masks for adding and removing walls.
  • Figure 3: Visualization of human perception models. The left two columns show inputs: the human's current map, the robot's path, and the visible grids communicated by the robot. The models predict how the human will update the maze based on this information. Top: The human mistakenly marks a nearby wall at the wrong location. Despite never encountering this exact scenario, NHPM successfully anticipates the error by generalizing from similar training examples. Bottom: The human correctly adds a distant wall instead of a nearby one, a behavior accurately predicted by NHPM.
  • Figure 4: Mean pupil diameter with $95\%$ CI (Interpolation: $1000$, smoothing: $5$).
  • Figure 5: Eye-tracking metrics of cognitive load: PCPD (%), blink rate (/min), and fixation shift rate (/min).
  • ...and 7 more figures
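
The Figure 2 caption describes NHPM as producing probability masks for adding and removing walls. As a rough sketch of how such masks could be turned into a predicted human map, the snippet below thresholds each mask and applies it to the human's current map. The cell-grid wall encoding, the `threshold` value, and the function name `apply_update_masks` are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Grid = List[List[int]]    # 1 = human believes there is a wall, 0 = no wall
Mask = List[List[float]]  # per-cell probabilities in [0, 1]


def apply_update_masks(
    current: Grid, p_add: Mask, p_remove: Mask, threshold: float = 0.5
) -> Grid:
    """Return the predicted map after applying thresholded add/remove masks."""
    updated: Grid = []
    for r, row in enumerate(current):
        new_row = []
        for c, has_wall in enumerate(row):
            if has_wall and p_remove[r][c] >= threshold:
                new_row.append(0)  # the human is predicted to erase this wall
            elif not has_wall and p_add[r][c] >= threshold:
                new_row.append(1)  # the human is predicted to draw a wall here
            else:
                new_row.append(has_wall)  # no confident change predicted
        updated.append(new_row)
    return updated


# Hypothetical 2x2 example: one wall is likely removed, one is likely added.
current_map = [[1, 0], [0, 0]]
add_probs = [[0.0, 0.9], [0.1, 0.2]]
remove_probs = [[0.8, 0.0], [0.0, 0.0]]
predicted_map = apply_update_masks(current_map, add_probs, remove_probs)
```

A planner could sample from the masks rather than threshold them. Thresholding is used here only because it keeps the example deterministic.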