IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

Fan Yang, Soumya Teotia, Shaunak A. Mehta, Prajit KrisshnaKumar, Quanting Xie, Jun Liu, Yueqi Song, Li Wenkai, Atsunori Moteki, Kanji Uchino, Yonatan Bisk

Abstract

Although robot-to-robot (R2R) communication improves indoor scene understanding beyond what a single robot can achieve, R2R alone cannot overcome partial observability without substantial exploration overhead or scaling team size. In contrast, many indoor environments already include low-cost Internet of Things (IoT) sensors (e.g., cameras) that provide persistent, building-wide context beyond onboard perception. We therefore introduce IndoorR2X, the first benchmark and simulation framework for Large Language Model (LLM)-driven multi-robot task planning with Robot-to-Everything (R2X) perception and communication in indoor environments. IndoorR2X integrates observations from mobile robots and static IoT devices to construct a global semantic state that supports scalable scene understanding, reduces redundant exploration, and enables high-level coordination through LLM-based planning. IndoorR2X provides configurable simulation environments, sensor layouts, robot teams, and task suites to systematically evaluate high-level semantic coordination strategies. Extensive experiments across diverse settings demonstrate that IoT-augmented world modeling improves multi-robot efficiency and reliability, and we highlight key insights and failure modes for advancing LLM-based collaboration between robot teams and indoor IoT sensors.

Paper Structure (21 sections, 7 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Motivation for IndoorR2X. Augmenting robot perception with global IoT context via LLMs for efficient coordination.
  • Figure 2: Our IndoorR2X framework. CCTV observations and other IoT device signals are collected to augment the world model beyond the perception range of the robots’ ego cameras. These heterogeneous observations are synchronized through a coordination hub, where an LLM-based online planner generates parallel actions for each robot and executes them to perform their respective tasks. As an example scenario, robots are assigned to perform household tasks in the morning. After potential overnight changes to object locations or device statuses (e.g., TVs), robots first update their indoor world model by leveraging the "X" observations.
  • Figure 3: Scalability analysis. Success rate (left) and efficiency metrics (center/right) as a function of team size ($N=2$ to $6$). While success remains stable up to $N=5$, the coordination overhead (total distance traveled) increases with fleet size.
  • Figure 4: Robustness to "X" failures. The system is resilient to missing detections (left), maintaining a constant success rate at the cost of increased travel. However, incorrect semantic status reports (right) significantly impact success, as false positives can lead to unrecoverable planning errors.
  • Figure 5: Qualitative demonstration of IndoorR2X (simulation environment). Three robots and IoT sensors coordinate to efficiently dispose of perishables, power down devices, and consolidate items in the family room.
  • ...and 1 more figure