ConEQsA: Concurrent and Asynchronous Embodied Questions Scheduling and Answering

Haisheng Wang, Dong Liu, Weiming Zhi

TL;DR

Empirical evaluations demonstrate that ConEQsA consistently outperforms strong sequential baselines, and show that urgency-aware, concurrent scheduling is key to making embodied agents responsive and efficient under realistic, multi-question workloads.

Abstract

This paper formulates the Embodied Questions Answering (EQsA) problem, introduces a corresponding benchmark, and proposes an agentic system to tackle the problem. Classical Embodied Question Answering (EQA) is typically formulated as answering a single question by actively exploring a 3D environment. Real deployments, however, often demand handling multiple questions that may arrive asynchronously and carry different urgencies. We formalize this setting as Embodied Questions Answering (EQsA) and present ConEQsA, an agentic framework for concurrent, urgency-aware scheduling and answering. ConEQsA leverages shared group memory to reduce redundant exploration and a priority-planning method to schedule questions dynamically. To evaluate the EQsA setting fairly, we contribute the Concurrent Asynchronous Embodied Questions (CAEQs) benchmark, which contains 40 indoor scenes with five questions per scene (200 in total) and features asynchronous follow-up questions and human-annotated urgency labels. We further propose two metrics for EQsA performance, Direct Answer Rate (DAR) and Normalized Urgency-Weighted Latency (NUWL), which together serve as a fair evaluation protocol for EQsA. Empirical evaluations demonstrate that ConEQsA consistently outperforms strong sequential baselines, and show that urgency-aware, concurrent scheduling is key to making embodied agents responsive and efficient under realistic, multi-question workloads. Code is available at https://anonymous.4open.science/r/ConEQsA.
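The abstract names DAR and NUWL but does not define them here, so the sketch below is one plausible reading of the metric names, not the paper's definitions. It assumes DAR is the fraction of questions answered from group memory without new exploration, and that NUWL weights each question's latency (answer time minus arrival time) by its urgency, then divides by the total weight times a hypothetical episode time budget: NUWL = Σᵢ wᵢ (tᵢ − aᵢ) / (T · Σᵢ wᵢ). The weights in `URGENCY_WEIGHT` and the `time_budget` parameter are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical urgency weights (assumption): the actual weighting is not given in this summary.
URGENCY_WEIGHT = {"low": 1.0, "medium": 2.0, "high": 3.0}


@dataclass
class AnsweredQuestion:
    arrival_time: float      # when the question was asked (seconds into the episode)
    answer_time: float       # when the agent answered it
    urgency: str             # "low" | "medium" | "high"
    answered_directly: bool  # True if answered from group memory without new exploration


def direct_answer_rate(records: list[AnsweredQuestion]) -> float:
    """Assumed DAR: share of questions answered directly from memory."""
    if not records:
        return 0.0
    return sum(r.answered_directly for r in records) / len(records)


def nuwl(records: list[AnsweredQuestion], time_budget: float) -> float:
    """Assumed NUWL: urgency-weighted mean latency divided by an episode time budget."""
    total_weight = sum(URGENCY_WEIGHT[r.urgency] for r in records)
    if total_weight == 0 or time_budget <= 0:
        raise ValueError("need at least one question and a positive time budget")
    weighted_latency = sum(
        URGENCY_WEIGHT[r.urgency] * (r.answer_time - r.arrival_time) for r in records
    )
    return weighted_latency / (total_weight * time_budget)


# Usage: three questions from one hypothetical episode.
records = [
    AnsweredQuestion(arrival_time=0.0, answer_time=10.0, urgency="high", answered_directly=False),
    AnsweredQuestion(arrival_time=5.0, answer_time=5.0, urgency="low", answered_directly=True),
    AnsweredQuestion(arrival_time=20.0, answer_time=60.0, urgency="medium", answered_directly=False),
]
print(direct_answer_rate(records), nuwl(records, time_budget=100.0))
```

Under this reading, lower NUWL is better: a late answer to a high-urgency question costs three times as much as an equally late low-urgency one, which is the behaviour an urgency-aware scheduler should be rewarded for.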

Paper Structure

This paper contains 24 sections, 5 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An overview of our Concurrent Embodied Questions Answering (ConEQsA) framework. Both initial questions and asynchronous follow-up questions with various urgency levels are coordinated via shared group memory and priority planning.
  • Figure 2: ConEQsA pipeline: parse each incoming question, attempt direct answering from group memory, otherwise enqueue it into a Question Pool; the planner selects a question for question-conditioned exploration, and the answerer responds and updates memory. "Concurrent" means multiple questions remain in-flight via scheduling and memory reuse under a single exploring agent.
  • Figure 3: Question Pool maintains buffered questions, resolves cross-question dependencies as a DAG, and assigns each question a priority score based on urgency, scope, estimated reward, and status.
  • Figure 4: Targeted exploration loop: retrieve relevant memory, update semantic map/memory with new observations, navigate to the next frontier, and stop when the stopping module is satisfied.
  • Figure 5: General questions, functional questions, and safety-related questions receive low, medium, and high urgency, respectively.
  • ...and 1 more figure
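The flow described in the captions of Figures 2, 3, and 5 can be sketched as a small priority-scheduled pool over a dependency DAG. This is a minimal, hypothetical Python sketch, not the authors' implementation: the `Question` fields, the linear priority formula and its weights, and the plain dictionary standing in for shared group memory are all assumptions. Only the overall flow follows the captions: try to answer directly from memory, otherwise buffer the question; keep cross-question dependencies acyclic; rank ready questions by urgency, scope, and estimated reward (with answered/unanswered status used as an eligibility filter); and write each new answer back to memory. Urgency follows Figure 5's mapping of general, functional, and safety questions to low, medium, and high.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter

# Urgency level per question category, following Figure 5 (general < functional < safety).
URGENCY = {"general": 1, "functional": 2, "safety": 3}


@dataclass
class Question:
    qid: str
    target: str        # hypothetical: the object or fact the question asks about
    category: str      # "general" | "functional" | "safety"
    scope: float       # hypothetical: share of the scene that may need exploring, 0..1
    est_reward: float  # hypothetical: planner's estimate of the value of answering now
    depends_on: set[str] = field(default_factory=set)
    answered: bool = False


class QuestionPool:
    """Illustrative question pool with DAG dependencies and a priority score."""

    def __init__(self, w_urgency: float = 1.0, w_reward: float = 0.5, w_scope: float = 0.5):
        self.questions: dict[str, Question] = {}
        self.memory: dict[str, str] = {}  # stand-in for shared group memory: target -> answer
        self.weights = (w_urgency, w_reward, w_scope)

    def submit(self, q: Question) -> str | None:
        """Register a question; return an answer at once if group memory already has one."""
        self.questions[q.qid] = q
        graph = {qid: other.depends_on for qid, other in self.questions.items()}
        try:
            TopologicalSorter(graph).prepare()  # raises CycleError if dependencies loop
        except CycleError:
            del self.questions[q.qid]
            raise
        if q.target in self.memory:  # direct answer, no exploration needed
            q.answered = True
            return self.memory[q.target]
        return None  # buffered in the pool for the planner

    def priority(self, q: Question) -> float:
        w_u, w_r, w_s = self.weights
        # Hypothetical linear score: urgent, valuable, narrowly scoped questions first.
        return w_u * URGENCY[q.category] + w_r * q.est_reward - w_s * q.scope

    def ready(self) -> list[Question]:
        """Unanswered questions whose dependencies have all been answered."""
        return [
            q
            for q in self.questions.values()
            if not q.answered
            and all(d in self.questions and self.questions[d].answered for d in q.depends_on)
        ]

    def select_next(self) -> Question | None:
        candidates = self.ready()
        return max(candidates, key=self.priority) if candidates else None

    def record_answer(self, q: Question, answer: str) -> None:
        """Mark the question answered and write the result back to shared memory."""
        q.answered = True
        self.memory[q.target] = answer


# Usage: a single exploring agent working through the pool.
pool = QuestionPool()
pool.memory["sofa"] = "grey"  # already observed earlier in the episode
first = pool.submit(Question("q1", "sofa", "general", scope=0.2, est_reward=0.1))
pool.submit(Question("q2", "stove", "safety", scope=0.3, est_reward=0.5))
pool.submit(Question("q3", "mug", "functional", scope=0.1, est_reward=0.4))
pool.submit(Question("q4", "kettle", "functional", scope=0.1, est_reward=0.8, depends_on={"q2"}))

order = []
while (q := pool.select_next()) is not None:
    order.append(q.qid)
    pool.record_answer(q, f"answer about {q.target}")  # stand-in for exploration + answering
print(first, order)
```

Here q1 is answered straight from memory, the safety question q2 is scheduled first, and q4 only becomes eligible once q2 is answered, after which its higher estimated reward puts it ahead of q3. Because `max` returns the first of several equal-priority candidates, ties fall back to arrival order in this sketch.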