
Hypothesis Graph Refinement: Hypothesis-Driven Exploration with Cascade Error Correction for Embodied Navigation

Peixin Chen, Guoxi Zhang, Jianwei Ma, Qing Li

Abstract

Embodied agents must explore partially observed environments while maintaining reliable long-horizon memory. Existing graph-based navigation systems improve scalability, but they often treat unexplored regions as semantically unknown, leading to inefficient frontier search. Although vision-language models (VLMs) can predict frontier semantics, erroneous predictions may be embedded into memory and propagate through downstream inferences, causing structural error accumulation that confidence attenuation alone cannot resolve. These observations call for a framework that can leverage semantic predictions for directed exploration while systematically retracting errors once new evidence contradicts them. We propose Hypothesis Graph Refinement (HGR), a framework that represents frontier predictions as revisable hypothesis nodes in a dependency-aware graph memory. HGR introduces (1) a semantic hypothesis module, which estimates context-conditioned semantic distributions over frontiers and ranks exploration targets by goal relevance, travel cost, and uncertainty, and (2) a verification-driven cascade correction mechanism, which compares on-site observations against predicted semantics and, upon mismatch, retracts the refuted node together with all its downstream dependents. Unlike additive map-building, this allows the graph to contract by pruning erroneous subgraphs, keeping memory reliable throughout long episodes. We evaluate HGR on multimodal lifelong navigation (GOAT-Bench) and embodied question answering (A-EQA, EM-EQA). HGR achieves a 72.41% success rate and 56.22% SPL on GOAT-Bench, and shows consistent improvements on both QA benchmarks. Diagnostic analysis reveals that cascade correction eliminates approximately 20% of structurally redundant hypothesis nodes and reduces revisits to erroneous regions by 4.5x, with specular and transparent surfaces accounting for 67% of corrected prediction errors.
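The abstract states that the semantic hypothesis module ranks exploration targets by goal relevance, travel cost, and uncertainty, but does not give the scoring rule. The following is a minimal sketch of one plausible scoring scheme, assuming a weighted linear utility with entropy of the predicted semantic distribution as the uncertainty term; the function names, weights, and example frontiers are illustrative, not from the paper.

```python
import math

def entropy(dist):
    """Shannon entropy of a categorical semantic distribution,
    used here as the uncertainty term."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def frontier_score(goal_relevance, travel_cost, uncertainty,
                   w_rel=1.0, w_cost=0.1, w_unc=0.25):
    """Score a frontier hypothesis: higher goal relevance is rewarded,
    while travel cost and predictive uncertainty are penalized.
    The weights are illustrative assumptions."""
    return w_rel * goal_relevance - w_cost * travel_cost - w_unc * uncertainty

# Two hypothetical frontiers with predicted semantic distributions.
frontiers = {
    "kitchen_frontier": {"rel": 0.9, "cost": 4.0, "dist": [0.7, 0.2, 0.1]},
    "hallway_frontier": {"rel": 0.4, "cost": 1.0, "dist": [0.4, 0.3, 0.3]},
}

best = max(frontiers, key=lambda f: frontier_score(
    frontiers[f]["rel"], frontiers[f]["cost"], entropy(frontiers[f]["dist"])))
# With these weights, the more goal-relevant frontier wins despite
# its higher travel cost and despite moderate uncertainty.
```

Under this formulation, the trade-off between directed exploration and hedging against VLM error is entirely in the weights: raising `w_unc` makes the agent prefer frontiers whose semantics it is more certain about.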

Paper Structure

This paper contains 68 sections, 4 equations, 7 figures, 15 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of Hypothesis Graph Refinement (HGR). The hypothesis graph separates observed nodes (purple, verified regions) from hypothesis nodes (green, probabilistic frontier predictions), enabling a hypothesis-verification-correction cycle. (Left) Given the query "What is inside the basket?", observed nodes provide confirmed scene context, while hypothesis nodes project semantic distributions onto unexplored frontiers, guiding the agent toward the most likely location of the target object. (Right) Upon arrival, the agent verifies each hypothesis against actual observations. If the prediction is confirmed, the hypothesis node transitions to an observed node; if refuted, cascade correction retracts the erroneous node and all its downstream dependents, preventing error propagation through the graph.
  • Figure 2: Architecture of HGR. (Left) Frontiers $\mathcal{F}_t$ detected from the occupancy map are fed to a VLM reasoner for the semantic hypothesis module, which estimates categorical distributions and generates hypothesis nodes linked to observed nodes via spatial and dependency edges. Upon visitation, cascade correction compares predicted ($I_{\mathrm{pred}}$) and actual ($I_{\mathrm{actual}}$) semantics; if $\Delta_{\mathrm{sem}} > \theta$, the refuted node and all its dependents are removed. (Right) Running example on a floor plan. Graph Refinement marks confirmed hypotheses promoted to observed nodes; Graph Refinement (Shrinking) shows where cascade correction prunes erroneous subgraphs, contracting the graph.
  • Figure 3: Semantic Hypothesis Module. Left: Traditional frontier representation treats unexplored regions as undifferentiated boundaries. Right: HGR projects probabilistic semantic distributions onto frontiers as hypothesis nodes, enabling goal-directed exploration.
  • Figure 4: Cascade Correction Example. A VLM misidentifies a mirror reflection as a bedroom entrance, generating hypothesis nodes for inferred furniture. Upon reaching the mirror and detecting a prediction violation (residual $> \theta_{\text{refute}}$), the system traces the dependency DAG and removes the entire erroneous subgraph, including all descendant hypothesis nodes.
  • Figure 5: Cumulative Success Rate vs. Episode Steps. HGR reaches navigation targets earlier than baselines due to hypothesis-driven frontier selection, while 3D-Mem and ConceptGraph require more steps for exhaustive geometric search. The gap widens in later steps as cascade correction prevents error accumulation.
  • ...and 2 more figures
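Figures 2 and 4 describe cascade correction as tracing the dependency DAG from a refuted hypothesis and removing the entire downstream subgraph (e.g., a mirror misread as a bedroom entrance, plus all furniture hypotheses inferred from it). The following is a minimal sketch of that retraction step, assuming a dictionary-backed graph with dependency edges from each hypothesis to the hypotheses inferred from it; the class and node names are hypothetical, not from the paper's implementation.

```python
from collections import defaultdict

class HypothesisGraph:
    """Minimal sketch of cascade correction over a dependency DAG.
    Each node stores its predicted semantic label; dependency edges
    point from a hypothesis to the hypotheses derived from it."""

    def __init__(self):
        self.labels = {}                    # node -> predicted label
        self.dependents = defaultdict(set)  # node -> downstream hypotheses

    def add(self, node, label, parent=None):
        self.labels[node] = label
        if parent is not None:
            self.dependents[parent].add(node)

    def verify(self, node, observed_label):
        """Compare the on-site observation with the prediction.
        On a match the hypothesis is confirmed (nothing removed);
        on a mismatch the node and its whole downstream subgraph
        are retracted, preventing error propagation."""
        if self.labels.get(node) == observed_label:
            return []                       # confirmed: promote to observed
        removed, stack = [], [node]
        while stack:                        # DFS over dependency edges
            n = stack.pop()
            if n in self.labels:
                removed.append(n)
                del self.labels[n]
                stack.extend(self.dependents.pop(n, ()))
        return removed

# Hypothetical mirror scenario from Figure 4: a misread "doorway"
# spawns dependent furniture hypotheses behind it.
g = HypothesisGraph()
g.add("doorway", "bedroom_entrance")
g.add("bed", "bed", parent="doorway")
g.add("lamp", "lamp", parent="bed")
retracted = g.verify("doorway", "mirror")   # refuted: it was a mirror
```

Because retraction walks dependency edges rather than spatial ones, verified observed nodes elsewhere in the graph are untouched; only the subgraph whose existence rested on the refuted prediction contracts away.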