StratXplore: Strategic Novelty-seeking and Instruction-aligned Exploration for Vision and Language Navigation

Muraleekrishna Gopinathan; Jumana Abu-Khalaf; David Suter; Martin Masek

TL;DR

StratXplore tackles Vision-Language Navigation in unseen environments by introducing mistake-aware frontier exploration that combines global and local planning with dual memories. It learns a deviation-prediction pretraining task and a recovery-confidence predictor to guide exploration toward recently observed, novel, and instruction-aligned frontiers, enabling efficient error recovery. The approach yields improved success rates and path quality on R2R and R4R, demonstrating robust navigation under long-horizon and partially observable conditions. This work advances practical VLN by integrating memory-driven frontier selection with principled recovery strategies, potentially informing real-world embodied agents facing ambiguity and mistakes.

Abstract

Embodied navigation requires robots to understand and interact with the environment based on given tasks. Vision-Language Navigation (VLN) is an embodied navigation task in which a robot navigates previously seen and unseen environments based on linguistic instructions and visual inputs. VLN agents need access to both local and global action spaces: the former for immediate decision making and the latter for recovering from navigational mistakes. Prior VLN agents rely only on instruction-viewpoint alignment for local and global decision making, and back-track to a previously visited viewpoint if the instruction and the current viewpoint mismatch. These methods are prone to mistakes due to the complexity of the instruction and the partial observability of the environment. We posit that back-tracking is sub-optimal and that an agent that is aware of its mistakes can recover efficiently. For optimal recovery, exploration should be extended to unexplored viewpoints (or frontiers). The optimal frontier is a recently observed but unexplored viewpoint that aligns with the instruction and is novel. We introduce a memory-based and mistake-aware path planning strategy for VLN agents, called \textit{StratXplore}, that presents global and local action planning to select the optimal frontier for path correction. The proposed method collects all past actions and viewpoint features during navigation and then selects the optimal frontier suitable for recovery. Experimental results show that this simple yet effective strategy improves the success rate on two VLN datasets with different task complexities.

Paper Structure (34 sections, 6 equations, 5 figures, 3 tables)

Introduction
Related Work
Path planning in VLN
Memory representations in VLN
Our Approach
Task Definition
Inputs
Instruction Encoding
Topology Encoding
Ego-centric Semantic Map Encoding
Action Proposal
Detecting a navigation mistake
Recovery
Action-and-Knowledge based Frontier Selection
Action Memory
...and 19 more sections

Figures (5)

Figure 1: Overview. StratXplore enables an embodied agent to correct its path by exploring frontiers that are both novel and conform to the given instruction. Here, exploit refers to selecting one of the local candidate directions, and explore considers all unexplored frontiers from the memory.
Figure 2: Model Architecture of StratXplore. (a) The fused action proposal from the Global and Local Cross-modal transformers (CMT) is used for exploitation. (b) When the recovery confidence $S^{conf}$ of the current candidates drops below a threshold $c_{thresh}$, the agent chooses to explore. The frontier selector considers the optimal recent-and-novel and instruction-aligned frontier to explore. Blocks are hyperlinked to relevant sections.
Figure 3: Novelty and Alignment Scoring. At each time step, object features are added to memory if the objects are novel. During exploration both scores are used to rank frontiers.
Figure 4: Example of Temporal prioritisation and Entity Alignment Scoring. Recent memories (thicker connections) are prioritised over past actions to encourage exploration. $S^{align}$ (values in parentheses) is the highest for the path to the optimal frontier (here 1234H).
Figure 5: Qualitative comparison between trajectories of the baseline and StratXplore agents. The baseline agent does not recover from the navigational mistake and continues to exploit the same direction and eventually fails. Although StratXplore makes a mistake by entering the second living room, it quickly corrects itself by moving to the best observed frontier (hallway).
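
The explore-or-exploit switch and frontier ranking described in the captions of Figures 2 to 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Frontier`, `choose_mode`, and `select_frontier` are hypothetical, as are the choice of taking the maximum confidence over candidates, the exponential recency decay, and the additive combination of the novelty and alignment scores; the captions only state that exploration is triggered when $S^{conf}$ drops below $c_{thresh}$ and that recent, novel, and instruction-aligned frontiers are preferred.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frontier:
    """An observed but unexplored viewpoint (hypothetical container)."""
    name: str
    steps_since_seen: int  # smaller means more recently observed
    novelty: float         # stand-in for a novelty score
    alignment: float       # stand-in for S^align


def choose_mode(candidate_confidences: List[float], c_thresh: float) -> str:
    """Exploit while some local candidate is confident enough, else explore.

    Assumption: the recovery confidence of the current candidates is
    summarised by its maximum; the source does not specify the aggregation.
    """
    if candidate_confidences and max(candidate_confidences) >= c_thresh:
        return "exploit"
    return "explore"


def select_frontier(frontiers: List[Frontier], decay: float = 0.5) -> Optional[Frontier]:
    """Rank frontiers by a recency-weighted sum of novelty and alignment.

    Assumption: recency is modelled as decay ** steps_since_seen, and the
    two scores are simply added; both choices are illustrative only.
    """
    if not frontiers:
        return None

    def score(f: Frontier) -> float:
        return (decay ** f.steps_since_seen) * (f.novelty + f.alignment)

    return max(frontiers, key=score)


frontiers = [
    Frontier("hallway", steps_since_seen=1, novelty=0.8, alignment=0.9),
    Frontier("bedroom", steps_since_seen=4, novelty=0.9, alignment=0.2),
    Frontier("kitchen", steps_since_seen=2, novelty=0.3, alignment=0.4),
]

if choose_mode([0.2, 0.3], c_thresh=0.5) == "explore":
    best = select_frontier(frontiers)
    print(best.name)
```

With these made-up scores, the recently observed and well-aligned hallway outranks the more novel but older bedroom, mirroring the recovery behaviour described in the Figure 5 caption.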
