
ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay

Gengyuan Zhang, Mingcong Ding, Jingpei Wu, Ruotong Liao, Volker Tresp

TL;DR

ReEXplore tackles the suboptimality of MLLM-driven embodied exploration by introducing a training-free approach that leverages retrospective experience abstractions and contextual replay. By distilling past trajectories into concise, transferable schemas and retrieving them at inference time, the method provides robust priors that guide frontier ranking and exploration decisions. A hierarchical frontier selection mechanism further stabilizes decisions in large action spaces, enabling coarse-to-fine reasoning. Empirical results across OpenEQA and GOAT-Bench demonstrate up to 3x improvements in success rate and navigation efficiency, validating the effectiveness of non-parametric, memory-informed adaptation for embodied tasks.
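
The retrieval-and-replay step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Experience` record, the toy 2-D embeddings, the equal weighting `alpha` between scene and text similarity, and the prompt template are all hypothetical. It only shows the general pattern: score stored experience abstractions by similarity to the current scene and task, keep the top `k`, and place them in the MLLM's context before it ranks frontiers.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    # Hypothetical schema for one distilled abstraction of a past trajectory.
    summary: str
    scene_emb: list   # embedding of the scene the experience came from (toy values)
    text_emb: list    # embedding of the task/question text (toy values)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(bank, scene_q, text_q, k=2, alpha=0.5):
    """Rank experiences by a weighted mix of scene and text similarity; keep the top k."""
    scored = [
        (alpha * cosine(e.scene_emb, scene_q) + (1 - alpha) * cosine(e.text_emb, text_q), e)
        for e in bank
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def build_prompt(task, experiences):
    """Contextualized replay: prepend the retrieved abstractions to the decision prompt."""
    lines = [f"- {e.summary}" for e in experiences]
    return "Relevant past experience:\n" + "\n".join(lines) + f"\n\nTask: {task}"


# Toy usage with made-up 2-D embeddings.
bank = [
    Experience("Kitchens are often next to dining areas; check open doorways first.", [1, 0], [1, 0]),
    Experience("Bathrooms tend to be off hallways near bedrooms.", [0, 1], [0, 1]),
    Experience("Revisiting explored frontiers wastes steps; prefer unexplored regions.", [0.7, 0.7], [0.7, 0.7]),
]
top = retrieve(bank, scene_q=[1, 0], text_q=[1, 0], k=2)
prompt = build_prompt("Where is the kettle?", top)
print(prompt)
```

Here the first and third entries are retrieved (scores 1.0 and about 0.71), while the bathroom heuristic is dropped because it is orthogonal to the query in both embedding spaces.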

Abstract

Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent work leverages MLLMs for exploration because of their strong perceptual and reasoning abilities, we find that MLLM-based embodied agents remain suboptimal at exploring new environments: (i) they rely on extensive but stale pre-trained knowledge, (ii) training-based approaches such as imitation learning or reinforcement learning are expensive for long-horizon tasks with sparse outcome rewards, and (iii) frontier-based exploration yields a large, visually nuanced action space in which MLLMs struggle to make reliable decisions. We address these challenges with ReEXplore, a training-free framework that performs retrospective experience replay to inject distilled, abstract experience at inference time, and hierarchical frontier selection to decompose frontier ranking into coarse-to-fine decisions. Our approach enables robust, traceable, and efficient exploration. Across multiple embodied exploration benchmarks, ReEXplore yields substantial improvements over strong MLLM baselines, achieving up to 3x higher success rate and navigation efficiency under open-source backbones.
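
Hierarchical frontier selection can be illustrated with a small coarse-to-fine sketch. Everything here is an assumption for illustration: frontiers are grouped by a fixed spatial grid (`cell`), and the two hand-written scoring functions stand in for the MLLM's judgments on a broad cluster view and on individual close-up views. What it shows is structural: the model first chooses among clusters and then among the frontiers of a single cluster, so it never has to rank the full candidate set at once.

```python
import math
from dataclasses import dataclass


@dataclass
class Frontier:
    fid: int
    x: float
    y: float


def cluster_frontiers(frontiers, cell=5.0):
    """Group frontiers into coarse clusters using a fixed spatial grid (hypothetical criterion)."""
    clusters = {}
    for f in frontiers:
        key = (int(f.x // cell), int(f.y // cell))
        clusters.setdefault(key, []).append(f)
    return list(clusters.values())


def hierarchical_select(frontiers, score_cluster, score_frontier, cell=5.0):
    """Coarse-to-fine choice: pick a cluster first, then one frontier inside it."""
    clusters = cluster_frontiers(frontiers, cell)
    best_cluster = max(clusters, key=score_cluster)   # coarse, broad-view decision
    best = max(best_cluster, key=score_frontier)      # fine, close-up decision
    comparisons = len(clusters) + len(best_cluster)   # candidates shown across both stages
    return best, comparisons


# Toy usage: hand-written scores stand in for MLLM judgments about a target at (12, 2).
TARGET = (12.0, 2.0)
frontiers = [Frontier(0, 1, 1), Frontier(1, 2, 3), Frontier(2, 3, 2),
             Frontier(3, 11, 1), Frontier(4, 12, 2), Frontier(5, 21, 21)]


def score_frontier(f):
    # Closer to the (hypothetical) target is better.
    return -math.dist((f.x, f.y), TARGET)


def score_cluster(c):
    # A cluster's score is the mean score of its frontiers.
    return sum(score_frontier(f) for f in c) / len(c)


best, comparisons = hierarchical_select(frontiers, score_cluster, score_frontier)
print(best.fid, comparisons)  # 4 5  (a flat ranking would present all 6 frontiers)
```

With many frontiers spread over a few clusters, the number of candidates presented across both stages grows roughly with the number of clusters plus the size of the chosen cluster, rather than with the total frontier count.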

Paper Structure

This paper contains 29 sections, 19 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: We introduce ReEXplore, a training-free framework that strengthens MLLM-based embodied agents for frontier-based exploration. Prior MLLM agents suffer from two core limitations: (1) dependence on stale pre-trained knowledge, and (2) difficulty distinguishing and ranking visually similar frontier candidates in a large action space. ReEXplore overcomes these challenges through Retrospective Experience Replay, which injects distilled experiences from previous trials directly at inference time, and Hierarchical Frontier Selection, which decomposes frontier space into coarse-to-fine decisions for more reliable and efficient exploration.
  • Figure 2: Overview of ReEXplore. Left: Retrospective Experience Abstraction. Completed trajectories are distilled into compact, transferable experience abstractions in a progressive way. Right: Embodied Exploration with ReEXplore. In a new environment, the agent retrieves salient past experiences based on scene and text similarity, incorporates them through contextualized experience replay, and performs hierarchical frontier selection to guide exploration. This enables the agent to navigate toward informative viewpoints efficiently while leveraging distilled prior experience.
  • Figure 3: Performance of ReEXplore (with Qwen2.5-VL-7B-Instruct) on A-EQA by question category, showing both Success Rate (Succ.) and SPL. Our approach outperforms the strongest baseline model by a large margin in all task categories.
  • Figure 4: Hierarchical frontier selection, visualized with data taken directly from real exploration runs: the agent first picks a broad-view frontier cluster (left), then selects a fine-grained frontier (close-up view) within that cluster (right).
  • Figure 5: Performance of ReEXplore (with GPT-4o) on A-EQA by question category, showing both Success Rate (Succ.) and SPL. Our approach outperforms the strongest baseline model by a large margin in all task categories.
  • ...and 8 more figures
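
Figures 3 and 5 report Success Rate and SPL. The sketch below uses the standard SPL (Success weighted by Path Length) definition introduced for embodied navigation by Anderson et al. (2018), SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the binary success of episode i, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. It is an illustration only; the benchmarks used in the paper may determine success and path lengths differently (for example, A-EQA success may depend on answer correctness), and the episode values below are made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    success: bool
    shortest_path: float  # l_i: shortest-path length from start to goal
    agent_path: float     # p_i: length of the path the agent actually took


def success_rate(episodes):
    """Fraction of successful episodes."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes):
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i); failed episodes contribute 0."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


episodes = [
    Episode(True, 5.0, 10.0),  # success, but twice the shortest path: contributes 0.5
    Episode(True, 4.0, 4.0),   # success on the optimal path: contributes 1.0
    Episode(False, 6.0, 3.0),  # failure: contributes 0
]
print(success_rate(episodes))  # 2 of 3 episodes succeed
print(spl(episodes))           # 0.5
```

SPL rewards success only in proportion to path efficiency, which is why it can diverge sharply from Success Rate when an agent finds the goal through long detours.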