120 Minutes and a Laptop: Minimalist Image-goal Navigation via Unsupervised Exploration and Offline RL

Xiaoming Liu, Borong Zhang, Qingbiao Li, Steven Morad

Abstract

The prevailing paradigm for image-goal visual navigation often assumes access to large-scale datasets, substantial pretraining, and significant computational resources. In this work, we challenge this assumption. We show that we can collect a dataset, train an in-domain policy, and deploy it to the real world (1) in less than 120 minutes, (2) on a consumer laptop, (3) without any human intervention. Our method, MINav, formulates image-goal navigation as an offline goal-conditioned reinforcement learning problem, combining unsupervised data collection with hindsight goal relabeling and offline policy learning. Experiments in simulation and the real world show that MINav improves exploration efficiency, outperforms zero-shot navigation baselines in target environments, and scales favorably with dataset size. These results suggest that effective real-world robotic learning can be achieved with high computational efficiency, lowering the barrier to rapid policy prototyping and deployment.
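The abstract's key ingredient is hindsight goal relabeling for offline goal-conditioned RL: transitions from unsupervised exploration carry no task reward, so goals are relabeled with observations actually reached later in the same trajectory. The sketch below illustrates the generic "future" relabeling strategy with a sparse goal-reaching reward; the function name, the `k` resampling parameter, and the tuple layout are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def relabel_hindsight(trajectory, k=4, rng=None):
    """Generic hindsight relabeling sketch (illustrative, not MINav's exact scheme).

    trajectory: list of (obs, action, next_obs) tuples from one exploration rollout.
    For each transition, sample k goals from observations reached at or after
    step t, and assign a sparse reward of 1 only when the relabeled goal is
    the observation reached by this very transition.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    T = len(trajectory)
    for t, (obs, action, next_obs) in enumerate(trajectory):
        for _ in range(k):
            future = int(rng.integers(t, T))       # "future" strategy: goal from t..T-1
            goal = trajectory[future][2]           # next_obs at the sampled future step
            reached = (future == t)                # this transition attains the goal
            reward = 1.0 if reached else 0.0       # sparse goal-conditioned reward
            relabeled.append((obs, action, next_obs, goal, reward, reached))
    return relabeled
```

Relabeled tuples of this form can then be consumed directly by an offline RL algorithm such as TD3+BC, since every transition now carries a goal and a reward despite the data having been collected without supervision.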


Paper Structure

This paper contains 30 sections, 10 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: MINav enables rapid, two-hour deployment for real-world end-to-end ImageNav. (a) Autonomous data collection. (b) Offline policy learning. (c) Real-world policy deployment.
  • Figure 2: Real-world deployment of MINav in the three evaluation environments. The robot successfully reaches image-goals despite clutter, blur, reflections, and partial occlusion.
  • Figure 3: Overview of the MINav pipeline. Starting from autonomous exploration in the target environment, MINav collects diverse trajectories using the proposed pink uniform noise model, extracts visual representations with frozen DINOv3, constructs the goal space, and trains a navigation policy offline via hindsight goal relabeling and TD3+BC.
  • Figure 4: Generation process of the proposed pink uniform noise.
  • Figure 5: Valid goal selection using DINOv3-based spatial standard deviation.
  • ...and 3 more figures
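Figure 4 refers to the proposed "pink uniform noise" for exploration. The paper's exact construction is not reproduced in this excerpt; as a hedged illustration, the sketch below generates generic temporally correlated pink (1/f) noise by spectrally shaping white Gaussian noise and rescaling it to a bounded action range, which is a common way to obtain persistent exploration actions.

```python
import numpy as np

def pink_exploration_noise(n_steps, n_dims, rng=None):
    """Sample bounded, temporally correlated 1/f ("pink") noise.

    Illustrative sketch only: shapes the spectrum of complex white noise
    with a 1/sqrt(f) amplitude (i.e., 1/f power), inverts the FFT, and
    rescales each action dimension to [-1, 1].
    """
    rng = rng or np.random.default_rng()
    freqs = np.fft.rfftfreq(n_steps)
    amp = np.ones_like(freqs)
    amp[1:] = 1.0 / np.sqrt(freqs[1:])              # 1/f power spectrum
    spectrum = amp[:, None] * (
        rng.standard_normal((freqs.size, n_dims))
        + 1j * rng.standard_normal((freqs.size, n_dims))
    )
    signal = np.fft.irfft(spectrum, n=n_steps, axis=0)
    signal /= np.abs(signal).max(axis=0, keepdims=True)  # bound to [-1, 1]
    return signal
```

Low-frequency dominance makes consecutive actions correlated, so the robot commits to motion directions for longer than it would under uncorrelated uniform noise, which is the usual motivation for colored exploration noise.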