MOPA: Modular Object Navigation with PointGoal Agents
Sonia Raychaudhuri, Tommaso Campari, Unnat Jain, Manolis Savva, Angel X. Chang
TL;DR
MOPA is a modular Object Navigation framework that decouples object detection, semantic mapping, exploration, and navigation, which allows a pretrained PointNav policy to be reused for long-horizon tasks. The agent builds a top-down semantic map, and the authors compare several exploration strategies on top of it. They report that a simple Uniform exploration policy combined with a PointNav-based navigator can outperform more complex baselines, whether end-to-end or analytically planned. The authors also introduce MultiON 2.0, a large and challenging benchmark with natural and cylinder objects, distractors, and longer episodes, for studying generalization and transfer. The results indicate strong transfer to unseen environments and highlight two practical design choices: the benefit of modularity and the surprising efficacy of Uniform exploration. Overall, the work suggests that transfer learning and simple heuristics within a modular pipeline can yield robust, scalable solutions for long-horizon embodied navigation.
Abstract
We propose a simple but effective modular approach, MOPA (Modular ObjectNav with PointGoal agents), to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map building module to build a semantic map of the observed objects, (c) an exploration module enabling the agent to explore the environment, and (d) a navigation module to move to identified target objects. We show that we can effectively reuse a pretrained PointGoal agent as the navigation model instead of learning to navigate from scratch, thus saving time and compute. We also compare various exploration strategies for MOPA and find that a simple uniform strategy significantly outperforms more advanced exploration methods.
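
The four modules described in the abstract can be illustrated with a toy control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the grid world, `toy_detector`, `pointnav_step`, `SemanticMap`, and `run_episode` are all hypothetical stand-ins, and the uniform strategy is assumed here to mean sampling an exploration goal uniformly at random over map cells and resampling once that goal is reached. In a real system the detector would operate on RGB images and the navigator would be a pretrained PointGoal policy.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) on the top-down map


@dataclass
class SemanticMap:
    """(b) Map building: remembers the map cell of each detected category."""
    size: int
    objects: Dict[str, Cell] = field(default_factory=dict)

    def update(self, detections: Dict[str, Cell]) -> None:
        self.objects.update(detections)

    def lookup(self, category: str) -> Optional[Cell]:
        return self.objects.get(category)


def toy_detector(pos: Cell, world: Dict[str, Cell], radius: int = 1) -> Dict[str, Cell]:
    """(a) Detector stand-in: reports objects within `radius` cells of the agent."""
    return {
        name: cell
        for name, cell in world.items()
        if max(abs(cell[0] - pos[0]), abs(cell[1] - pos[1])) <= radius
    }


def uniform_exploration_goal(size: int, rng: random.Random) -> Cell:
    """(c) Exploration: sample a goal cell uniformly at random (assumed reading)."""
    return (rng.randrange(size), rng.randrange(size))


def pointnav_step(pos: Cell, goal: Cell) -> Cell:
    """(d) PointNav stand-in: move one cell toward the point goal."""
    def toward(a: int, b: int) -> int:
        return a + (b > a) - (b < a)
    return (toward(pos[0], goal[0]), toward(pos[1], goal[1]))


def run_episode(world: Dict[str, Cell], target: str, size: int = 8,
                max_steps: int = 200, seed: int = 0) -> bool:
    """Explore until the target appears on the map, then navigate to it."""
    rng = random.Random(seed)
    semantic_map = SemanticMap(size)
    pos: Cell = (0, 0)
    goal: Optional[Cell] = None
    for _ in range(max_steps):
        semantic_map.update(toy_detector(pos, world))
        target_cell = semantic_map.lookup(target)
        if target_cell is not None:
            if pos == target_cell:
                return True  # reached the target object
            goal = target_cell  # navigate: the target is on the map
        elif goal is None or pos == goal:
            goal = uniform_exploration_goal(size, rng)  # explore
        pos = pointnav_step(pos, goal)
    return False  # step budget exhausted without reaching the target
```

For example, `run_episode({"cup": (1, 1)}, "cup")` returns `True` because the cup is visible from the start cell, so the loop skips exploration and hands the cup's map cell to the navigator as a point goal. The point of the sketch is the interface: exploration and navigation both reduce to choosing a point goal, which is what makes it possible to plug in a pretrained PointGoal agent without retraining it.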
