Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

Khanh Nguyen; Hal Daumé

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

Khanh Nguyen, Hal Daumé

TL;DR

HANNA introduces a photo-realistic navigation environment where agents can request multimodal assistance from simulated human assistants (ANNA). The authors propose a memory-augmented, hierarchical policy with retrospective, curiosity-driven imitation learning to learn when and how to ask for help and how to interpret language-vision routes. Key contributions include the Hanna simulator, the I3L learning framework with reference and curiosity losses, and a hierarchical recurrent architecture that effectively leverages language and vision instructions. Empirical results show substantial improvements in task success, especially in unseen environments, validating the approach and highlighting the value of language-enabled guidance for robust navigation.

Abstract

Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and, thus, attains higher task success rate on both previously seen and previously unseen environments. We publicly release code and data at https://github.com/khanhptnk/hanna . A video demo is available at https://youtu.be/18P94aaaLKg .

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

TL;DR

Abstract

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (4)

Theorems & Definitions (5)