End-to-End Goal-Driven Web Navigation
Rodrigo Nogueira, Kyunghyun Cho
TL;DR
This work introduces End-to-End Goal-Driven Web Navigation as a large-scale benchmark for evaluating AI agents on natural language understanding and planning within partially observable graph-structured websites. It defines the WebNav framework to convert sites into navigable graphs and WikiNav as a Wikipedia-derived dataset, plus WikiNav-Jeopardy for QA-style queries. The authors propose NeuAgent, a neural navigator with both feedforward and recurrent variants and attention-based query encoding, trained end-to-end via supervised learning and evaluated with beam search; results show strong gains over baselines and clear benefits from history and attention, especially on harder tasks. The study demonstrates the benchmark’s potential for assessing progress in real-world tasks like focused crawling and question answering, and highlights pretrained agents’ advantages for complex navigation.
Abstract
We propose goal-driven web navigation as a benchmark task for evaluating an agent's ability to understand natural language and plan in partially observed environments. In this challenging task, an agent navigates through a website, represented as a graph with web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. To succeed, the agent needs sophisticated high-level reasoning over natural language and efficient sequential decision-making. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we construct WikiNav, a dataset built from the English Wikipedia. We extensively evaluate different variants of neural-network-based artificial agents on WikiNav and observe that the proposed goal-driven web navigation task reflects advances in models well, making it a suitable benchmark for evaluating future progress. Furthermore, we extend WikiNav with question-answer pairs from Jeopardy! and test the proposed agent based on recurrent neural networks against strong inverted-index-based search engines. The artificial agents trained on WikiNav outperform the engine-based approaches, demonstrating that the proposed goal-driven navigation task is a good proxy for measuring progress in real-world tasks such as focused crawling and question answering.
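
To make the task setup concrete, the following is a minimal sketch of goal-driven navigation over a toy site graph, with pages as nodes and hyperlinks as directed edges. It is not the WebNav tool or the NeuAgent model: the `SITE` dictionary, the word-overlap scorer `overlap_score`, and the `navigate` function are all hypothetical names introduced for illustration, and the hand-written scorer stands in for a learned neural scorer. For brevity, the sketch also assumes the agent can read the text of pages linked from its current page.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy site: page name -> (page text, outgoing hyperlinks).
SITE: Dict[str, Tuple[str, List[str]]] = {
    "Home": ("portal for science and art topics", ["Science", "Art"]),
    "Science": ("physics chemistry biology overview", ["Physics", "Biology"]),
    "Art": ("painting sculpture music overview", ["Painting"]),
    "Physics": ("quantum mechanics describes particles and waves", []),
    "Biology": ("cells are the basic unit of life", []),
    "Painting": ("oil painting uses pigments bound in oil", []),
}


def overlap_score(query: str, page: str) -> int:
    """Stand-in for a learned scorer: number of distinct query words found in the page text."""
    text_words = set(SITE[page][0].split())
    return sum(1 for word in set(query.split()) if word in text_words)


def contains_query(query: str, page: str) -> bool:
    """The goal test: the query string appears in the page text."""
    return query in SITE[page][0]


def navigate(query: str, start: str, beam_width: int = 2,
             max_hops: int = 3) -> Optional[List[str]]:
    """Beam search over hyperlinks; returns a path to a page containing the query, or None."""
    beam: List[Tuple[int, List[str]]] = [(0, [start])]
    for _ in range(max_hops):
        candidates: List[Tuple[int, List[str]]] = []
        for score, path in beam:
            page = path[-1]
            if contains_query(query, page):
                return path
            # Only the links of the current page are visible (partial observability).
            for nxt in SITE[page][1]:
                candidates.append((score + overlap_score(query, nxt), path + [nxt]))
        if not candidates:
            break
        # Keep the highest-scoring partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    # Check the pages reached on the final hop.
    for _, path in beam:
        if contains_query(query, path[-1]):
            return path
    return None


if __name__ == "__main__":
    print(navigate("quantum mechanics", "Home"))
```

In this toy graph, neither first-hop page shares a word with the query, so a beam width of 2 keeps both branches alive until the second hop separates them. This illustrates why the task calls for planning rather than purely local matching.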
