Nomad: Autonomous Exploration and Discovery

Bokang Jia, Samta Kamboj, Satheesh Katipomu, Seung Hun Han, Neha Sengupta, Andrew Jackson

Abstract

We introduce Nomad, a system for autonomous data exploration and insight discovery. Given a corpus of documents, databases, or other data sources, users rarely know the full set of questions, hypotheses, or connections that could be explored. As a result, query-driven question answering and prompt-driven deep-research systems remain limited by human framing and often fail to cover the broader insight space. Nomad addresses this problem with an exploration-first architecture. It constructs an explicit Exploration Map over the domain and systematically traverses it to balance breadth and depth. It generates and selects hypotheses and investigates them with an explorer agent that can use document search, web search, and database tools. Candidate insights are then checked by an independent verifier before entering a reporting pipeline that produces cited reports and higher-level meta-reports. We also present a comprehensive evaluation framework for autonomous discovery systems that measures trustworthiness, report quality, and diversity. Using a corpus of selected UN and WHO reports, we show that Nomad produces more trustworthy and higher-quality reports than baselines, while also producing more diverse insights over several runs. Nomad is a step toward autonomous systems that not only answer user questions or conduct directed research, but also discover which questions, research directions, and insights are worth surfacing in the first place.

Paper Structure

This paper contains 85 sections, 7 equations, 14 figures, 14 tables, 7 algorithms.

Figures (14)

  • Figure 1: Query-driven systems (left) explore only the narrow slice of a corpus that a user thinks to ask about, producing a single report shaped by the initial question. Nomad (right) constructs an Exploration Map over the domain and systematically traverses it, generating and verifying hypotheses across the full breadth of the corpus. This exploration-first approach yields diverse, evidence-backed reports covering directions the user may never have thought to pursue.
  • Figure 2: Overview of the Nomad pipeline. The exploration map and topic selection feed hypothesis generation, which enters the explorer-verifier loop. Verified insights are reported and evaluated, with periodic synthesis in meta-reports.
  • Figure 3: Interim state of the Exploration Map after concept layer construction. Documents are connected to the concepts extracted from them. Concepts are disambiguated across documents to point to a unified concept. The red (blue) highlighted nodes become part of a single LLM call during insight potential evaluation of concept $c7$ ($c3$). This results in the generation of hypothesis $h7$ ($h3$) as a child of the concept node.
  • Figure 4: The full exploration map: A Topic Tree is constructed on top of the concept layer, and concepts are connected to their closest leaf topic node. The Topic Tree is traversed breadth-first for exploration.
  • Figure 5: A real exploration map from the WHO Research Analyst instance. The Topic Tree (circle nodes) is constructed over the concept layer (rectangles), with each concept connected to its closest leaf topic node via dashed lines. Note the non-uniform branching factor and depth across different branches of the tree.
  • ...and 9 more figures
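
The captions for Figures 4 and 5 describe a Topic Tree that is traversed breadth-first, with concepts attached to their closest leaf topic. As a rough illustration only, the sketch below shows what such a traversal order looks like on a small tree with non-uniform depth. The `TopicNode` class, the `breadth_first_topics` function, and the example topic and concept names are all hypothetical and are not taken from the paper; the actual Nomad data structures and selection logic may differ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """Hypothetical topic node; not the paper's actual data structure."""
    name: str
    children: list["TopicNode"] = field(default_factory=list)
    # Assumption: concepts are attached only to leaf topics, as in the Figure 4 caption.
    concepts: list[str] = field(default_factory=list)


def breadth_first_topics(root: TopicNode) -> list[TopicNode]:
    """Return topic nodes level by level, so broad topics come before narrow ones."""
    order: list[TopicNode] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children)  # children wait until the current level is done
    return order


# Illustrative tree with uneven depth across branches (made-up names).
root = TopicNode("Health", children=[
    TopicNode("Infectious disease", children=[
        TopicNode("Malaria", concepts=["bed nets"]),
        TopicNode("Tuberculosis", concepts=["drug resistance"]),
    ]),
    TopicNode("Nutrition", concepts=["stunting"]),
])

for topic in breadth_first_topics(root):
    print(topic.name, topic.concepts)
```

Visiting topics level by level means every broad topic is reached before any single branch is explored deeply, which is one simple way to favor breadth early in a traversal.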