AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, Peter Clark

TL;DR

AutoDiscovery tackles open-ended autonomous scientific discovery (ASD) by letting an LLM-driven agent search for surprising hypotheses, using Bayesian surprise as the reward. It models belief about a hypothesis with a Beta-Bernoulli framework, defines Bayesian surprise as $\mathrm{BS} = D_{\mathrm{KL}}(P(\theta_H \mid \mathcal{V}_D) \parallel P(\theta_H))$, and uses a surprisal indicator $S(H,\mathcal{V}_D)$ to drive a Monte Carlo tree search (MCTS) with progressive widening. The method is evaluated on 21 real-world datasets across diverse domains, where it produces 5-29% more hypotheses deemed surprising by the LLM than strong baselines under a fixed budget; two-thirds of its discoveries are also surprising to domain experts. A deduplication step based on LLM-driven HAC reduces duplicate hypotheses. The work also includes a human annotation study, discusses limitations and safeguards for open-ended ASD, and describes a multi-agent architecture with system prompts and structured agent roles.
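
The Bayesian surprise above has a closed form when both beliefs are Beta distributions, which a short sketch can make concrete. The code below is illustrative only and is not the authors' implementation. It assumes that each belief is built from repeated yes/no samples of the LLM, smoothed with a uniform Beta(1, 1) pseudo-count. The function names (`belief_from_votes`, `bayesian_surprise`) and the vote counts are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kl_beta(a1: float, b1: float, a2: float, b2: float) -> float:
    """KL( Beta(a1, b1) || Beta(a2, b2) ) in nats."""
    return (
        log_beta(a2, b2)
        - log_beta(a1, b1)
        + (a1 - a2) * digamma(a1)
        + (b1 - b2) * digamma(b1)
        + (a2 - a1 + b2 - b1) * digamma(a1 + b1)
    )


def belief_from_votes(yes: int, total: int) -> tuple:
    """ASSUMPTION: Beta belief from `yes` of `total` sampled votes, Beta(1, 1) smoothing."""
    return (1.0 + yes, 1.0 + total - yes)


def bayesian_surprise(prior: tuple, posterior: tuple) -> float:
    """BS = KL(posterior || prior), matching the definition in the text."""
    return kl_beta(posterior[0], posterior[1], prior[0], prior[1])


# Hypothetical example: 2 of 10 samples support H before the experiment, 9 of 10 after.
prior = belief_from_votes(2, 10)
posterior = belief_from_votes(9, 10)
surprise = bayesian_surprise(prior, posterior)
```

A larger shift between the two vote counts gives a larger KL divergence, and identical beliefs give zero. How the surprisal indicator $S(H,\mathcal{V}_D)$ is derived from this quantity is not specified in this summary, so it is left out of the sketch.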

Abstract

The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to drive exploration by its own criteria. The few existing approaches in open-ended ASD select hypotheses based on diversity heuristics or subjective proxies for human interestingness, but the former struggles to meaningfully navigate the typically vast hypothesis space, and the latter suffers from imprecise definitions. This paper presents AutoDiscovery -- a method for open-ended ASD that instead drives scientific exploration using Bayesian surprise. Here, we quantify the epistemic shift from the LLM's prior beliefs about a hypothesis to its posterior beliefs after gathering experimental results. To efficiently explore the space of nested hypotheses, our method employs a Monte Carlo tree search (MCTS) strategy with progressive widening using surprisal as the reward function. We evaluate AutoDiscovery in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science. Our results demonstrate that under a fixed budget, AutoDiscovery substantially outperforms competitors by producing 5-29% more discoveries deemed surprising by the LLM. Our human evaluation further reveals that two-thirds of discoveries made by our system are surprising to domain experts as well, suggesting this is an important step towards building open-ended ASD systems.
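
The search strategy named in the abstract, MCTS with progressive widening and a surprisal reward, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the widening rule `k * visits ** alpha`, the UCB1 selection score, and the `propose` and `reward` callables (stand-ins for LLM hypothesis generation and the surprisal check) are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def can_widen(node: Node, k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening: a node may hold at most about k * visits^alpha children."""
    return len(node.children) < max(1.0, k * node.visits ** alpha)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Mean reward plus an exploration bonus; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(
    root: Node,
    propose: Callable[[Node], str],   # ASSUMPTION: generates a child hypothesis
    reward: Callable[[Node], float],  # ASSUMPTION: 1.0 if surprising, else 0.0
    budget: int,
    k: float = 1.0,
    alpha: float = 0.5,
) -> List[Node]:
    """Run `budget` evaluations and return every evaluated node in order."""
    evaluated: List[Node] = []
    for _ in range(budget):
        node = root
        # Selection: descend while the current node is not allowed to widen.
        while not can_widen(node, k, alpha):
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # Expansion: add one new hypothesis nested under the selected node.
        child = Node(propose(node), parent=node)
        node.children.append(child)
        r = reward(child)
        evaluated.append(child)
        # Backpropagation: credit the reward to the whole path up to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_reward += r
            cur = cur.parent
    return evaluated


# Toy stand-ins so the sketch runs without an LLM or a dataset.
root = Node("H")
found = search(
    root,
    propose=lambda n: n.hypothesis + "+",
    reward=lambda n: 1.0 if len(n.hypothesis) % 2 == 0 else 0.0,
    budget=20,
)
```

Because a node's child limit grows only as its visit count grows, the search revisits promising branches before spending budget on new siblings, which is how the exploration and exploitation trade-off appears in this sketch.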

Paper Structure

This paper contains 38 sections, 5 equations, 9 figures, 5 tables, 4 algorithms.

Figures (9)

  • Figure 1: Overview of AutoDiscovery: A method for open-ended ASD that is guided by Bayesian surprise. We elicit LLM prior and posterior beliefs about hypotheses via sampling, and use surprisal as a reward function within an MCTS procedure that trades off exploration and exploitation of the hypothesis space in search of surprising discoveries.
  • Figure 2: Search Performance. (a) Cumulative number of surprisals discovered across timesteps within a budget of 500 evaluations, averaged over 21 datasets. (b) Search efficiency gradient computed using a sliding window of 10 iterations. (c) Number of surprisals discovered per dataset. Takeaway: AutoDiscovery outperforms baselines, including other tree-search methods, in both search efficiency and number of surprisals discovered.
  • Figure 3: Belief shift across datasets. Bayesian surprise under belief shift for surprisals discovered using AutoDiscovery, grouped by domain and direction of shift.
  • Figure 4: Finite state machine for the discovery agent.
  • Figure 5: Internal annotation tool for discovery agent and deduplication verification.
  • ...and 4 more figures