Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Patrick Tser Jern Kon; Jiachen Liu; Qiuyi Ding; Yiming Qiu; Zhenning Yang; Yibo Huang; Jayanth Srinivasa; Myungjin Lee; Mosharaf Chowdhury; Ang Chen

TL;DR

Curie aims to make AI-driven scientific experimentation rigorous through an Experimental Rigor Engine with three components: an intra-agent rigor module for reliability, an inter-agent rigor module for methodical control, and an interpretable Experiment Knowledge Module. The architecture pairs Architect and Technician agents to run experiments end to end, with formal validation and structured knowledge management intended to make results auditable and reproducible. On a new 46-question Experimentation Benchmark spanning four computer science domains, Curie achieves a 3.4x improvement over the strongest baseline tested in correctly answering experimental questions, which underscores the value of built-in rigor for automated science. The work lays a foundation for trustworthy, scalable AI-assisted research and points to future directions in interdisciplinary deployment and knowledge reuse.

Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4$\times$ improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
