The AI Cosmologist I: An Agentic System for Automated Data Analysis

Adam Moss

TL;DR

The paper tackles the bottlenecks of data-rich cosmology by introducing the AI Cosmologist, an agentic system that integrates AutoML and LLM-driven code generation to automate the full ML research lifecycle, from idea generation to experimental evaluation and dissemination. It presents a modular architecture in which Planning, Coding, Execution, Analysis, Synthesis, and Literature agents coordinate through a directed workflow, enabling autonomous hypothesis testing and publication-ready outputs. Experimental results on Galaxy Zoo 2 and Quijote demonstrate iterative improvement, performance close to the state of the art, and the ability to autonomously generate complete scientific manuscripts. The work suggests that agentic AI can accelerate scientific discovery in cosmology, while acknowledging current limitations in novelty, theory integration, and resource requirements.

Abstract

We present the AI Cosmologist, an agentic system designed to automate cosmological and astronomical data analysis and machine learning research workflows. It implements a complete pipeline from idea generation to experimental evaluation and research dissemination, mimicking the scientific process typically performed by human researchers. The system employs specialized agents for planning, coding, execution, analysis, and synthesis that work together to develop novel approaches. Unlike traditional automated machine learning (AutoML) systems, the AI Cosmologist generates diverse implementation strategies, writes complete code, handles execution errors, analyzes results, and synthesizes new approaches based on experimental outcomes. We demonstrate the AI Cosmologist's capabilities across several machine learning tasks, showing how it can successfully explore solution spaces, iterate based on experimental results, and combine successful elements from different approaches. Our results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery. The code and experimental data used in this paper are available on GitHub at https://github.com/adammoss/aicosmologist. Example papers included in the appendix demonstrate the system's capability to autonomously produce complete scientific publications, starting from only the dataset and task description.
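The loop the abstract describes (plan, write code, execute with error handling, analyze, synthesize) can be illustrated with a minimal Python sketch. This is not the code from the repository: every function name, the toy scoring rule, and the deliberately failing first draft are assumptions made for illustration, and the LLM calls are replaced by deterministic stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    plan: str
    score: Optional[float] = None  # None until an execution succeeds
    error: Optional[str] = None    # most recent execution error, if any


def planning_agent(task: str, n_plans: int) -> List[str]:
    """Hypothetical stand-in for an LLM call proposing diverse strategies."""
    return [f"{task} / strategy {i}" for i in range(n_plans)]


def coding_agent(plan: str, error: Optional[str]) -> Callable[[], float]:
    """Hypothetical stand-in for LLM code generation.

    Returns a callable 'experiment'. A first draft of any plan ending in
    'strategy 1' fails, so the retry path is exercised; a draft written
    with error feedback succeeds.
    """
    def experiment() -> float:
        if plan.endswith("strategy 1") and error is None:
            raise RuntimeError("shape mismatch")
        # Toy score (an assumption): combined plans score higher.
        digits = sum(int(c) for c in plan if c.isdigit())
        return 0.5 + 0.1 * plan.count("strategy") + 0.01 * digits
    return experiment


def execution_agent(plan: str, max_retries: int = 2) -> Attempt:
    """Run the generated code, feeding any error back to the coding agent."""
    attempt = Attempt(plan)
    for _ in range(max_retries + 1):
        experiment = coding_agent(plan, attempt.error)
        try:
            attempt.score = experiment()
            return attempt
        except Exception as exc:
            attempt.error = str(exc)
    return attempt


def analysis_agent(history: List[Attempt]) -> List[Attempt]:
    """Rank successful attempts from best to worst score."""
    done = [a for a in history if a.score is not None]
    return sorted(done, key=lambda a: a.score, reverse=True)


def synthesis_agent(ranked: List[Attempt]) -> str:
    """Combine the (up to) two best plans into a new plan."""
    return " + ".join(a.plan for a in ranked[:2])


def run_pipeline(task: str, n_plans: int = 3, n_rounds: int = 2) -> List[Attempt]:
    plans = planning_agent(task, n_plans)
    history: List[Attempt] = []
    for _ in range(n_rounds):
        history += [execution_agent(p) for p in plans]
        ranked = analysis_agent(history)
        plans = [synthesis_agent(ranked)]  # next round tests the synthesis
    return analysis_agent(history)
```

Calling `run_pipeline("galaxy morphology")` returns the attempts ranked by the toy score, with the synthesized plan evaluated in the second round. In a real system each stand-in would wrap an LLM call and a sandboxed training run.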
