The AI Cosmologist I: An Agentic System for Automated Data Analysis

Adam Moss

TL;DR

The paper addresses analysis bottlenecks in data-rich cosmology by introducing the AI Cosmologist, an agentic system that combines AutoML with LLM-driven code generation to automate the full ML research lifecycle, from idea generation through experimental evaluation to dissemination. Its modular architecture comprises Planning, Coding, Execution, Analysis, Synthesis, and Literature agents that coordinate through a directed workflow, enabling autonomous hypothesis testing and publication-ready outputs. Experiments on Galaxy Zoo 2 and Quijote show iterative improvement, performance approaching the state of the art, and the ability to autonomously generate complete scientific manuscripts. The work suggests that agentic AI can accelerate scientific discovery in cosmology, while acknowledging current limitations in novelty, theory integration, and resource requirements.

Abstract

We present the AI Cosmologist, an agentic system designed to automate cosmological/astronomical data analysis and machine learning research workflows. It implements a complete pipeline from idea generation to experimental evaluation and research dissemination, mimicking the scientific process typically performed by human researchers. The system employs specialized agents for planning, coding, execution, analysis, and synthesis that work together to develop novel approaches. Unlike traditional automated machine-learning (AutoML) systems, the AI Cosmologist generates diverse implementation strategies, writes complete code, handles execution errors, analyzes results, and synthesizes new approaches based on experimental outcomes. We demonstrate the AI Cosmologist's capabilities across several machine learning tasks, showing how it can successfully explore solution spaces, iterate based on experimental results, and combine successful elements from different approaches. Our results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery. The code and experimental data used in this paper are available on GitHub at https://github.com/adammoss/aicosmologist. Example papers included in the appendix demonstrate the system's capability to autonomously produce complete scientific publications, starting from only the dataset and task description.

Paper Structure

This paper contains 28 sections, 12 equations, 4 figures.

Figures (4)

  • Figure 1: Example prompt template used by the Planning Agent to generate initial implementation ideas. Curly braces indicate variable placeholders that are dynamically filled with task-specific information.
  • Figure 2: Workflow diagram of the AI Cosmologist system in the research phase. The process begins with initialization and generation of initial ideas, followed by a development cycle for each idea that includes planning, code generation, execution, and evaluation. The system then enters collaborative rounds where cross-analysis of results leads to two parallel pathways: analyzing top-performing ideas to identify successful patterns, and examining the solution space to discover unexplored approaches.
  • Figure 3: Evolution of implementation strategies for the Galaxy Zoo dataset. The figure shows a subset of the initial ideas, analysis of experimental results, and new, synthesized ideas. Text has been abbreviated and annotated for space considerations.
  • Figure 4: Improvement of the best validation RMSE and public Kaggle score on the Galaxy Zoo 2 dataset, from the initial ideas through the collaborative rounds.
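
The captions for Figures 1 and 2 describe a prompt template with curly-brace placeholders and a workflow of initial ideas, a per-idea development cycle, and collaborative rounds. The following is a minimal sketch of that control flow, not the system's actual implementation. Every name in it (`IDEA_PROMPT`, `Idea`, `develop`, `synthesize`, `run_research`) is hypothetical, the template text is invented for illustration, and the LLM calls and experiment runs are replaced by placeholder strings and random scores.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical template: the real Figure 1 prompt is not reproduced here.
# Curly braces mark placeholders filled with task-specific information.
IDEA_PROMPT = (
    "Task: {task}\n"
    "Dataset: {dataset}\n"
    "Propose {n_ideas} distinct implementation ideas."
)


@dataclass
class Idea:
    description: str
    score: Optional[float] = None  # validation metric, lower is better (e.g. RMSE)


def build_idea_prompt(task: str, dataset: str, n_ideas: int) -> str:
    """Fill the template placeholders, as a planning step might."""
    return IDEA_PROMPT.format(task=task, dataset=dataset, n_ideas=n_ideas)


def initial_ideas(prompt: str, n_ideas: int) -> List[Idea]:
    """Stand-in for an LLM call: returns placeholder ideas (the prompt is unused)."""
    return [Idea(f"idea {i + 1}") for i in range(n_ideas)]


def develop(idea: Idea, rng: random.Random) -> Idea:
    """Stand-in for the development cycle (plan, generate code, execute, evaluate).

    The score is a random placeholder, not a real experimental result.
    """
    idea.score = rng.uniform(0.08, 0.15)
    return idea


def synthesize(ideas: List[Idea], top_k: int) -> List[Idea]:
    """Cross-analysis with two pathways: combine top performers, and
    propose something outside the solution space explored so far."""
    top = sorted(ideas, key=lambda i: i.score)[:top_k]
    combined = Idea("combine: " + " + ".join(i.description for i in top))
    unexplored = Idea(f"unexplored approach (after {len(ideas)} ideas)")
    return [combined, unexplored]


def run_research(
    task: str,
    dataset: str,
    n_ideas: int = 3,
    n_rounds: int = 2,
    top_k: int = 2,
    seed: int = 0,
) -> Tuple[List[Idea], List[float]]:
    rng = random.Random(seed)
    prompt = build_idea_prompt(task, dataset, n_ideas)
    ideas = [develop(i, rng) for i in initial_ideas(prompt, n_ideas)]
    best_history = [min(i.score for i in ideas)]
    for _ in range(n_rounds):  # collaborative rounds
        ideas.extend(develop(i, rng) for i in synthesize(ideas, top_k))
        best_history.append(min(i.score for i in ideas))
    return ideas, best_history
```

Calling `run_research("predict morphology vote fractions", "Galaxy Zoo 2")` returns all evaluated ideas plus the best score after each stage. Because each round only adds candidates, the best score can never get worse, which matches the kind of stage-by-stage improvement curve that the Figure 4 caption describes. In a real system the placeholders would be replaced by LLM calls and actual training runs.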