Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari, Zaiying Zhao, Kiyoharu Aizawa

TL;DR

This work develops Jr. AI Scientist, a baseline-driven autonomous research system that mirrors a novice student’s workflow: analyze a baseline paper, identify its limitations, hypothesize improvements, implement and validate those ideas, and write a full paper. It integrates a multi-stage pipeline (Idea Generation, Experiment with Stages 1–3, and Writing with iterative reflection) powered by modern coding agents that can handle realistic multi-file codebases and artifacts from the baseline. Evaluation combines automated AI reviewers, author-led checks, and submission to Agents4Science. The generated papers receive higher review scores than those from existing fully automated systems, but the evaluation also exposes limitations in novelty, experimental breadth, and citation integrity. The paper also delivers a comprehensive risk analysis (e.g., fabrication, hallucination, and review manipulation) to guide responsible development, and points to practical safeguards for trustworthy AI-driven scientific exploration, such as explicit human verification where necessary. The final formatting target for generated manuscripts is 8 pages (±1 page), to keep reporting concise across venues.

Abstract

Understanding the current capabilities and risks of AI Scientist systems is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: given a baseline paper from a human mentor, it analyzes the paper's limitations, formulates novel hypotheses for improvement, iteratively conducts experiments until improvements are realized, and writes a paper with the results. Unlike previous approaches that assume full automation or operate on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. In our experiments, Jr. AI Scientist successfully generated new research papers that build upon real NeurIPS, IJCV, and ICLR works by proposing and implementing novel methods. For evaluation, we conducted automated assessments using AI Reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven scientific contributions. The findings demonstrate that Jr. AI Scientist generates papers that receive higher review scores than those of existing fully automated systems. Nevertheless, we identify important limitations from both the author evaluation and the Agents4Science reviews, indicating the potential risks of directly applying current AI Scientist systems and key challenges for future research. Finally, we comprehensively report various risks identified during development. We believe this study clarifies the current role and limitations of AI Scientist systems, offering insights into the areas that still require human expertise and the risks that may emerge as these systems evolve.

Paper Structure

This paper contains 29 sections, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Jr. AI Scientist Workflow. We provide the baseline paper, its LaTeX source files, and the associated codebase. By effectively utilizing these resources across all phases, the system significantly improves the quality of the generated paper.
  • Figure 2: Jr. AI Scientist Workflow for the Experiment Phase. The workflow consists of three stages. Through bug management and performance tracking, our system passes the most promising experimental nodes to the next stage.
  • Figure 3: Jr. AI Scientist Workflow for the Writing Phase. The Writing process consists of three steps: Draft Writing, Reflection, and Adjustment.
  • Figure 4: An example of a generated paper. Our Jr. AI Scientist can generate full-length research papers with appendices.
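
The Figure 2 caption describes passing the most promising experimental nodes from one stage to the next, using bug management and performance tracking. A minimal Python sketch of that selection step follows. It is an illustration under stated assumptions, not the paper's implementation: the `ExperimentNode` structure, the `select_promising` helper, the `top_k` parameter, the "higher metric is better" convention, and the example scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentNode:
    """One experimental attempt (hypothetical structure, not taken from the paper)."""
    name: str
    metric: Optional[float]  # None when the run failed before reporting a score
    is_buggy: bool = False


def select_promising(nodes: List[ExperimentNode], top_k: int = 2) -> List[ExperimentNode]:
    """Discard buggy or unscored nodes, then keep the top_k by metric.

    Assumes a higher metric is better; this convention is an assumption.
    """
    valid = [n for n in nodes if not n.is_buggy and n.metric is not None]
    return sorted(valid, key=lambda n: n.metric, reverse=True)[:top_k]


# Made-up scores, purely for illustration.
stage1 = [
    ExperimentNode("idea_a", 0.71),
    ExperimentNode("idea_b", None, is_buggy=True),
    ExperimentNode("idea_c", 0.78),
    ExperimentNode("idea_d", 0.64),
]
survivors = select_promising(stage1, top_k=2)
print([n.name for n in survivors])  # prints ['idea_c', 'idea_a']
```

In this sketch the buggy node is filtered out before ranking, so a failed run can never be promoted to the next stage regardless of any partial score it reported.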