SciDER: Scientific Data-centric End-to-end Researcher

Ke Lin, Yilin Lu, Shreyas Bhat, Xuehang Guo, Junier Oliva, Qingyun Wang

TL;DR

SciDER is a data-centric, end-to-end system that automates the research lifecycle. Through its self-evolving memory and critic-led feedback loop, it excels at specialized data-driven scientific discovery and outperforms general-purpose agents and state-of-the-art models.

Abstract

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, SciDER uses specialized agents that collaboratively parse and analyze raw scientific data, generate hypotheses and experimental designs grounded in the characteristics of that data, and write and execute the corresponding code. Evaluation on three benchmarks shows that SciDER excels at specialized data-driven scientific discovery and outperforms general-purpose agents and state-of-the-art models through its self-evolving memory and critic-led feedback loop. SciDER is distributed as a modular Python package on PyPI with a lightweight web interface, with the aim of accelerating autonomous, data-driven research and making it accessible to all researchers and developers.

Paper Structure (25 sections, 1 equation, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 2: Self-evolving memory mechanism of SciDER. Memories are categorized as short- or long-term chunks within agent contexts, while new responses are summarized and integrated back into the memory bank.
  • Figure 3: Results on SciCode. Solve rates are reported for main problems and sub-problems; higher rates indicate greater capability on domain-specific problems.
  • Figure 4: A screenshot of the web interface of SciDER.
  • Figure 5: A screenshot of the Accordion UI for inspecting each sub-agent's output step by step.
  • Figure 6: Left: The research proposal for this iteration, with the green highlights indicating the core experimental topic. Right: The experimental outputs, including the evaluation report (red highlight) and the corresponding implementation code (lower right).
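The Figure 2 caption describes the self-evolving memory only at a high level: short- and long-term chunks inside agent contexts, with new responses summarized and written back to a memory bank. The sketch below illustrates that idea under our own assumptions and is not SciDER's implementation or API. `MemoryBank`, `integrate`, `context`, and `summarize` are hypothetical names. The summarizer is a whitespace-collapsing truncation standing in for an LLM call. Promoting the oldest short-term chunk to long-term memory when short-term capacity is exceeded is an assumed policy.

```python
from collections import deque
from dataclasses import dataclass, field


def summarize(text: str, max_chars: int = 80) -> str:
    """Hypothetical stand-in for an LLM summarizer: collapse whitespace and truncate."""
    text = " ".join(text.split())
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


@dataclass
class MemoryBank:
    """Toy memory bank with short- and long-term chunks (illustrative only)."""
    short_term_capacity: int = 3
    short_term: deque = field(default_factory=deque)
    long_term: list = field(default_factory=list)

    def integrate(self, response: str) -> None:
        # Summarize the new agent response and append it to short-term memory.
        self.short_term.append(summarize(response))
        # Assumed policy: on overflow, promote the oldest short-term chunk to long-term memory.
        while len(self.short_term) > self.short_term_capacity:
            self.long_term.append(self.short_term.popleft())

    def context(self, max_long_term: int = 2) -> list:
        # Build an agent context: the most recent long-term chunks, then all short-term chunks.
        recent_long = self.long_term[-max_long_term:] if max_long_term > 0 else []
        return [f"[long] {c}" for c in recent_long] + [f"[short] {c}" for c in self.short_term]


if __name__ == "__main__":
    bank = MemoryBank(short_term_capacity=2)
    for step in ["parsed dataset columns", "proposed hypothesis", "ran experiment code"]:
        bank.integrate(step)
    print(bank.context())
    # ['[long] parsed dataset columns', '[short] proposed hypothesis', '[short] ran experiment code']
```

A bounded short-term queue keeps each agent's context small. Older summaries remain retrievable from long-term memory, which is one plausible reading of the "short- or long-term chunks" split in the caption.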