
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback

Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou

TL;DR

Dolphin targets a fully closed-loop auto-research system that integrates idea generation, experimental verification, and results feedback, supported by task-specific paper ranking and traceback-guided debugging. The framework uses an LLM-driven pipeline to retrieve relevant literature, generate and filter novel ideas, implement experiments automatically from code templates, and iteratively refine ideas based on experimental outcomes. Empirical results show that Dolphin improves performance on 3D point classification, 2D image classification, and sentiment classification, in some cases reaching performance competitive with human-designed state-of-the-art methods, and that it can also be applied to a subset of MLE-bench. The work points to a viable path toward automatic scientific research while acknowledging limitations and future directions, including more robust cross-disciplinary knowledge integration and stronger code-understanding capabilities.
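The think, practice, and feedback cycle described above can be summarized as a loop. The sketch below is an illustration under stated assumptions, not Dolphin's code: `generate_ideas` and `run_experiment` are hypothetical callables standing in for LLM-based idea generation (conditioned on retrieved, ranked papers) and for template-based implementation plus debugging; a `None` score marks an idea whose code never ran successfully, and feedback is reduced to one line of text per executed idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    """State carried from one round of idea generation to the next."""
    best_score: float
    feedback: List[str] = field(default_factory=list)


def run_closed_loop(
    baseline_score: float,
    generate_ideas: Callable[[List[str]], List[str]],
    run_experiment: Callable[[str], Optional[float]],
    rounds: int = 3,
) -> LoopState:
    """Hypothetical think -> practice -> feedback loop (not Dolphin's actual code)."""
    state = LoopState(best_score=baseline_score)
    for _ in range(rounds):
        # Thinking: propose ideas conditioned on all feedback collected so far.
        ideas = generate_ideas(state.feedback)
        for idea in ideas:
            # Practice: implement and run the idea; None means it never executed successfully.
            score = run_experiment(idea)
            if score is None:
                continue
            # Feedback: record the outcome relative to the best result so far.
            verdict = "improved" if score > state.best_score else "did not improve"
            state.feedback.append(f"{idea}: {score:.2f} ({verdict})")
            state.best_score = max(state.best_score, score)
    return state
```

In this sketch, failed runs are simply skipped; a fuller system might also feed failures back so that the next round avoids similar ideas.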

Abstract

The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI). Recent works demonstrate that various AI-assisted research methods can substantially improve research efficiency by improving data analysis, accelerating computation, and fostering novel idea generation. To move further towards the ultimate goal of automatic scientific research, we introduce Dolphin, a closed-loop LLM-driven framework that raises the level of automation in scientific research. Dolphin first generates novel ideas based on feedback from previous experiments and on relevant papers ranked by topic and task attributes. The generated ideas are then implemented from a code template, and the resulting code is debugged with the help of a local code structure guided by exception tracebacks. Finally, Dolphin automatically analyzes the results of each idea and feeds them back into the next round of idea generation. Experiments are conducted on benchmark datasets covering different topics and on a subset of MLE-bench. Results show that Dolphin continuously improves performance on the input topic across loop iterations. Notably, Dolphin can automatically propose methods comparable to the state of the art on some tasks, such as 3D point classification.
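The abstract says generated code is debugged with an exception-traceback-guided local code structure (see also Figure 3). The sketch below illustrates one possible reading of that idea for Python experiment code, not the paper's implementation: it walks the frames of a caught exception with the standard-library `traceback.extract_tb`, keeps only frames from the experiment source, and uses `ast` to pull out the innermost enclosing function or class for each frame. `local_code_context`, `_enclosing_defs`, and the output layout are hypothetical; `end_lineno` requires Python 3.8 or later.

```python
import ast
import traceback
from typing import List

_DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _enclosing_defs(tree: ast.AST, lineno: int) -> List[ast.AST]:
    """Function/class nodes whose line span contains `lineno`, outermost first."""
    hits = [
        node
        for node in ast.walk(tree)
        if isinstance(node, _DEF_TYPES) and node.lineno <= lineno <= node.end_lineno
    ]
    return sorted(hits, key=lambda node: node.lineno)


def local_code_context(exc: BaseException, source: str, filename: str) -> str:
    """Build a debugging context from the traceback frames that fall inside `source`.

    For each such frame, include the innermost enclosing function or class,
    with the failing line marked by '>>'.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    parts = [f"{type(exc).__name__}: {exc}"]
    for frame in traceback.extract_tb(exc.__traceback__):
        if frame.filename != filename:
            continue  # skip frames from libraries or the calling harness
        defs = _enclosing_defs(tree, frame.lineno)
        if defs:
            start, end = defs[-1].lineno, defs[-1].end_lineno
        else:
            start = end = frame.lineno  # failing statement is at module level
        parts.append(f"--- {filename}, line {frame.lineno}, in {frame.name} ---")
        for no in range(start, end + 1):
            marker = ">>" if no == frame.lineno else "  "
            parts.append(f"{marker} {no:4d} | {lines[no - 1]}")
    return "\n".join(parts)
```

The returned string is compact compared with the whole script, so it could be placed in a debugging prompt alongside the error message; whether Dolphin selects exactly these code regions is an assumption of this sketch.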

Paper Structure

This paper contains 27 sections, 11 figures, 8 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Comparison of the four stages in the evolutionary trajectory towards auto-research: (a) entirely human-driven research, (b) AI-assisted research, (c) semi-automatic research, and (d) auto-research.
  • Figure 2: Dolphin first generates a set of ideas based on the retrieved papers. After the ideas are filtered, experimental plans are generated for the remaining ideas. Code is then generated and debugged using the proposed error-traceback-guided debugging process. Finally, the results of successfully executed experiments are automatically analyzed and fed back into the next round of idea generation.
  • Figure 3: Debugging with traceback-guided local code structure.
  • Figure 5: Prompts for paper retrieval, paper ranking, and idea generation.
  • Figure 6: An example of the independence check.
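The figure list mentions an independence check (Figure 6) used when filtering generated ideas. The figure itself is not reproduced here, so the following is only a sketch under assumptions: ideas are compared pairwise, any idea too similar to an already-kept idea is dropped, and word-count cosine similarity stands in for whatever text representation the paper actually uses. The function names and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def independence_filter(ideas: List[str], threshold: float = 0.8) -> List[str]:
    """Greedily keep ideas that are not too similar to any idea kept before them.

    `threshold` is a hypothetical value, not one reported in the paper.
    """
    kept: List[str] = []
    kept_vectors: List[Counter] = []
    for idea in ideas:
        vector = bag_of_words(idea)
        if all(cosine_similarity(vector, other) < threshold for other in kept_vectors):
            kept.append(idea)
            kept_vectors.append(vector)
    return kept
```

Because the pass is greedy, input order decides which of two near-duplicates survives; sorting ideas by a quality or novelty score first would keep the preferred one.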