
AutoSynth: Automated Workflow Optimization for High-Quality Synthetic Dataset Generation via Monte Carlo Tree Search

Shuzhen Bi, Chang Song, Siyu Song, Jinze Lv, Jian Chen, Xinyun Wang, Aimin Zhou, Hao Hao

TL;DR

AutoSynth tackles the data bottleneck in domain-specific supervised fine-tuning (SFT) by recasting synthetic-data workflow design as a dataset-free optimization problem, solved with Monte Carlo Tree Search (MCTS) guided by a novel hybrid reward. The framework uses two LLM-based judges, one for generated samples and one for the workflow itself, and dynamically regenerates task-specific evaluation metrics, which enables meta-learning without ground-truth data. On subjective educational tasks, AutoSynth-trained models surpass the base model and approach, but do not yet match, expert-designed workflows in human preference, while cutting initial human effort by 90% or more. The work points to a scalable path toward fully automated, data-centric AI for subjective tasks and identifies incorporating domain knowledge and pedagogy as directions for future optimization.

Abstract

Supervised fine-tuning (SFT) of large language models (LLMs) for specialized tasks requires high-quality datasets, but manual curation is prohibitively expensive. Synthetic data generation offers scalability, but its effectiveness relies on complex, multi-stage workflows that integrate prompt engineering and model orchestration. Existing automated workflow methods face a cold-start problem: they require labeled datasets for reward modeling, which is especially problematic for subjective, open-ended tasks with no objective ground truth. We introduce AutoSynth, a framework that automates workflow discovery and optimization without reference datasets by reframing the problem as a Monte Carlo Tree Search guided by a novel dataset-free hybrid reward. This reward enables meta-learning through two LLM-as-judge components: one evaluates sample quality using dynamically generated task-specific metrics, and the other assesses the quality of the workflow code and prompts. Experiments on subjective educational tasks show that while expert-designed workflows achieve higher human preference rates (96–99% win rates vs. AutoSynth's 40–51%), models trained on AutoSynth-generated data dramatically outperform baselines (40–51% vs. 2–5%) and match or surpass expert workflows on certain metrics, suggesting that the search discovers quality dimensions beyond human intuition. These results are achieved while reducing human effort from 5–7 hours to just 30 minutes (a 90–93% reduction). AutoSynth addresses the cold-start issue in data-centric AI, offering a scalable, cost-effective method for subjective LLM tasks. Code: https://github.com/bisz9918-maker/AutoSynth.
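The abstract does not state how the two judges' outputs are combined into one reward. The sketch below is a minimal illustration under loudly stated assumptions: each sample gets numeric scores on the dynamically generated metrics, the sample-quality score is the mean over metrics and samples, and the hybrid reward is a convex combination with a hypothetical weight `alpha`. The function names and `alpha` are illustrative, not taken from the AutoSynth code.

```python
from statistics import mean
from typing import Dict, List


def sample_score(metric_scores: Dict[str, float]) -> float:
    """Average one sample's judge scores across the generated metrics.

    ASSUMPTION: plain mean; the paper may aggregate metrics differently.
    """
    return mean(metric_scores.values())


def hybrid_reward(sample_metric_scores: List[Dict[str, float]],
                  workflow_score: float,
                  alpha: float = 0.5) -> float:
    """Blend mean sample quality with workflow quality.

    ASSUMPTION: linear weighting with a hypothetical weight alpha in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    score_sample = mean(sample_score(m) for m in sample_metric_scores)
    return alpha * score_sample + (1.0 - alpha) * workflow_score


# Usage with hypothetical metric names produced by a metric-generation step.
scores = [
    {"clarity": 8.0, "pedagogical_fit": 6.0},
    {"clarity": 10.0, "pedagogical_fit": 8.0},
]
reward = hybrid_reward(scores, workflow_score=6.0, alpha=0.5)
```

Because the metric dictionary is keyed by name, regenerating task-specific metrics only changes the keys each judge fills in; the reward computation itself stays the same.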

Paper Structure

This paper contains 27 sections, 5 equations, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Overview of the AutoSynth framework. The system operates in two phases: (Left) Human-in-the-loop initialization generates a functional baseline workflow $W_0$ through 1-3 iterations of LLM generation, execution, and human feedback; (Middle & Right) MCTS-driven optimization iteratively refines workflows through selection, refinement, evaluation, and backpropagation, guided by a hybrid reward signal combining sample quality ($Score_{sample}$) and workflow quality ($Score_{workflow}$).
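The four MCTS phases in Figure 1 (selection, refinement, evaluation, backpropagation) can be sketched as a compact loop. This is a minimal sketch, not the AutoSynth implementation: it assumes UCB1/UCT selection, a fixed cap on children per node, and caller-supplied `refine` and `evaluate` functions, which in AutoSynth would be LLM-driven (workflow rewriting and the hybrid reward) but here are hypothetical placeholders. The names `Node`, `select`, `backpropagate`, `mcts`, `c`, and `max_children` are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One candidate workflow in the search tree."""
    workflow: str                      # workflow code/prompts, kept as text
    reward: float = 0.0                # reward of this workflow alone
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float) -> float:
        """UCB1 score; assumes this node and its parent were visited at least once."""
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def select(root: Node, c: float, max_children: int) -> Node:
    """Descend through fully expanded nodes, following the highest-UCT child."""
    node = root
    while len(node.children) >= max_children:
        node = max(node.children, key=lambda ch: ch.uct(c))
    return node


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add the reward to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def mcts(w0: str,
         refine: Callable[[str], str],
         evaluate: Callable[[str], float],
         iterations: int = 20,
         c: float = 1.4,
         max_children: int = 3) -> Node:
    """Search from the baseline workflow w0 and return the best-scoring node."""
    root = Node(w0, reward=evaluate(w0))
    backpropagate(root, root.reward)
    best = root
    for _ in range(iterations):
        parent = select(root, c, max_children)               # selection
        child_wf = refine(parent.workflow)                    # refinement
        child = Node(child_wf, reward=evaluate(child_wf), parent=parent)  # evaluation
        parent.children.append(child)
        backpropagate(child, child.reward)                    # backpropagation
        if child.reward > best.reward:
            best = child
    return best
```

Setting `eq=False` and hiding `parent`/`children` from `repr` avoids infinite recursion through the parent-child cycle. Keeping each node's own `reward` separate from the accumulated `total_reward` lets the search return the single best workflow found, while UCT still balances exploiting high-average branches against exploring rarely refined ones.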