SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

Wonduk Seo, Juhyeon Lee, Yanjun Shao, Qingshan Zhou, Seunghyun Lee, Yi Bu

TL;DR

SPIO tackles the rigidity of traditional AutoML by introducing sequential plan integration and optimization across four pipeline modules, enabling adaptive multi-path exploration. It combines fundamental code-generation agents with an LLM-driven sequential planner, delivering SPIO-S (single best plan) and SPIO-E (top-$k$ ensemble) variants. Across Kaggle/OpenML benchmarks and multiple LLM backends, SPIO yields an average improvement of $5.6\%$ over strong baselines, with ablations showing that feature engineering and hyperparameter tuning are key performance drivers. The work advances automated data science by providing a transparent, robust framework that balances exploration, fidelity, and efficiency.

Abstract

Large Language Models (LLMs) have enabled dynamic reasoning in automated data analytics, yet recent multi-agent systems remain limited by rigid, single-path workflows that restrict strategic exploration and often lead to suboptimal outcomes. To overcome these limitations, we propose SPIO (Sequential Plan Integration and Optimization), a framework that replaces rigid workflows with adaptive, multi-path planning across four core modules: data preprocessing, feature engineering, model selection, and hyperparameter tuning. In each module, specialized agents generate diverse candidate strategies, which are cascaded and refined by an optimization agent. SPIO offers two operating modes: SPIO-S for selecting a single optimal pipeline, and SPIO-E for ensembling top-k pipelines to maximize robustness. Extensive evaluations on Kaggle and OpenML benchmarks show that SPIO consistently outperforms state-of-the-art baselines, achieving an average performance gain of 5.6%. By explicitly exploring and integrating multiple solution paths, SPIO delivers a more flexible, accurate, and reliable foundation for automated data science.
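The two operating modes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `score` callback (a stand-in for whatever ranking the optimization agent performs, for example a validation metric), and the `predict_proba` callback are all hypothetical, and plain probability averaging is assumed as the ensembling rule.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# The four pipeline modules named in the abstract.
MODULES = (
    "preprocessing",
    "feature_engineering",
    "model_selection",
    "hyperparameter_tuning",
)

# A pipeline is one chosen strategy per module, in module order.
Plan = Tuple[str, ...]


def enumerate_pipelines(candidates: Dict[str, Sequence[str]]) -> List[Plan]:
    """Compose every combination of per-module candidate strategies."""
    return list(product(*(candidates[m] for m in MODULES)))


def rank_pipelines(pipelines: Sequence[Plan],
                   score: Callable[[Plan], float]) -> List[Plan]:
    """Sort pipelines from best to worst; higher score is assumed better."""
    return sorted(pipelines, key=score, reverse=True)


def select_single(pipelines: Sequence[Plan],
                  score: Callable[[Plan], float]) -> Plan:
    """SPIO-S style (assumed): keep only the top-ranked pipeline."""
    return rank_pipelines(pipelines, score)[0]


def ensemble_top_k(pipelines: Sequence[Plan],
                   score: Callable[[Plan], float],
                   predict_proba: Callable[[Plan], Sequence[float]],
                   k: int) -> List[float]:
    """SPIO-E style (assumed): average predictions of the k best pipelines."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = rank_pipelines(pipelines, score)[:k]
    per_pipeline = [predict_proba(p) for p in top]
    n_samples = len(per_pipeline[0])
    return [
        sum(probs[i] for probs in per_pipeline) / len(top)
        for i in range(n_samples)
    ]
```

In this sketch the search space is the full Cartesian product of module candidates, which grows quickly; a practical system would presumably prune or let the planner propose only a handful of end-to-end paths.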

Paper Structure

This paper contains 50 sections, 13 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: SPIO framework overview. Given dataset descriptions, SPIO produces baseline code and results for each pipeline module. A sequential planning agent then proposes candidate improvements per module, after which SPIO either selects a single best end-to-end path (SPIO-S) or ensembles the top-$k$ paths (SPIO-E) to generate the prediction file.
  • Figure 2: Sequential plan integration and optimization in SPIO (Titanic example). SPIO enumerates candidate plans for preprocessing, feature engineering, model selection, and hyperparameter tuning (e.g., predictive Age imputation, interaction/target-encoded features, RF/XGBoost/LogReg choices, and Optuna/Hyperopt search), then composes them into complete pipelines. Compared with GPT-4o baselines (Zero-Shot ACC 0.78; CoT ACC 0.80), SPIO selects the best single path (SPIO-S, ACC 0.84) or ensembles the top-$k$ pipelines (SPIO-E, final ACC 0.86).
  • Figure 3: Distribution of preprocessing and feature engineering methods. PCA projection of the embedded preprocessing and feature engineering methods.
  • Figure 4: Token usage breakdown by framework and steps. Input and output tokens are shown as stacked bars on a log-scaled x-axis.
  • Figure 5: Performance versus token cost trade-off across different frameworks. Left: classification performance measured by ACC/ROC (higher is better). Right: regression performance measured by RMSE (lower is better). Token cost is shown on a logarithmic scale.
  • ...and 5 more figures