Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines

Yuhang Ge; Yachuan Liu; Zhangyan Ye; Yuren Mao; Yunjun Gao

TL;DR

This paper addresses the data preparation (DP) bottleneck by introducing Text-to-Pipeline, an NL-to-DP task, and PARROT, a large-scale NL-driven DP benchmark built from real production pipelines. The authors show that modeling pipelines as a DSL with schema-aware validation substantially improves execution accuracy (EA) over direct code generation, with an EA of 62.9% versus 33.8% for Pandas, although LLMs still struggle with multi-step compositionality and semantic grounding. They propose Pipeline-Agent, an execution-aware planner that iteratively reasons over intermediate states and achieves a state-of-the-art EA of 76.2%, which still leaves a substantial gap to close. PARROT provides a realistic, scalable platform for developing autonomous data preparation agents and highlights key research directions, including richer planning, grounding, and state-tracking capabilities.

Abstract

Data preparation (DP) transforms raw data into a form suitable for downstream applications, typically by composing operations into executable pipelines. Building such pipelines is time-consuming and requires sophisticated programming skills, posing a significant barrier for non-experts. To lower this barrier, we introduce Text-to-Pipeline, a new task that translates natural language (NL) data preparation instructions into DP pipelines, and PARROT, a large-scale benchmark to support systematic evaluation. To ensure realistic DP scenarios, PARROT is built by mining transformation patterns from production pipelines and instantiating them on 23,009 real-world tables, resulting in ~18,000 tasks spanning 16 core operators. Our empirical evaluation on PARROT reveals a critical failure mode in cutting-edge LLMs: they struggle not only with multi-step compositional logic but also with semantic parameter grounding. We thus establish a strong baseline with Pipeline-Agent, an execution-aware agent that iteratively reflects on intermediate states. While it achieves state-of-the-art performance, a significant gap remains, underscoring the deep, unsolved challenges that PARROT poses. PARROT provides an essential, large-scale testbed for developing and evaluating the next generation of autonomous, agentic data preparation systems.
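
To make the idea of a DSL pipeline with schema-aware validation concrete, the following is a minimal sketch, not the paper's implementation. The operator names (`filter`, `rename`, `select`), the dict-based step format, and the helper functions `validate`, `execute`, and `run` are all hypothetical, chosen only for illustration; PARROT's actual DSL and its 16 operators are not specified here. The sketch checks every step against the evolving column set before any data is touched, so a reference to a missing column is rejected up front rather than failing at execution time.

```python
# Hypothetical mini-DSL: each step is a dict with an "op" name plus parameters.
# Operator names and parameter keys are illustrative assumptions,
# not PARROT's actual DSL.

def validate(pipeline, schema):
    """Check each step against the evolving column list; return the final schema."""
    cols = list(schema)
    for i, step in enumerate(pipeline):
        op = step["op"]
        if op == "filter":
            needed = [step["column"]]
        elif op == "select":
            needed = list(step["columns"])
        elif op == "rename":
            needed = list(step["mapping"])
        else:
            raise ValueError(f"step {i}: unknown operator {op!r}")
        missing = [c for c in needed if c not in cols]
        if missing:
            raise ValueError(f"step {i} ({op}): unknown column(s) {missing}")
        # Track how the step changes the schema for the steps that follow.
        if op == "select":
            cols = list(step["columns"])
        elif op == "rename":
            cols = [step["mapping"].get(c, c) for c in cols]
    return cols


def execute(pipeline, rows):
    """Apply each step in order to a table given as a list of row dicts."""
    for step in pipeline:
        op = step["op"]
        if op == "filter":
            rows = [r for r in rows if r[step["column"]] >= step["min"]]
        elif op == "select":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
        elif op == "rename":
            rows = [{step["mapping"].get(k, k): v for k, v in r.items()}
                    for r in rows]
    return rows


def run(pipeline, rows, schema):
    """Validate first, then execute; invalid pipelines never reach the data."""
    validate(pipeline, schema)
    return execute(pipeline, rows)


if __name__ == "__main__":
    table = [{"name": "a", "score": 5}, {"name": "b", "score": 9}]
    pipeline = [
        {"op": "filter", "column": "score", "min": 6},
        {"op": "rename", "mapping": {"score": "points"}},
        {"op": "select", "columns": ["name", "points"]},
    ]
    print(run(pipeline, table, ["name", "score"]))
```

In this sketch, a step that refers to `points` before the rename has run would be rejected by `validate`, which illustrates the kind of grounding error that a schema-aware check can catch before execution.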
