Text-to-Pipeline: Bridging Natural Language and Data Preparation Pipelines

Yuhang Ge, Yachuan Liu, Zhangyan Ye, Yuren Mao, Yunjun Gao

TL;DR

This paper targets the data preparation (DP) bottleneck by introducing Text-to-Pipeline, a task that translates natural language (NL) instructions into DP pipelines, and PARROT, a large-scale NL-driven DP benchmark built from real production pipelines. The authors show that modeling pipelines as a domain-specific language (DSL) with schema-aware validation yields substantially higher execution accuracy (EA) than direct code generation (62.9% vs. 33.8% for Pandas), while LLMs still struggle with multi-step compositionality and semantic grounding. They propose Pipeline-Agent, an execution-aware planner that iteratively reasons over intermediate states and achieves a state-of-the-art EA of 76.2%, though a substantial gap remains. PARROT provides a realistic, scalable platform for developing autonomous data preparation agents and highlights key research directions, including richer planning, grounding, and state-tracking capabilities.

Abstract

Data preparation (DP) transforms raw data into a form suitable for downstream applications, typically by composing operations into executable pipelines. Building such pipelines is time-consuming and requires sophisticated programming skills, posing a significant barrier for non-experts. To lower this barrier, we introduce Text-to-Pipeline, a new task that translates natural language (NL) data preparation instructions into DP pipelines, and PARROT, a large-scale benchmark to support systematic evaluation. To ensure realistic DP scenarios, PARROT is built by mining transformation patterns from production pipelines and instantiating them on 23,009 real-world tables, resulting in ~18,000 tasks spanning 16 core operators. Our empirical evaluation on PARROT reveals a critical failure mode in cutting-edge LLMs: they struggle not only with multi-step compositional logic but also with semantic parameter grounding. We thus establish a strong baseline with Pipeline-Agent, an execution-aware agent that iteratively reflects on intermediate states. While it achieves state-of-the-art performance, a significant gap remains, underscoring the deep, unsolved challenges posed by PARROT. PARROT provides an essential, large-scale testbed for developing and evaluating the next generation of autonomous data preparation agentic systems.
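
To make the ideas of a pipeline DSL, schema-aware validation, and execution accuracy more concrete, the following is a minimal sketch in plain Python. It is not PARROT's actual DSL or evaluation code: the operator names (`filter`, `rename`, `select`), their parameters, and the helper functions `validate`, `execute`, and `execution_accuracy` are all hypothetical, and execution accuracy is assumed here to mean the fraction of predicted pipelines whose output table equals the reference output.

```python
from typing import Any

Row = dict[str, Any]
Table = list[Row]

# Hypothetical mini-DSL: each step is a dict {"op": name, ...params}.
# Operator names and parameters are illustrative, not PARROT's real DSL.


def validate(pipeline: list[dict], columns: list[str]) -> list[str]:
    """Schema-aware check: track column names step by step, fail early."""
    cols = list(columns)
    for i, step in enumerate(pipeline):
        op = step["op"]
        if op == "filter":
            if step["column"] not in cols:
                raise ValueError(f"step {i}: unknown column {step['column']!r}")
        elif op == "rename":
            if step["old"] not in cols:
                raise ValueError(f"step {i}: unknown column {step['old']!r}")
            cols = [step["new"] if c == step["old"] else c for c in cols]
        elif op == "select":
            missing = [c for c in step["columns"] if c not in cols]
            if missing:
                raise ValueError(f"step {i}: unknown columns {missing}")
            cols = list(step["columns"])
        else:
            raise ValueError(f"step {i}: unsupported operator {op!r}")
    return cols  # schema of the pipeline's output


def execute(pipeline: list[dict], table: Table) -> Table:
    """Run a validated pipeline on a table stored as a list of row dicts."""
    rows = [dict(r) for r in table]  # copy so the input is not mutated
    for step in pipeline:
        op = step["op"]
        if op == "filter":  # keep rows whose value exceeds the threshold
            rows = [r for r in rows if r[step["column"]] > step["gt"]]
        elif op == "rename":
            rows = [
                {(step["new"] if k == step["old"] else k): v for k, v in r.items()}
                for r in rows
            ]
        elif op == "select":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
    return rows


def execution_accuracy(
    predictions: list[list[dict]], golds: list[Table], tables: list[Table]
) -> float:
    """Assumed metric: share of pipelines whose output equals the gold table."""
    hits = 0
    for pipeline, gold, table in zip(predictions, golds, tables):
        try:
            validate(pipeline, list(table[0].keys()))  # assumes non-empty table
            hits += execute(pipeline, table) == gold
        except (ValueError, KeyError):
            pass  # invalid or failing pipelines count as misses
    return hits / len(golds)
```

In this sketch, a pipeline that renames `name` to `id` and then selects `name` is rejected by `validate` before any data is touched, which is the kind of error a schema-aware check can catch that free-form code generation would only surface at run time.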

Paper Structure

This paper contains 32 sections, 2 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: Task overview of Text-to-Pipeline.
  • Figure 2: The data synthesis workflow of PARROT.
  • Figure 3: Left: pipeline length distribution over three difficulty levels. Middle: instruction length distribution by token frequency. Right: operation transition matrix. The abbreviation "w2l" stands for the wide-to-long operator.
  • Figure 4: Heatmap of parameter usage across different operations. Darker colors indicate higher parameter complexity.
  • Figure 5: Operation distribution.
  • ...and 8 more figures