Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL

Yihan Wang, Peiyu Liu, Runyu Chen, Wei Xu

TL;DR

This work proposes SquRL, a reinforcement learning framework that enhances LLMs' reasoning capability in adaptive workflow construction. It uses a rule-based reward function and introduces two training mechanisms: dynamic actor masking to encourage broader exploration, and pseudo rewards to improve training efficiency.

Abstract

Text-to-SQL has recently achieved impressive progress, yet remains difficult to apply effectively in real-world scenarios. This gap stems from the reliance on single static workflows, which fundamentally limits scalability to out-of-distribution and long-tail scenarios. Instead of requiring users to select suitable methods through extensive experimentation, we attempt to enable systems to adaptively construct workflows at inference time. Through theoretical and empirical analysis, we demonstrate that optimal dynamic policies consistently outperform the best static workflow, with performance gains fundamentally driven by heterogeneity across candidate workflows. Motivated by this, we propose SquRL, a reinforcement learning framework that enhances LLMs' reasoning capability in adaptive workflow construction. We design a rule-based reward function and introduce two effective training mechanisms: dynamic actor masking to encourage broader exploration, and pseudo rewards to improve training efficiency. Experiments on widely used Text-to-SQL benchmarks demonstrate that dynamic workflow construction consistently outperforms the best static workflow methods, with especially pronounced gains on complex and out-of-distribution queries. The code is available at https://github.com/Satissss/SquRL

Paper Structure

This paper contains 68 sections, 2 theorems, 40 equations, 5 figures, 5 tables.

Key Result

Theorem 3.1

For any finite workflow set $\Omega$ and query distribution $D$, we have $\text{EX}_{\text{dynamic}} \ge \text{EX}_{\text{static}}$. Moreover, $\Delta = 0$ if and only if there exists a workflow $W^* \in \Omega$ whose success region covers the union of all workflows' success regions almost surely un
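The inequality in Theorem 3.1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a finite sample of queries weighted uniformly, an oracle that picks a successful workflow whenever one exists, and it reads $\Delta$ as $\text{EX}_{\text{dynamic}} - \text{EX}_{\text{static}}$ (the excerpt does not define $\Delta$ explicitly). The function name `static_and_dynamic_ex` and the toy success matrix are hypothetical.

```python
from typing import Sequence, Tuple


def static_and_dynamic_ex(success: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Return (EX_static, EX_dynamic) for a workflow-by-query success matrix.

    success[w][q] is True when workflow w answers query q correctly.
    Queries are weighted uniformly (an assumption of this sketch).
    """
    n_queries = len(success[0])
    # Best static workflow: the single row with the most solved queries.
    ex_static = max(sum(row) for row in success) / n_queries
    # Oracle dynamic policy: a query counts as solved if any workflow solves it.
    ex_dynamic = sum(any(col) for col in zip(*success)) / n_queries
    return ex_static, ex_dynamic


# Toy data (hypothetical): three workflows, four queries.
success = [
    [True, True, False, False],
    [False, True, True, False],
    [True, False, False, False],
]
ex_static, ex_dynamic = static_and_dynamic_ex(success)
gap = ex_dynamic - ex_static  # assumed meaning of Delta

# When one workflow's successes cover every other workflow's, the gap vanishes.
covered = [
    [True, True, False],
    [True, False, False],
]
covered_static, covered_dynamic = static_and_dynamic_ex(covered)
```

In the first matrix the workflows succeed on different queries, so the oracle policy solves more queries than any single workflow. In the second, the first workflow covers the second one's successes and both quantities coincide, matching the equality condition stated in the theorem.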

Figures (5)

  • Figure 1: Oracle dynamic workflow (red star) vs. static workflows.
  • Figure 2: Overview of the SquRL framework. Traditional approaches rely on a single fixed workflow to handle diverse query tasks. In contrast, SquRL dynamically constructs workflows tailored to each query, enabling more flexible, accurate, and robust SQL prediction.
  • Figure 3: Performance comparison with and without Dynamic Actor Masking (DAM) across different retention rates.
  • Figure 4: Pairwise distance matrix between different workflows.
  • Figure 5: Performance metrics across task difficulty levels. (Left) Max/min execution times versus task difficulty (measured by number of capable workflows). (Right) Case distribution and time reduction by difficulty. Easier tasks (left) show greater time variation and more samples, while harder tasks (right) demonstrate convergent execution times.

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem 2.1: Non-negativity and characterization of equality
  • proof