DyFlow: Dynamic Workflow Framework for Agentic Reasoning

Yanbo Wang, Zixiang Xu, Yue Huang, Xiangqi Wang, Zirui Song, Lang Gao, Chenxi Wang, Xiangru Tang, Yue Zhao, Arman Cohan, Xiangliang Zhang, Xiuying Chen

TL;DR

DyFlow tackles the brittleness of static, dataset-specific LLM agent workflows with a dynamic, feedback-driven workflow framework built on a designer–executor split. The designer generates and revises subgoal plans from intermediate feedback, while the executor carries out dynamic operators with context-aware inputs. The designer is trained in two phases: supervised distillation followed by self-play preference optimization. Empirically, DyFlow achieves state-of-the-art performance across five reasoning domains and generalizes across domains and models, including zero-shot transfer to unseen tasks and executors. The work argues that execution-adaptive planning yields more robust, scalable reasoning, and it comes with a public implementation and favorable cost-performance trade-offs.

Abstract

Agent systems based on large language models (LLMs) have shown great potential in complex reasoning tasks, but building efficient and generalizable workflows remains a major challenge. Most existing approaches rely on manually designed processes, which limits their adaptability across different tasks. While a few methods attempt automated workflow generation, they are often tied to specific datasets or query types and make limited use of intermediate feedback, reducing system robustness and reasoning depth. Moreover, their operations are typically predefined and inflexible. To address these limitations, we propose DyFlow, a dynamic workflow generation framework that adaptively constructs and adjusts reasoning procedures based on task requirements and real-time intermediate feedback, thereby enhancing cross-task generalization. DyFlow consists of two core components: a designer and an executor. The designer decomposes complex problems into a sequence of sub-goals defined by high-level objectives and dynamically plans the next steps based on intermediate outputs and feedback. These plans are then carried out by the executor, which executes each operation using dynamic operators with context-aware parameterization, enabling flexible and semantically grounded reasoning. We systematically evaluate DyFlow across diverse domains, including social reasoning, biomedical tasks, mathematical problem solving, and code generation. Results demonstrate that DyFlow significantly outperforms existing baselines, achieving substantial Pass@k improvements and exhibiting robust generalization across diverse domains. The code is publicly available at https://github.com/wyf23187/DyFlow.
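The designer/executor loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `State`, `run_workflow`, `toy_designer`, and `toy_executor` are hypothetical, plans are modeled as plain lists of instruction strings, and the "LLM calls" are replaced by deterministic stubs so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class State:
    """Task state visible to the designer: the question plus execution feedback."""
    question: str
    history: List[str] = field(default_factory=list)
    answer: Optional[str] = None


# Hypothetical interfaces (assumptions, not the paper's API):
# the designer maps the current state to the next stage's instructions,
# the executor carries out one instruction given the current state.
Designer = Callable[[State], List[str]]
Executor = Callable[[str, State], str]


def run_workflow(question: str, designer: Designer, executor: Executor,
                 max_stages: int = 5) -> State:
    """Alternate planning and execution until the designer returns an empty plan."""
    state = State(question)
    for _ in range(max_stages):
        plan = designer(state)            # re-plan from intermediate feedback
        if not plan:                      # empty plan means "stop"
            break
        for instruction in plan:
            output = executor(instruction, state)
            state.history.append(f"{instruction} -> {output}")
            state.answer = output         # latest output is the candidate answer
    return state


def toy_designer(state: State) -> List[str]:
    """Stub designer: draft first, verify if the draft is still marked unsure."""
    if not state.history:
        return ["draft"]
    if state.answer is not None and state.answer.endswith("?"):
        return ["verify"]
    return []


def toy_executor(instruction: str, state: State) -> str:
    """Stub executor: a draft is marked unsure with '?', verify removes the mark."""
    if instruction == "draft":
        return "4?"
    return (state.answer or "").rstrip("?")


result = run_workflow("2+2", toy_designer, toy_executor)
print(result.answer, result.history)
```

The point of the sketch is that the plan for the second stage does not exist until the first stage's output has been observed, which is what distinguishes an execution-adaptive workflow from a fixed operator sequence.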

Paper Structure

This paper contains 37 sections, 4 theorems, 15 equations, 7 figures, 10 tables, 1 algorithm.

Key Result

Lemma 1

$\Pi_{\mathrm{stat}}\subseteq\Pi_{\mathrm{DyFlow}}$, as any static $\pi_{\mathrm{stat}}$ can be implemented by DyFlow by ignoring $s_t$ and always returning $G_{\mathrm{fix}}$.
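The construction behind Lemma 1 can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which a stage graph is a tuple of operator names and a dynamic policy is any function from the state $s_t$ to a stage graph; `as_dynamic` and the operator names are illustrative only.

```python
from typing import Callable, Tuple

Graph = Tuple[str, ...]              # hypothetical encoding of a stage graph
Policy = Callable[[object], Graph]   # dynamic policy: state s_t -> stage graph


def as_dynamic(g_fix: Graph) -> Policy:
    """Embed a static workflow as a dynamic policy that ignores s_t."""
    def policy(_state: object) -> Graph:
        return g_fix                 # same graph regardless of the state
    return policy


g_fix = ("generate", "review", "answer")
pi = as_dynamic(g_fix)

# Whatever the intermediate state looks like, the embedded policy returns G_fix.
outputs = [pi(s) for s in ("start", {"feedback": "error"}, None)]
print(outputs)
```

Because every static workflow is expressible this way, the best dynamic policy can do no worse than the best static one on any objective defined over policies, which is the intuition behind the "never worse than static" result listed among the theorems.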

Figures (7)

  • Figure 1: Paradigm shift in LLM-based reasoning workflows. From left to right: static workflows apply fixed sequences across all tasks; dataset-specific and question-specific workflows allow more variation but rely on predefined operations (OP); DyFlow adopts an execution-adaptive paradigm that dynamically adjusts the workflow based on intermediate feedback.
  • Figure 2: DyFlow dynamically constructs reasoning workflows by generating stage subgraphs based on the current task state. A high-level designer plans operator sequences, while a low-level executor executes them using memory. The designer is trained via supervised distillation and self-play preference optimization.
  • Figure 3: Average pass@k comparisons between DyFlow and CoT across 5 benchmarks.
  • Figure 4: Cross-executor performance comparison between CoT and DyFlow across five reasoning tasks. DyFlow improves performance across all model scales, with larger gains for stronger models on more challenging tasks. Detailed results are provided in the appendix (cross-executor section and table).
  • Figure 5: Case Study between CoT and DyFlow on MATH dataset.
  • ...and 2 more figures
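Figure 3 reports pass@k. For readers unfamiliar with the metric, the sketch below implements a widely used unbiased estimator, $1 - \binom{n-c}{k}/\binom{n}{k}$, computed from $n$ sampled solutions of which $c$ are correct. Whether DyFlow's evaluation uses exactly this estimator is an assumption; the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given n total samples of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(10, 3, 1))  # equals the plain accuracy c / n when k = 1
print(pass_at_k(10, 3, 5))
```

With k = 1 the estimator reduces to c / n, so pass@1 is ordinary accuracy; larger k rewards methods that produce at least one correct solution among several attempts.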

Theorems & Definitions (6)

  • Lemma 1: Static Policies as a Special Case
  • Theorem 1: DyFlow Is Never Worse Than Static
  • Proof 1
  • Lemma 2: Error propagation
  • proof
  • Theorem 2: DyFlow performance bound