FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning
Yihao Liu, Ziyun Zhang, Zile He, Huaqian Cai
TL;DR
FlowMind tackles the challenge of translating free-form LLM reasoning and tool use into reliable, structured workflows by decoupling task execution from workflow construction. The Execute-Summarize (ES) framework uses an execution phase to complete tasks with domain tools, followed by a summarization phase that reconstructs a workflow graph from verified execution traces. FlowBench provides a synthetic, evaluation-rich benchmark for both task solving and workflow induction, and extensive experiments show that ES-based variants, especially ES-P&E, outperform one-stage baselines in both correctness and efficiency. The work demonstrates robustness across model scales, reveals insights into cognitive burden, and highlights practical benefits in interpretability, reproducibility, and downstream automation, while acknowledging limitations of synthetic data and summarization fidelity.
Abstract
LLMs can solve complex tasks through reasoning and tool use, but accurately translating these solutions into structured workflows remains challenging. We model workflows as sequences of tool use and reformulate the problem as designing a mechanism that can both solve tasks and reliably construct workflows. Prior approaches that build workflows during execution often suffer from inaccuracies due to interference between the two processes. We propose an Execute-Summarize (ES) framework that decouples task execution from workflow construction: the model first completes the task using available tools, then independently reconstructs a structured workflow from execution traces. This separation improves workflow accuracy and robustness. We introduce FlowBench and show through extensive experiments that our approach outperforms existing methods, providing a reliable paradigm for grounding free-form LLM reasoning into structured workflows.
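The two-phase separation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tools (`lookup_order`, `refund`), the `TraceEntry` record, and the scripted `plan` that stands in for the LLM's tool-use decisions are all hypothetical, and the summarizer here is a plain filter rather than a model call. The sketch only assumes, as the abstract states, that a workflow is a sequence of tool uses, and it assumes that keeping the calls that succeeded is a reasonable stand-in for summarizing from a trace.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical domain data and tools, used only for illustration.
ORDERS = {"A17": 30.0}


def lookup_order(order_id: str) -> dict[str, Any]:
    # Raises KeyError for an unknown order, which the executor records as a failure.
    return {"order_id": order_id, "amount": ORDERS[order_id]}


def refund(order_id: str, amount: float) -> str:
    return f"refunded {amount} for {order_id}"


TOOLS: dict[str, Callable[..., Any]] = {"lookup_order": lookup_order, "refund": refund}


@dataclass
class TraceEntry:
    tool: str
    args: dict[str, Any]
    ok: bool
    result: Any = None


def execute(plan: list[tuple[str, dict[str, Any]]],
            tools: dict[str, Callable[..., Any]]) -> list[TraceEntry]:
    """Phase 1 (Execute): run every tool call and log it, failures included.

    In this sketch the plan is scripted; in an ES-style system the calls
    would be chosen step by step by the model.
    """
    trace = []
    for tool_name, args in plan:
        try:
            result = tools[tool_name](**args)
            trace.append(TraceEntry(tool_name, args, True, result))
        except Exception as exc:
            trace.append(TraceEntry(tool_name, args, False, repr(exc)))
    return trace


def summarize(trace: list[TraceEntry]) -> list[dict[str, Any]]:
    """Phase 2 (Summarize): rebuild a workflow from the finished trace.

    Assumption: only successful calls belong in the workflow, so failed
    attempts and retries are dropped and the remaining steps are renumbered.
    """
    successful = [entry for entry in trace if entry.ok]
    return [
        {"step": index, "tool": entry.tool, "args": entry.args}
        for index, entry in enumerate(successful, start=1)
    ]


# A scripted run with one mistaken call (wrong order id) followed by a retry.
plan = [
    ("lookup_order", {"order_id": "A71"}),
    ("lookup_order", {"order_id": "A17"}),
    ("refund", {"order_id": "A17", "amount": 30.0}),
]
trace = execute(plan, TOOLS)
workflow = summarize(trace)

for step in workflow:
    print(step)
```

Because the workflow is built only after execution has finished, the failed first lookup stays in the trace for inspection but does not leak into the resulting workflow, which is the kind of interference the one-stage approaches are said to suffer from.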
