Learning to Compose for Cross-domain Agentic Workflow Generation
Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen
TL;DR
CapFlow addresses the challenge of cross-domain agentic workflow generation by learning a compact set of reusable workflow capability bases and a task-conditioned sparse composer to produce executable, task-specific workflows in a single pass. It introduces counterfactual capability attribution to align basis selection with causal factors of success, enabling controllable recomposition across domain shifts. Across multi-domain, cross-domain, and unseen-domain evaluations, CapFlow outperforms 20-iteration refinement baselines while reducing inference-time cost and latency, and reveals interpretable patterns of shared capability factors. The approach advances reliable orchestration of reasoning, verification, and repair in open-ended tasks by enabling transferable, data-driven modularity without iterative refinement at inference time.
Abstract
Automatically generating agentic workflows (executable operator graphs or code that orchestrate reasoning, verification, and repair) has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow in a large workflow space, which incurs high iteration costs and yields unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capability bases across diverse domains. To recompose, we map each input task to a sparse composition over these bases and generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to the counterfactual contributions of the learned capabilities, thereby capturing which capabilities actually drive success through their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our single-pass generator surpasses state-of-the-art refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.
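To make the recompose and decide steps concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes that sparsity is imposed by keeping the top-k scored bases, and that a counterfactual contribution can be approximated by a leave-one-out difference in a success score. The function names, basis names, scores, and the `toy_success` scorer are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List


def sparse_compose(scores: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-scoring bases and renormalize their weights to sum to 1.

    Assumption: top-k selection stands in for the learned sparse composer.
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(weight for _, weight in top)
    return {name: weight / total for name, weight in top}


def leave_one_out_attribution(
    selected: List[str], success: Callable[[List[str]], float]
) -> Dict[str, float]:
    """Marginal effect of each basis: success with it minus success without it.

    Assumption: leave-one-out is a simple proxy for counterfactual attribution.
    """
    full = success(selected)
    return {
        basis: full - success([b for b in selected if b != basis])
        for basis in selected
    }


def toy_success(bases: List[str]) -> float:
    """Hypothetical scorer: planning alone gives 0.5, planning plus verification 1.0."""
    if "plan" in bases and "verify" in bases:
        return 1.0
    if "plan" in bases:
        return 0.5
    return 0.0


# Hypothetical task-conditioned scores over four capability bases.
task_scores = {"plan": 3.0, "verify": 1.0, "repair": 0.5, "ensemble": 0.25}

weights = sparse_compose(task_scores, k=2)
effects = leave_one_out_attribution(list(weights), toy_success)

print(weights)  # {'plan': 0.75, 'verify': 0.25}
print(effects)  # {'plan': 1.0, 'verify': 0.5}
```

In this toy setting, removing `plan` drops the success score from 1.0 to 0.0, while removing `verify` drops it only to 0.5, so the attribution ranks `plan` as the stronger driver of success. This is the kind of marginal-effect signal the decide step is described as capturing.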
