Learning to Compose for Cross-domain Agentic Workflow Generation

Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen

TL;DR

CapFlow addresses the challenge of cross-domain agentic workflow generation by learning a compact set of reusable workflow capability bases and a task-conditioned sparse composer to produce executable, task-specific workflows in a single pass. It introduces counterfactual capability attribution to align basis selection with causal factors of success, enabling controllable recomposition across domain shifts. Across multi-domain, cross-domain, and unseen-domain evaluations, CapFlow outperforms 20-iteration refinement baselines while reducing inference-time cost and latency, and reveals interpretable patterns of shared capability factors. The approach advances reliable orchestration of reasoning, verification, and repair in open-ended tasks by enabling transferable, data-driven modularity without iterative refinement at inference time.

Abstract

Automatically generating agentic workflows -- executable operator graphs or code that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow from a large workflow space, incurring high iteration costs and yielding unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capabilities across diverse domains. To recompose, we map each input task to a sparse composition over these bases to generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to counterfactual contributions from learned capabilities, thereby capturing which capabilities actually drive success by their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our single-pass generator surpasses state-of-the-art refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.
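
The recompose and decide steps can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify how the composer or the attribution is computed, so the top-k softmax selection and the leave-one-out ablation below are illustrative stand-ins, and the names `sparse_compose`, `counterfactual_attribution`, and `toy_success` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def sparse_compose(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k highest-scoring capability bases and softmax-normalise them.

    Assumption: `scores` are task-conditioned relevance scores, one per basis.
    Returns (basis_index, weight) pairs sorted by index; weights sum to 1.
    """
    if k <= 0 or not scores:
        raise ValueError("need at least one basis and k >= 1")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in sorted(top)]


def counterfactual_attribution(
    selected: Sequence[int],
    success_fn: Callable[[List[int]], float],
) -> Dict[int, float]:
    """Leave-one-out marginal effect of each selected basis.

    Effect of basis i = success with all selected bases
                        minus success with basis i removed.
    """
    full = success_fn(list(selected))
    return {
        i: full - success_fn([j for j in selected if j != i])
        for i in selected
    }


def toy_success(bases: List[int]) -> float:
    """Hypothetical success rate: only bases 0 and 2 actually help."""
    rate = 0.2
    if 0 in bases:
        rate += 0.5
    if 2 in bases:
        rate += 0.3
    return rate


# Usage: pick 3 of 4 bases for a task, then ask which ones drove success.
weights = sparse_compose([2.0, 0.1, 1.0, -1.0], k=3)
effects = counterfactual_attribution([i for i, _ in weights], toy_success)
```

In this toy setting, basis 1 is selected by the composer but has zero marginal effect, which is the kind of gap between selection and causal contribution that an attribution signal is meant to expose.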

Paper Structure

This paper contains 23 sections, 21 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Two workflow generation paradigms. Left: inference-time refinement resorts to trial-and-error in a large workflow space. Right: CapFlow internalizes "decompose-recompose-decide" into LLMs, enabling single-pass generation across domains.
  • Figure 2: Cross-domain agentic workflow analysis: (left: structural analysis) highest-success workflows per domain; (right: latent analysis) t-SNE visualization of tasks embedded by learned workflow capabilities.
  • Figure 3: Overview of workflow capability composition (CapFlow): the task-conditioned composer decomposes each query into a sparse mixture of reusable capability bases, steering the LLM toward successful workflows and away from failures.
  • Figure 4: Workflow generation trade-offs on HumanEval. Left: strong baselines are driven by stochastic refinement that incurs substantially higher evaluation cost for comparable gains. Right: refinement baselines improve with additional rounds but exhibit diminishing returns, whereas CapFlow achieves strong solve rates in a single generation pass.
  • Figure 5: Capability basis usage across domains. CapFlow maintains non-collapsed basis selection and exhibits meaningful cross-dataset overlap, supporting the intended "reusable bases" behavior.
  • ...and 1 more figure