RobustFlow: Towards Robust Agentic Workflow Generation

Shengxiang Xu, Jiayi Zhang, Shimin Di, Yuyu Luo, Liang Yao, Hanmo Liu, Jia Zhu, Fan Liu, Min-Ling Zhang

TL;DR

This work addresses the fragility of automated agentic workflow generation under semantically equivalent input variations. It introduces RobustFlow, a two-stage training framework combining instruction-augmented supervised fine-tuning with self-consistency preference optimization to produce canonical, robust workflows. A structure-aware evaluation suite and a large perturbed-task dataset (1,255 base tasks, 31,889 workflows) enable precise measurement of node-level and graph-level robustness, demonstrating substantial gains (70%–90% robustness) with only modest trade-offs in raw task performance. The findings highlight robustness as a crucial objective for workflow generators and point to future work on balancing robustness with execution cost and broader tool integration.

Abstract

The automated generation of agentic workflows is a promising frontier for enabling large language models (LLMs) to solve complex tasks. However, our investigation reveals that the robustness of agentic workflows remains a critical, unaddressed challenge. Current methods often generate wildly inconsistent workflows when given instructions that are semantically identical but differently phrased. This brittleness severely undermines their reliability and trustworthiness in real-world applications. To quantitatively diagnose this instability, we propose metrics based on nodal and topological similarity that evaluate workflow consistency under common semantic variations such as paraphrasing and noise injection. We then propose a novel training framework, RobustFlow, that leverages preference optimization to teach models invariance to instruction variations. By training on sets of synonymous task descriptions, RobustFlow raises workflow robustness scores to 70%–90%, a substantial improvement over existing approaches. The code is publicly available at https://github.com/DEFENSE-SEU/RobustFlow.

Paper Structure

This paper contains 23 sections, 13 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Structure-aware robustness evaluation metrics. We align nodes between the reference and predicted workflows, then compute node-chain robustness via the longest increasing subsequence length $l$ on the aligned topological sequence and graph-structure robustness by comparing reachability on the aligned DAGs.
  • Figure 2: Overview of RobustFlow. RobustFlow first performs instruction-augmented supervised fine-tuning to mitigate the cold-start problem, then applies self-consistency preference optimization to improve structural robustness and consistency.
  • Figure 3: Robustness of agentic workflow generation methods under perturbations on MBPP, DROP, and MATH. Colors in the legend denote methods. Dimensions: Req = Requirement Augmentation, Para = Paraphrasing, Lig/Mode/Hvy = Light/Moderate/Heavy noise.
  • Figure 4: Robustness trends of different methods under noise enhancement.
  • Figure 5: Robust performance on different datasets under different perturbations.
  • ...and 4 more figures
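
The two metrics described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the `alignment` dictionary (predicted node to reference node), and both normalizations (dividing the LIS length $l$ by the longer sequence length, and using the Jaccard overlap of reachable pairs) are assumptions made for illustration, since the caption does not state the exact formulas.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def node_chain_score(ref_order, pred_order, alignment):
    """Order agreement between two topological sequences.

    `alignment` maps predicted nodes to reference nodes (hypothetical input;
    how nodes are aligned is outside this sketch).
    """
    ref_pos = {node: i for i, node in enumerate(ref_order)}
    aligned = [
        ref_pos[alignment[p]]
        for p in pred_order
        if p in alignment and alignment[p] in ref_pos
    ]
    # Assumed normalization: LIS length l divided by the longer sequence.
    return lis_length(aligned) / max(len(ref_order), len(pred_order), 1)


def reachability(edges):
    """Set of (u, v) pairs such that v is reachable from u in the DAG."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    reach = set()
    for start in adj:
        stack, seen = list(adj[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj.get(node, ()))
        reach.update((start, n) for n in seen)
    return reach


def graph_structure_score(ref_edges, pred_edges, alignment):
    """Assumed score: Jaccard overlap of reachable pairs on the aligned DAGs."""
    ref_reach = reachability(ref_edges)
    mapped = [
        (alignment[u], alignment[v])
        for u, v in pred_edges
        if u in alignment and v in alignment
    ]
    pred_reach = reachability(mapped)
    union = ref_reach | pred_reach
    return len(ref_reach & pred_reach) / len(union) if union else 1.0


# Toy example: the predicted workflow swaps the "code" and "test" steps.
ref_order = ["plan", "code", "test", "answer"]
pred_order = ["p", "t", "c", "a"]
alignment = {"p": "plan", "t": "test", "c": "code", "a": "answer"}
ref_edges = [("plan", "code"), ("code", "test"), ("test", "answer")]
pred_edges = [("p", "t"), ("t", "c"), ("c", "a")]

chain = node_chain_score(ref_order, pred_order, alignment)
graph = graph_structure_score(ref_edges, pred_edges, alignment)
```

In this toy case the aligned position sequence is `[0, 2, 1, 3]`, so $l = 3$ and the node-chain score is 0.75, while the swapped pair breaks one reachability relation in each direction and the graph score is 5/7. Under these assumed normalizations, two workflows that align perfectly score 1.0 on both metrics.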