GNNs as Predictors of Agentic Workflow Performances

Yuanshuo Zhang, Yuchen Hou, Bohan Tang, Shuo Chen, Muhan Zhang, Xiaowen Dong, Siheng Chen

TL;DR

This work tackles the high computational cost of evaluating LLM-driven agentic workflows by introducing FLOW-GNN, a graph neural predictor that operates on workflows modeled as DAGs to forecast performance without executing the full system. It also presents FLORA-Bench, a large-scale benchmark with 600k workflow-task pairs across coding, mathematics, and reasoning, enabling rigorous evaluation using accuracy and utility. In-domain results show GNNs achieve strong predictive performance (average accuracy ≈ 0.78, utility ≈ 0.72) and robustness across several LLMs, while cross-domain generalization remains challenging. Crucially, using FLOW-GNN as a predictor accelerates optimization cycles by about 125x with modest performance loss, illustrating a practical path toward prediction-driven agentic workflow optimization and expanding the utility of GNNs in complex, multi-agent AI systems.

Abstract

Agentic workflows invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks. However, optimizing such workflows is costly and inefficient in real-world applications due to extensive invocations of LLMs. To fill this gap, this position paper formulates agentic workflows as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictors of agentic workflow performances, avoiding repeated LLM invocations for evaluation. To empirically ground this position, we construct FLORA-Bench, a unified platform for benchmarking GNNs for predicting agentic workflow performances. With extensive experiments, we arrive at the following conclusion: GNNs are simple yet effective predictors. This conclusion supports new applications of GNNs and a novel direction towards automating agentic workflow optimization. All code, models, and data are available at https://github.com/youngsoul0731/Flora-Bench.
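
To make the idea of "workflow as computational graph, GNN as predictor" concrete, here is a minimal sketch in plain Python. It is not the paper's FLOW-GNN architecture, which this summary does not specify. The agent names, the hashed character `embed` function (a stand-in for a real text encoder), the mean-aggregation message passing, and the untrained `weights` are all assumptions made for illustration.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: node id -> agent prompt (illustrative names only).
agents = {
    "plan": "Break the task into steps.",
    "code": "Write code for each step.",
    "test": "Run tests on the code.",
    "fix": "Repair failing code and return the final answer.",
}
# Each edge points from an agent to an agent that consumes its output.
edges = [("plan", "code"), ("code", "test"), ("code", "fix"), ("test", "fix")]


def embed(text, dim=8):
    """Toy stand-in for a text encoder: hashed character counts, L2-normalised."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predecessors(nodes, edges):
    """Map every node to the list of nodes it depends on."""
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
    return preds


def message_pass(features, preds):
    """One round: each node averages its own features with its predecessors'."""
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[p] for p in preds[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def predict_success(agents, edges, task, weights, rounds=2):
    """Return (execution order, predicted success probability) without any LLM call."""
    preds = predecessors(agents, edges)
    # Raises graphlib.CycleError if the workflow is not a DAG.
    order = list(TopologicalSorter(preds).static_order())
    feats = {n: embed(agents[n]) for n in agents}
    for _ in range(rounds):
        feats = message_pass(feats, preds)
    # Mean-pool node features into one graph vector, then append the task vector.
    graph_vec = [sum(col) / len(feats) for col in zip(*feats.values())]
    x = graph_vec + embed(task)
    logit = sum(w * v for w, v in zip(weights, x))
    return order, 1.0 / (1.0 + math.exp(-logit))


# Untrained placeholder weights (8 graph dims + 8 task dims).
order, p = predict_success(agents, edges, "Implement binary search.", [0.1] * 16)
print(order, round(p, 3))
```

In a real predictor the weights would be learned from labelled workflow-task pairs such as those in FLORA-Bench, and the score would then stand in for an actual workflow execution during optimization.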

Paper Structure

This paper contains 28 sections, 11 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: An illustration of an agentic workflow and its corresponding computational graph. Nodes are agents handling subtasks and edges are the task dependencies.
  • Figure 2: Architecture of workflow graph neural network, a framework for predicting agentic workflow performances with GNNs.
  • Figure 3: Pipeline of benchmark construction.
  • Figure 4: Workflow Extraction from AFLOW in FLORA-Bench.
  • Figure 5: GCN as the predictor indeed benefits the optimization of the agentic workflow in terms of both effectiveness and efficiency.
  • ...and 6 more figures