GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction

Wei Guan, Jian Cao, Jinyu Cai, Qiqi Cai, Jianqi Gao, See-Kiong Ng

TL;DR

GLOW addresses the costly evaluation bottleneck in automatic agentic workflow (AW) generation by predicting the performance of AWs without executing them. It fuses a graph neural network (GNN) that encodes AW topology with a graph-oriented, instruction-tuned LLM that captures deep semantic reasoning, unified through a transformer-based fusion module and a contrastive loss that sharpens discriminative power. The approach achieves state-of-the-art prediction accuracy and ranking utility on FLORA-Bench and dramatically accelerates automatic AW generation (e.g., by 98.7% in AFLOW) with minimal performance loss. This work demonstrates the value of tightly integrating structure-aware and semantics-aware representations for complex, multi-agent task workflows.

Abstract

Agentic Workflows (AWs) have emerged as a promising paradigm for solving complex tasks. However, the scalability of automating their generation is severely constrained by the high cost and latency of execution-based evaluation. Existing AW performance prediction methods act as surrogates but fail to simultaneously capture the intricate topological dependencies and the deep semantic logic embedded in AWs. To address this limitation, we propose GLOW, a unified framework for AW performance prediction that combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Specifically, we introduce a graph-oriented LLM, instruction-tuned on graph tasks, to extract topologically aware semantic features, which are fused with GNN-encoded structural representations. A contrastive alignment strategy further refines the latent space to distinguish high-quality AWs. Extensive experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.

Paper Structure

This paper contains 18 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: An illustrative example of an AW for code generation.
  • Figure 2: The architecture of the proposed GLOW. For an AW, high-level semantic representations are derived from a graph-oriented LLM, while structural dependencies are captured by a GNN. The representation of the task instruction $T$ is extracted using Sentence-BERT. These distinct representations are then projected into a unified latent space and aggregated by a representation fusion module to generate the predicted performance score.
  • Figure 3: The prompt template used to convert the AW into descriptive text. The node set $\mathcal{V}$ and prompt set $\mathcal{P}$ are organized into a dictionary mapping each node ID to its textual prompt, while the edge set $\mathcal{E}$ is converted into a list of (source, target) tuples.
  • Figure 4: Impact of the hyperparameters $\lambda$ and $\alpha$ on model performance.
  • Figure 5: Comparison of time consumption and final AW performance across different AW evaluation methods in AFLOW.
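
The serialization described in the Figure 3 caption can be illustrated with a minimal Python sketch. It only shows the two data structures the caption names: a dictionary from node ID to prompt text, and a list of (source, target) edge tuples. The function name `serialize_workflow`, the output wording, and the example workflow are assumptions made for illustration; the paper's actual prompt template is not reproduced here.

```python
from typing import Dict, List, Tuple


def serialize_workflow(
    prompts: Dict[str, str],
    edges: List[Tuple[str, str]],
    task: str,
) -> str:
    """Render a workflow as text: a node-ID -> prompt dictionary plus an edge list.

    Hypothetical helper: the layout below is an assumed stand-in for the
    template in Figure 3, not the authors' exact wording.
    """
    known_nodes = set(prompts)
    for source, target in edges:
        # Every edge endpoint must be a node that has a prompt.
        if source not in known_nodes or target not in known_nodes:
            raise ValueError(
                f"edge ({source!r}, {target!r}) references an unknown node"
            )
    return (
        f"Task: {task}\n"
        f"Nodes: {prompts!r}\n"
        f"Edges: {edges!r}"
    )


# Made-up two-agent workflow for code generation (illustrative only).
example_prompts = {
    "0": "Write a Python function that solves the task.",
    "1": "Review the code and fix any bugs.",
}
example_edges = [("0", "1")]
example_text = serialize_workflow(
    example_prompts, example_edges, task="Reverse a string."
)
print(example_text)
```

Validating the edge endpoints before rendering keeps the text consistent with the graph that a structural encoder would see, so the two views of the same workflow do not drift apart.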

Theorems & Definitions (1)

  • Definition 3.1: Agentic Workflow Performance Prediction