GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction
Wei Guan, Jian Cao, Jinyu Cai, Qiqi Cai, Jianqi Gao, See-Kiong Ng
TL;DR
GLOW addresses the costly evaluation bottleneck in automatic agentic workflow (AW) generation by predicting AW performance without executing the workflows. It combines a graph neural network (GNN) that encodes AW topology with a graph-oriented, instruction-tuned LLM that captures the semantic logic of each workflow; a transformer-based fusion module unifies the two representations, and a contrastive loss sharpens the model's ability to distinguish high-quality AWs. The approach achieves state-of-the-art prediction accuracy and ranking utility on FLORA-Bench and accelerates automatic AW generation (e.g., by 98.7% in AFLOW) with minimal performance loss. This work demonstrates the value of tightly integrating structure-aware and semantics-aware representations for complex, multi-agent task workflows.
Abstract
Agentic Workflows (AWs) have emerged as a promising paradigm for solving complex tasks. However, the scalability of automating their generation is severely constrained by the high cost and latency of execution-based evaluation. Existing AW performance prediction methods act as surrogates but fail to simultaneously capture the intricate topological dependencies and the deep semantic logic embedded in AWs. To address this limitation, we propose GLOW, a unified framework for AW performance prediction that combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Specifically, we introduce a graph-oriented LLM, instruction-tuned on graph tasks, to extract topologically aware semantic features, which are fused with GNN-encoded structural representations. A contrastive alignment strategy further refines the latent space to distinguish high-quality AWs. Extensive experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
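To make the surrogate idea concrete, the following is a minimal sketch of how a performance predictor can stand in for execution during workflow search: every candidate is scored by the cheap predictor, and only a short list is actually executed. It is an illustration under stated assumptions, not GLOW's or AFLOW's implementation. The function name `select_with_surrogate`, the `predict` and `execute` callables, the `top_k` parameter, and the toy scores are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def select_with_surrogate(
    candidates: Sequence[str],
    predict: Callable[[str], float],
    execute: Callable[[str], float],
    top_k: int,
) -> Tuple[str, float, int]:
    """Rank candidates with a cheap predictor and execute only the top_k.

    Returns the best executed candidate, its measured score, and the
    number of executions that were actually performed.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Cheap step: one predictor call per candidate, no workflow execution.
    ranked = sorted(candidates, key=predict, reverse=True)
    shortlist = ranked[:top_k]
    # Expensive step: real execution, restricted to the shortlist.
    measured = [(execute(c), c) for c in shortlist]
    best_score, best = max(measured)
    return best, best_score, len(shortlist)


# Toy data (hypothetical): measured scores and imperfect predictions.
true_score = {"wf_a": 0.62, "wf_b": 0.81, "wf_c": 0.40, "wf_d": 0.77}
noisy_pred = {"wf_a": 0.55, "wf_b": 0.70, "wf_c": 0.45, "wf_d": 0.74}

best, score, n_exec = select_with_surrogate(
    list(true_score),
    predict=lambda w: noisy_pred[w],
    execute=lambda w: true_score[w],
    top_k=2,
)
print(best, score, n_exec)
```

In this toy run the predictor misorders the top two candidates, yet the best workflow is still found because it lands in the shortlist, and only two of the four candidates are executed. This is why ranking quality, and not only pointwise accuracy, matters for a surrogate used this way.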
