Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

TL;DR

Agentic Predictor addresses the expensive evaluation bottleneck in designing LLM-based agentic workflows by learning a predictive model over multi-view representations of workflows. It combines graph, code, and prompt encodings with cross-domain unsupervised pretraining to produce robust embeddings, then trains a lightweight predictor on limited labeled data to guide a predictor-based search. Across code generation, math, and reasoning domains, it achieves state-of-the-art predictive accuracy (up to $84.38\%$) and higher workflow utility (up to $81.88\%$) while substantially reducing the number of costly evaluations. Pretraining further improves performance in low-label regimes, with reported gains of up to $12.12\%$ in accuracy and $15.16\%$ in utility. The work demonstrates the practical impact of representation-focused predictors for efficiently navigating heterogeneous agentic workflows and sets the stage for future multi-objective and human-in-the-loop extensions.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and communication patterns. Existing approaches often rely on heuristic-based tuning or exhaustive evaluation, which can be computationally expensive and suboptimal. This paper proposes Agentic Predictor, a lightweight predictor for efficient agentic workflow evaluation. Agentic Predictor is equipped with a multi-view workflow encoding technique that leverages multi-view representation learning of agentic systems by incorporating code architecture, textual prompts, and interaction graph features. To achieve high predictive accuracy while significantly reducing the number of required workflow evaluations for training a predictor, Agentic Predictor employs cross-domain unsupervised pretraining. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations for a given task, significantly reducing the need for expensive trial-and-error evaluations. Experiments on a carefully curated benchmark spanning three domains show that our predictor outperforms state-of-the-art methods in both predictive accuracy and workflow utility, highlighting the potential of performance predictors in streamlining the design of LLM-based agentic workflows.
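To make the idea of replacing trial-and-error evaluation with a learned predictor concrete, here is a minimal sketch of predictor-guided selection. It is an illustration only, not the authors' implementation: the function name `predictor_guided_select`, the `budget` parameter, and the use of plain strings as workflow identifiers are all assumptions introduced for this example. The cheap predictor ranks every candidate, and only the top few are passed to the expensive evaluator.

```python
from typing import Callable, Dict, Sequence, Tuple


def predictor_guided_select(
    candidates: Sequence[str],
    predict_pass_prob: Callable[[str], float],
    evaluate: Callable[[str], float],
    budget: int,
) -> Tuple[str, Dict[str, float]]:
    """Pick a workflow using a cheap predictor to limit expensive evaluations.

    All names here are hypothetical. `predict_pass_prob` stands in for a
    trained performance predictor; `evaluate` stands in for actually running
    the workflow (costly LLM calls); `budget` caps how many runs are allowed.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not candidates:
        raise ValueError("candidates must be non-empty")

    # Rank all candidates by predicted success; this step costs no LLM calls.
    ranked = sorted(candidates, key=predict_pass_prob, reverse=True)

    # Spend the evaluation budget only on the most promising candidates.
    shortlist = ranked[:budget]
    measured = {wf: evaluate(wf) for wf in shortlist}

    # Return the workflow with the best measured score among those evaluated.
    best = max(measured, key=measured.get)
    return best, measured
```

With four candidates and `budget=2`, the evaluator runs twice instead of four times. The trade-off is that a workflow the predictor ranks poorly is never measured, so the quality of the final choice depends on the predictor's accuracy.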

Paper Structure

This paper contains 25 sections, 2 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between (a) execution-based and (b) prediction-based candidate evaluation for agentic workflow generation. Execution-based methods rely on costly runtime or LLM calls, while our prediction-based approach offers faster, scalable evaluation via a learned predictor.
  • Figure 2: Overview of our Agentic Predictor framework. A (a) multi-view workflow encoder is designed to encode a set of agentic workflows from graph, code, and prompt aspects into unified representations, which serve as features for training the predictor. In the (b) pretraining phase, the encoder learns these representations on unlabeled workflows spanning diverse tasks and domains, using cross-domain unsupervised pretraining objectives. In the (c) predictor-guided search phase, a performance predictor is trained on a small (workflow configuration, performance) dataset to classify configurations as pass or fail, and subsequently guides the search toward promising configurations.
  • Figure 3: Accuracy comparison between Agentic Predictor and the baselines across varying label ratios.
  • Figure 4: Comparison of accuracy (upper) and utility (lower) between Agentic Predictor and the baselines across varying label ratios.
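The Figure 2 caption describes encoding each workflow from graph, code, and prompt views into a unified representation that a predictor then classifies as pass or fail. The sketch below shows one simple way such a pipeline could be wired together. It is a toy under stated assumptions: fusing the views by concatenation and using a logistic head are choices made for this example, and the names `fuse_views`, `pass_probability`, and `classify` are hypothetical. The captions do not specify the fusion method or the predictor architecture.

```python
import math
from typing import List, Sequence


def fuse_views(
    graph_vec: Sequence[float],
    code_vec: Sequence[float],
    prompt_vec: Sequence[float],
) -> List[float]:
    """Combine three per-view embeddings into one feature vector.

    Concatenation is an assumption for this sketch, not the paper's method.
    """
    return list(graph_vec) + list(code_vec) + list(prompt_vec)


def pass_probability(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Logistic head (assumed): map fused features to a pass probability."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    threshold: float = 0.5,
) -> str:
    """Turn the probability into the pass/fail label the caption mentions."""
    return "pass" if pass_probability(features, weights, bias) >= threshold else "fail"
```

In practice the weights would be learned from the small labeled set of (workflow configuration, performance) pairs, and the per-view vectors would come from the pretrained encoder rather than being supplied by hand.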