Performant LLM Agentic Framework for Conversational AI

Alex Casella, Wayne Wang

TL;DR

The paper addresses the challenge of deploying LLMs to navigate complex graph-based workflows in Conversational AI, where alignment errors and high latency arise from large context windows and planning overhead. It proposes the Performant Agentic Framework (PAF), which combines LLM reasoning with a vector-based node scoring mechanism to efficiently select the next node in a workflow. Basic PAF uses an LLM-as-Judge with a stepwise logic tree, while Optimized PAF introduces Vector-Based Node Search using embedding vectors and dot-product similarity to improve accuracy and reduce context. Experimental results show that Optimized PAF significantly outperforms baselines in semantic alignment and latency, demonstrating a scalable, production-ready approach for real-time agentic conversation in complex business environments.

Abstract

The rise of Agentic applications and automation in the Voice AI industry has led to an increased reliance on Large Language Models (LLMs) to navigate graph-based logic workflows composed of nodes and edges. However, existing methods face challenges such as alignment errors in complex workflows and hallucinations caused by excessive context size. To address these limitations, we introduce the Performant Agentic Framework (PAF), a novel system that assists LLMs in selecting appropriate nodes and executing actions in order when traversing complex graphs. PAF combines LLM-based reasoning with a mathematically grounded vector scoring mechanism, achieving both higher accuracy and reduced latency. Our approach dynamically balances strict adherence to predefined paths with flexible node jumps to handle various user inputs efficiently. Experiments demonstrate that PAF significantly outperforms baseline methods, paving the way for scalable, real-time Conversational AI systems in complex business environments.
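The vector-based node scoring mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_next_node`, the 0.8 threshold, the toy three-dimensional vectors, and the "try direct children first, then allow a jump to any node, otherwise defer" policy are all assumptions made for illustration. In practice the vectors would come from an embedding model, and a `None` result would presumably be handed back to the LLM-based step.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length so the dot product behaves like cosine similarity."""
    norm = sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def select_next_node(query_vec, node_vecs, children, threshold=0.8):
    """Hypothetical selection policy (an assumption, not the paper's algorithm).

    1. Score every node by the dot product of normalized vectors.
    2. Prefer the best-scoring direct child of the current node.
    3. Otherwise allow a jump to the best-scoring node anywhere in the graph.
    4. If nothing clears the threshold, return None so a caller can fall back.
    """
    q = normalize(query_vec)
    scores = {nid: dot(q, normalize(vec)) for nid, vec in node_vecs.items()}
    if not scores:
        return None, 0.0

    if children:
        best_child = max(children, key=lambda nid: scores[nid])
        if scores[best_child] >= threshold:
            return best_child, scores[best_child]

    best_any = max(scores, key=lambda nid: scores[nid])
    if scores[best_any] >= threshold:
        return best_any, scores[best_any]
    return None, scores[best_any]


# Toy "embeddings" for three workflow nodes (made-up values).
node_vecs = {
    "billing": [1.0, 0.0, 0.0],
    "tech_support": [0.0, 1.0, 0.0],
    "cancel": [0.0, 0.0, 1.0],
}
children_of_current = ["billing", "tech_support"]

on_path, _ = select_next_node([0.1, 0.9, 0.1], node_vecs, children_of_current)
jump, _ = select_next_node([0.0, 0.1, 0.95], node_vecs, children_of_current)
unsure, _ = select_next_node([1.0, 1.0, 1.0], node_vecs, children_of_current)
```

Here the first query stays on the predefined path, the second jumps to a node that is not a direct child, and the third is ambiguous enough that no node is selected. Scoring only needs the current utterance and the node vectors, which is one way such a mechanism could keep the LLM context small.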

Paper Structure

This paper contains 14 sections, 3 figures, 2 tables, 4 algorithms.

Figures (3)

  • Figure 1: Example illustration of an Agentic workflow for a healthcare call center use case, where the Agent needs to route calls based on different conditions.
  • Figure 2: Example illustration of an Agentic workflow for an internet service company helping callers troubleshoot connection issues. This workflow demonstrates how a more complex use case can have more conditions, nodes, and edges.
  • Figure 3: Distribution of Similarity Scores for the 3 frameworks (Naive=Baseline, Base=Basic PAF, Optimized=Optimized PAF).