Performant LLM Agentic Framework for Conversational AI
Alex Casella, Wayne Wang
TL;DR
The paper addresses the challenge of deploying LLMs to navigate complex graph-based workflows in Conversational AI, where large context windows and planning overhead lead to alignment errors and high latency. It proposes the Performant Agentic Framework (PAF), which combines LLM reasoning with a vector-based node scoring mechanism to efficiently select the next node in a workflow. Basic PAF uses an LLM-as-Judge with a stepwise logic tree, while Optimized PAF adds Vector-Based Node Search, which scores nodes by dot-product similarity between embedding vectors to improve accuracy and reduce context size. Experimental results show that Optimized PAF significantly outperforms baselines in both semantic alignment and latency, demonstrating a scalable, production-ready approach for real-time agentic conversation in complex business environments.
Abstract
The rise of agentic applications and automation in the Voice AI industry has led to an increased reliance on Large Language Models (LLMs) to navigate graph-based logic workflows composed of nodes and edges. However, existing methods face challenges such as alignment errors in complex workflows and hallucinations caused by excessive context size. To address these limitations, we introduce the Performant Agentic Framework (PAF), a novel system that assists LLMs in selecting appropriate nodes and executing actions in order when traversing complex graphs. PAF combines LLM-based reasoning with a mathematically grounded vector scoring mechanism, achieving both higher accuracy and reduced latency. Our approach dynamically balances strict adherence to predefined paths with flexible node jumps to handle varied user inputs efficiently. Experiments demonstrate that PAF significantly outperforms baseline methods, paving the way for scalable, real-time Conversational AI systems in complex business environments.
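To make the vector scoring idea concrete, the following is a minimal sketch of dot-product node selection, not the implementation evaluated in the paper. The function names, the toy two-dimensional embeddings, and the fixed threshold are all assumptions made for illustration; in particular, returning None to mean "stay on the predefined path and defer to the LLM step" is one possible way to model the balance between path adherence and node jumps.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def select_next_node(
    query_vec: Vector,
    node_vecs: Dict[str, Vector],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Score every node against the query and pick the best one.

    Hypothetical policy (an assumption, not taken from the paper):
    if the best score reaches `threshold`, jump to that node;
    otherwise return None so the caller can fall back to the
    predefined path or an LLM-based decision.
    """
    if not node_vecs:
        raise ValueError("node_vecs must not be empty")
    best_node, best_score = max(
        ((name, dot(query_vec, vec)) for name, vec in node_vecs.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= threshold:
        return best_node, best_score
    return None, best_score


# Toy embeddings; a real system would obtain these from an embedding model.
NODES: Dict[str, Vector] = {
    "billing": [0.9, 0.1],
    "cancel": [0.1, 0.9],
}

if __name__ == "__main__":
    user_vec = [1.0, 0.0]  # stand-in for an embedded user utterance
    print(select_next_node(user_vec, NODES, threshold=0.5))
```

With normalized embeddings the dot product equals cosine similarity, so a single threshold can be compared across nodes; with unnormalized vectors the threshold would need to account for vector magnitude.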
