Table of Contents
Fetching ...

A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs

Chiwan Park, Wonjun Jang, Daeryong Kim, Aelim Ahn, Kichang Yang, Woosung Hwang, Jihyeon Roh, Hyerin Park, Hyosun Wang, Min Seok Kim, Jihoon Kang

TL;DR

The paper tackles the challenge of deploying production-grade conversational agents by reconciling flexible LLM behavior with strict domain constraints through a graph-based workflow (DAG) framework. It introduces per-node prompts, constrained decoding, and history manipulation, complemented by a data-collection pipeline using a prototype agent and loss-masked fine-tuning to preserve node-specific guidance. The authors demonstrate a real-world e-commerce application, achieving a 52% improvement in task accuracy and a 50% improvement in format adherence over baselines and GPT-4o, with the internal model even surpassing GPT-4o in several metrics. The framework offers a scalable, controllable approach for building reliable AI agents suitable for mobile messaging and other constraint-heavy domains. The work bridges research and practice, enabling production-ready AI assistants with robust tooling, evaluation, and governance considerations.

Abstract

The advancement of Large Language Models (LLMs) has led to significant improvements in various service domains, including search, recommendation, and chatbot applications. However, applying state-of-the-art (SOTA) research to industrial settings presents challenges, as it requires maintaining flexible conversational abilities while also strictly complying with service-specific constraints. This can be seen as two conflicting requirements due to the probabilistic nature of LLMs. In this paper, we propose our approach to addressing this challenge and detail the strategies we employed to overcome their inherent limitations in real-world applications. We conduct a practical case study of a conversational agent designed for the e-commerce domain, detailing our implementation workflow and optimizations. Our findings provide insights into bridging the gap between academic research and real-world application, introducing a framework for developing scalable, controllable, and reliable AI-driven agents.

A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs

TL;DR

The paper tackles the challenge of deploying production-grade conversational agents by reconciling flexible LLM behavior with strict domain constraints through a graph-based workflow (DAG) framework. It introduces per-node prompts, constrained decoding, and history manipulation, complemented by a data-collection pipeline using a prototype agent and loss-masked fine-tuning to preserve node-specific guidance. The authors demonstrate a real-world e-commerce application, achieving a 52% improvement in task accuracy and a 50% improvement in format adherence over baselines and GPT-4o, with the internal model even surpassing GPT-4o in several metrics. The framework offers a scalable, controllable approach for building reliable AI agents suitable for mobile messaging and other constraint-heavy domains. The work bridges research and practice, enabling production-ready AI assistants with robust tooling, evaluation, and governance considerations.

Abstract

The advancement of Large Language Models (LLMs) has led to significant improvements in various service domains, including search, recommendation, and chatbot applications. However, applying state-of-the-art (SOTA) research to industrial settings presents challenges, as it requires maintaining flexible conversational abilities while also strictly complying with service-specific constraints. This can be seen as two conflicting requirements due to the probabilistic nature of LLMs. In this paper, we propose our approach to addressing this challenge and detail the strategies we employed to overcome their inherent limitations in real-world applications. We conduct a practical case study of a conversational agent designed for the e-commerce domain, detailing our implementation workflow and optimizations. Our findings provide insights into bridging the gap between academic research and real-world application, introducing a framework for developing scalable, controllable, and reliable AI-driven agents.

Paper Structure

This paper contains 28 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: A mobile messenger conversation between a user and our e-commerce agent. The first two turns require external tool calls to respond without hallucination. There are also output format constraints to make the responses readable in a mobile environment, such as emoji bullets.
  • Figure 2: An example workflow graph. Each LLM calling node (green colored) has its system prompt and a custom routine (modify_history) to manipulate conversation histories. The tool nodes (pink striped) are used to call pre-defined external tools and have the schemas for input and output. For clarity, we only show nodes related to gift recommendations and omit some content of the system prompts, including few-shot examples of the responses.
  • Figure 3: Evaluation Prompt for Task Accuracy.
  • Figure 4: Evaluation Prompt for Response Quality.
  • Figure 5: Example use cases on AI Shopping Mate