FACTS About Building Retrieval Augmented Generation-based Chatbots

Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano

TL;DR

The paper tackles building secure, enterprise-grade chatbots powered by Retrieval-Augmented Generation (RAG) to maintain up-to-date knowledge while respecting access controls. It proposes the FACTS framework—Freshness, Architectures, Cost, Testing, Security—and documents 15 RAG pipeline control points, illustrated through three NVIDIA NVBot deployments (NVInfo, NVHelp, Scout). It provides empirical comparisons of accuracy versus latency across large and small LLMs and discusses practical architectural decisions, testing practices, and security guardrails. The work offers a holistic, practitioner-focused blueprint for designing, evaluating, and securing enterprise chatbots, with implications for multi-bot orchestration and copilot-style integrations in real-world environments.

Abstract

Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like LangChain and LlamaIndex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.
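
To make a few of the pipeline steps named in the abstract concrete (retrieving documents, honoring document access controls, ranking results, and designing a prompt that asks for concise answers with references), here is a minimal Python sketch. It is not the paper's implementation: the `Doc` class, its `allowed_groups` field, and the `embed`, `retrieve`, and `build_prompt` helpers are all hypothetical, and the bag-of-words similarity stands in for a trained embedding model and a vector database.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # Hypothetical access-control field: groups permitted to read this document.
    allowed_groups: set = field(default_factory=set)


def embed(text):
    """Toy bag-of-words 'embedding' (assumption; stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, user_groups, top_k=2):
    """Drop documents the user may not read, then rank the rest by similarity."""
    permitted = [d for d in docs if d.allowed_groups & user_groups]
    q = embed(query)
    ranked = sorted(permitted, key=lambda d: cosine(q, embed(d.text)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, hits):
    """Assemble a prompt requesting a concise answer that cites document ids."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return (
        "Answer concisely and cite sources by id.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Illustrative data only; not taken from the paper.
docs = [
    Doc("hr-1", "Employees accrue vacation days every month", {"employees"}),
    Doc("fin-1", "Quarterly earnings details restricted to finance", {"finance"}),
    Doc("it-1", "Reset your VPN password through the IT portal", {"employees"}),
]
question = "how many vacation days do employees get"
hits = retrieve(question, docs, user_groups={"employees"})
prompt = build_prompt(question, hits)
```

The access-control filter runs before ranking, so a restricted document never reaches the prompt regardless of how well it matches the query. A production system would more likely push this filter into the vector store query itself; that choice is outside what this sketch covers.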

Paper Structure (12 sections, 7 figures, 1 table)


Figures (7)

  • Figure 1: Control Points in a typical RAG pipeline when building Chatbots.
  • Figure 2: Agent architecture for handling complex queries
  • Figure 3: NVHelp answer quality and latency metrics comparison among different models
  • Figure 4: RAG control points, challenges, and remediations
  • Figure 5: Scout Bot: Multi-part query
  • ...and 2 more figures