RAG based Question-Answering for Contextual Response Prediction System

Sriram Veturi, Saurabh Vaichal, Reshma Lal Jagadheesh, Nafis Irtiza Tripto, Nian Yan

TL;DR

This work tackles the challenge of grounding large language models in industry-specific knowledge to prevent hallucinations in customer-service contexts. It proposes an end-to-end Retrieval Augmented Generation (RAG) framework for a Response Prediction System (RPS) deployed in a major retailer's contact centers, combining knowledge-base retrieval and previous chat history with LLM generation. Through automated and human evaluations, it shows that RAG-based LLMs achieve higher accuracy, alignment, and semantic coherence than a BERT-based baseline, and it notes the added latency that ReAct and advanced prompting introduce for real-time deployment. The findings support the practical viability of RAG-LLMs for knowledge-grounded agent assistance and suggest future work on broader LLM comparisons, query rewriting, and multi-source RAG integration.

Abstract

Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.

Paper Structure (35 sections, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Example of the Response Prediction System. (A): For a valid query, the system retrieves the relevant document and proposes appropriate responses from which the agent chooses. (B): For an out-of-domain query, it guides the user to ask a relevant question.
  • Figure 2: Overview of the systems: (A) Agents respond to queries by manually searching for relevant documents, (B) The existing BERT-based system, which extracts relevant Q/A pairs from the given query and provides suggested answers to the agents, (C) The proposed RAG LLM system, where the LLM retrieves relevant KB articles (if necessary) and generates answers based on the query and the retrieved articles.
  • Figure 3: End-to-end RAG LLM framework.
  • Figure 4: Cosine similarity score between the query and the ScaNN-retrieved document; retrieval threshold (0.7).
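
The retrieval threshold named in the Figure 4 caption can be illustrated with a small sketch. It is not the authors' implementation: ScaNN performs approximate nearest-neighbor search over learned embeddings, whereas this example uses an exhaustive scan over toy vectors. The function names, the document IDs, and the idea that a below-threshold score triggers the out-of-domain behavior of Figure 1 (B) are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Threshold value taken from the Figure 4 caption; everything else is hypothetical.
RETRIEVAL_THRESHOLD = 0.7


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    threshold: float = RETRIEVAL_THRESHOLD,
) -> Optional[Tuple[str, float]]:
    """Return (doc_id, score) for the best-scoring document, or None if it falls below the threshold.

    Exhaustive scan standing in for an approximate nearest-neighbor index.
    """
    best: Optional[Tuple[str, float]] = None
    for doc_id, vec in doc_vecs.items():
        score = cosine_similarity(query_vec, vec)
        if best is None or score > best[1]:
            best = (doc_id, score)
    if best is None or best[1] < threshold:
        return None  # assumed to correspond to the out-of-domain path
    return best


# Toy 3-dimensional "embeddings" with hypothetical document IDs.
docs = {
    "returns_policy": [1.0, 0.0, 0.0],
    "store_hours": [0.0, 1.0, 0.0],
}

in_domain = retrieve([0.9, 0.1, 0.0], docs)      # close to "returns_policy"
out_of_domain = retrieve([0.1, 0.1, 1.0], docs)  # far from every document
```

Under these assumptions, a query whose best match scores at or above 0.7 would pass its document to the LLM as context, while a query with no sufficiently similar document returns `None`, which a caller could map to a prompt asking the user for a relevant question.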