Augmenting Compliance-Guaranteed Customer Service Chatbots: Context-Aware Knowledge Expansion with Large Language Models

Mengze Hong, Chen Jason Zhang, Di Jiang, Yuanqin He

TL;DR

This paper addresses the challenge of building compliant and verifiable customer-service chatbots by augmenting a retrieval-based knowledge base with Similar Question Generation (SQG). It introduces context-aware one-to-many generation and intention-enhanced conditioning, supported by the training losses $\mathcal{L}_{ft}$ and $\mathcal{L}_{Intention}$, to broaden semantic coverage while preserving the source Q&A mappings. It also develops an optimization framework for dynamic demonstration selection and budget-constrained similar-question selection, including a greedy algorithm with a $1 - 1/e$ approximation guarantee and proofs of NP-hardness and submodularity. Empirical results on a Chinese financial-domain dataset show improved diversity and a 92% user satisfaction rate in a deployed chatbot system, and offer insights for using LLMs to support non-generative, hallucination-free systems.

Abstract

Retrieval-based chatbots leverage human-verified Q&A knowledge to deliver accurate, verifiable responses, making them ideal for customer-centric applications where compliance with regulatory and operational standards is critical. To effectively handle diverse customer inquiries, augmenting the knowledge base with "similar questions" that retain semantic meaning while incorporating varied expressions is a cost-effective strategy. In this paper, we introduce the Similar Question Generation (SQG) task for LLM training and inference, proposing context-aware approaches to enable comprehensive semantic exploration and enhanced alignment with source question-answer relationships. We formulate optimization techniques for constructing in-context prompts and selecting an optimal subset of similar questions to expand chatbot knowledge under budget constraints. Both quantitative and human evaluations validate the effectiveness of these methods, achieving a 92% user satisfaction rate in a deployed chatbot system, reflecting an 18% improvement over the unaugmented baseline. These findings highlight the practical benefits of SQG and emphasize the potential of LLMs, not as direct chatbot interfaces, but in supporting non-generative systems for hallucination-free, compliance-guaranteed applications.

Paper Structure

This paper contains 38 sections, 2 theorems, 19 equations, 3 figures, 6 tables, 2 algorithms.

Key Result

Theorem D.1

Selecting a subset $S \subseteq Q^{*}$ that maximizes the sum of pairwise distances, subject to the budget constraint $\sum_{q \in S} \text{cost}(q) \leq B$, is NP-hard.
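Because the exact problem is NP-hard, a heuristic is the practical route. The following is a minimal sketch of a cost-aware greedy selection for this objective, not the paper's algorithm: the function names, the unit-cost usage, and the token-level Jaccard distance are all assumptions made for illustration, and the sketch carries no approximation guarantee.

```python
from typing import Callable, List, Sequence


def jaccard_distance(a: str, b: str) -> float:
    """Toy distance (assumption): 1 - Jaccard similarity over lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def greedy_select(
    candidates: Sequence[str],
    cost: Callable[[str], float],
    dist: Callable[[str, str], float],
    budget: float,
) -> List[str]:
    """Greedily pick questions, maximizing added pairwise distance per unit cost.

    Assumes every cost is strictly positive. Ties go to the earliest candidate.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    spent = 0.0
    while True:
        best, best_ratio = None, -1.0
        for i in remaining:
            c = cost(candidates[i])
            if spent + c > budget:
                continue  # candidate no longer fits in the budget
            # Marginal gain: distance from this candidate to everything chosen so far.
            gain = sum(dist(candidates[i], candidates[j]) for j in selected)
            ratio = gain / c
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing affordable is left
        selected.append(best)
        remaining.remove(best)
        spent += cost(candidates[best])
    return [candidates[i] for i in selected]
```

For example, `greedy_select(questions, lambda q: 1.0, jaccard_distance, budget=2)` keeps at most two questions under unit costs. In practice the distance would more plausibly come from sentence embeddings, and the cost from whatever per-question review or storage expense applies.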

Figures (3)

  • Figure 1: Schematic overview of a compliance-guaranteed chatbot with a predefined knowledge base for Match-and-Respond. The yellow region highlights the questions augmented by the similar question generation.
  • Figure 2: Illustration of the generated questions in semantic space with respect to the source question and the corresponding answer. The blue region represents the desired semantic space surrounding the source question. (a) Standard one-to-one objective: generated questions often either truncate or fall outside this desired region. (b) Intent-Enhanced Batch Generation: the green region indicates the expanded exploration region that meets the semantic consistency of the source QA pair.
  • Figure 3: Performance comparison of similar question generation methods with varying numbers of questions.

Theorems & Definitions (5)

  • Theorem D.1
  • proof
  • Theorem D.2
  • Definition 1: Submodularity
  • proof
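
For reference, the standard textbook definition of submodularity is given below. This is an assumption about what Definition 1 states; the paper's exact wording and notation are not shown in this listing, and the symbols $f$, $V$, $A$, $B$, and $x$ are chosen here for illustration. A set function $f : 2^{V} \to \mathbb{R}$ is submodular if, for all $A \subseteq B \subseteq V$ and every $x \in V \setminus B$,

$$
f(A \cup \{x\}) - f(A) \;\geq\; f(B \cup \{x\}) - f(B).
$$

In words, adding an element to a smaller set helps at least as much as adding it to a larger one (diminishing returns). The classical $1 - 1/e$ guarantee for greedy selection applies to monotone submodular functions under a cardinality constraint.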