
First Token Probability Guided RAG for Telecom Question Answering

Tingwei Chen, Jiayi Chen, Zijian Zhao, Haolong Chen, Liang Zhang, Guangxu Zhu

TL;DR

The paper tackles domain-specific multiple-choice question answering (MCQA) in telecommunications and the risk that large language models (LLMs) hallucinate. It introduces a first-token probability guided retrieval-augmented generation (RAG) framework that uses the probability of the first generated token as a confidence signal to dynamically adjust the retrieved context and the retrieval hyperparameters. Key components include chunk-based RAG with a fixed chunk size, chunk windowing, a choice among embedding models, and FAISS indexing, together with two hyperparameter strategies: Threshold and Best Probability. On a telecom MCQA dataset, the method reaches up to 78.4% accuracy, and combining diverse configurations yields a 26.8% gain, indicating fewer hallucinations and better domain-specific MCQA performance.
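
Fixed-size chunking and chunk windowing are listed above among the key components. The sketch below illustrates both under two assumptions that the summary does not confirm: chunks are counted in whitespace-separated words, and a chunk window of size w adds up to w neighboring chunks on each side of a retrieved chunk. Embedding and FAISS retrieval are left out; `split_into_chunks` and `windowed_context` are hypothetical helper names.

```python
def split_into_chunks(text: str, chunk_size: int) -> list:
    """Split text into consecutive chunks of at most `chunk_size` words.

    Counting words (not model tokens) is an assumption made for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def windowed_context(chunks: list, hit: int, window: int) -> str:
    """Return retrieved chunk `hit` plus up to `window` neighbors on each side.

    This symmetric-neighbor reading of "chunk window" is hypothetical.
    """
    lo = max(0, hit - window)
    hi = min(len(chunks), hit + window + 1)
    return " ".join(chunks[lo:hi])


if __name__ == "__main__":
    chunks = split_into_chunks("a b c d e f g h i j", chunk_size=3)
    print(chunks)  # ['a b c', 'd e f', 'g h i', 'j']
    print(windowed_context(chunks, hit=1, window=1))  # a b c d e f g h i
```

Widening the window gives the model more surrounding text per retrieved chunk at the cost of a longer prompt, which is why the window size is treated as a tunable hyperparameter.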

Abstract

Large Language Models (LLMs) have garnered significant attention for their impressive general-purpose capabilities. For applications requiring intricate domain knowledge, Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into LLMs. However, existing RAG research has not fully addressed the challenges of Multiple-Choice Question Answering (MCQA) in telecommunications, particularly retrieval quality and hallucination mitigation. To tackle these challenges, we propose a novel first-token probability guided RAG framework. This framework leverages confidence scores to optimize key hyperparameters, such as the chunk number and chunk window size, while dynamically adjusting the context. Our method first retrieves the most relevant chunks and generates a single token as the potential answer. The probabilities of all options are then normalized to serve as confidence scores, which guide the dynamic adjustment of the context. By iteratively optimizing the hyperparameters based on these confidence scores, we can continuously improve RAG performance. We conducted experiments to validate the effectiveness of our framework, demonstrating its potential to enhance accuracy in domain-specific MCQA tasks.
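
The normalization and iterative adjustment described in the abstract can be sketched as follows. Dividing each option label's first-token probability by their sum gives confidence scores over the options only. The loop then tries increasing chunk numbers and stops once the top confidence reaches a threshold, which is one plausible reading of the Threshold strategy named in the TL;DR, not the authors' implementation. The option labels, the `answer_fn` callback standing in for an LLM call, the candidate chunk numbers, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def normalize_option_probs(first_token_probs: Dict[str, float]) -> Dict[str, float]:
    """Rescale the first-token probabilities of the option labels so they sum to 1.

    If the inputs are full-vocabulary softmax probabilities, this equals a softmax
    over the option-token logits alone, because the vocabulary normalizer cancels.
    """
    total = sum(first_token_probs.values())
    if total <= 0:
        raise ValueError("option probabilities must have a positive sum")
    return {opt: p / total for opt, p in first_token_probs.items()}


def threshold_search(
    answer_fn: Callable[[int], Dict[str, float]],
    chunk_numbers: Sequence[int],
    threshold: float,
) -> Tuple[str, float, Optional[int]]:
    """Try chunk numbers in order and stop at the first sufficiently confident answer.

    `answer_fn(k)` is a hypothetical stand-in for: put the top-k chunks in the
    prompt, generate one token, and return each option label's probability.
    If no chunk number reaches `threshold`, return the most confident answer seen.
    """
    best: Tuple[str, float, Optional[int]] = ("", -1.0, None)
    for k in chunk_numbers:
        conf = normalize_option_probs(answer_fn(k))
        option = max(conf, key=conf.get)
        if conf[option] >= threshold:
            return option, conf[option], k
        if conf[option] > best[1]:
            best = (option, conf[option], k)
    return best


if __name__ == "__main__":
    # Made-up first-token probabilities for chunk numbers 1, 3 and 5.
    fake = {
        1: {"A": 0.20, "B": 0.15, "C": 0.05, "D": 0.10},
        3: {"A": 0.05, "B": 0.60, "C": 0.02, "D": 0.03},
        5: {"A": 0.01, "B": 0.85, "C": 0.01, "D": 0.01},
    }
    print(threshold_search(fake.__getitem__, [1, 3, 5], threshold=0.8))
```

With these made-up numbers, one chunk leaves option A at a normalized confidence of 0.4, so the search moves on and stops at three chunks, where option B reaches about 0.86.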

Paper Structure

This paper contains 20 sections, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustration of the proposed first-token probability guided RAG framework.
  • Figure 2: Accuracy versus the number of top-K chunks included in the prompt, comparing the performance of three different embedding models.
  • Figure 3: Probability distributions of correct and wrong predictions with RAG. The x-axis denotes the first token probability, and the y-axis shows the normalized density of these predictions.
  • Figure 4: Number of completed questions and corresponding accuracy as a function of chunk number. The rightmost bar represents the results where the highest probability answer is selected from all chunk numbers combined.
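
The rightmost bar in Figure 4 pools the answers obtained at every chunk number and keeps the single most confident one. A minimal sketch of that selection rule, assuming each configuration has already produced normalized option confidences; `best_probability` is a hypothetical name and the numbers are made up:

```python
from typing import Dict, Hashable, Optional, Tuple


def best_probability(
    confidences_by_config: Dict[Hashable, Dict[str, float]],
) -> Tuple[str, float, Hashable]:
    """Return (option, confidence, config) for the most confident prediction overall.

    `confidences_by_config` maps a configuration (e.g. a chunk number) to the
    normalized option confidences obtained with it. Ties keep the earlier config.
    """
    if not confidences_by_config:
        raise ValueError("need at least one configuration")
    best: Optional[Tuple[str, float, Hashable]] = None
    for config, conf in confidences_by_config.items():
        option = max(conf, key=conf.get)
        if best is None or conf[option] > best[1]:
            best = (option, conf[option], config)
    return best


if __name__ == "__main__":
    runs = {
        1: {"A": 0.55, "B": 0.45},
        3: {"A": 0.30, "B": 0.70},
        5: {"A": 0.40, "B": 0.60},
    }
    print(best_probability(runs))  # ('B', 0.7, 3)
```

Unlike a threshold rule, this selection always returns an answer, but it requires running every configuration for each question.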