First Token Probability Guided RAG for Telecom Question Answering

Tingwei Chen; Jiayi Chen; Zijian Zhao; Haolong Chen; Liang Zhang; Guangxu Zhu

TL;DR

The paper tackles domain-specific multiple-choice question answering (MCQA) in telecommunications and the hallucination risk of large language models (LLMs). It introduces a first-token probability guided retrieval-augmented generation (RAG) framework that uses the probability of the first generated token as a confidence signal to dynamically adjust the retrieved context and tune hyperparameters. Key components include chunk-based RAG with a fixed chunk size, chunk windowing, a choice of embedding models, and FAISS indexing, along with two hyperparameter strategies: Threshold and Best Probability. Empirical results on a telecom MCQA dataset show up to $78.4\%$ accuracy and a $26.8\%$ gain when combining diverse configurations, indicating reduced hallucinations and improved domain-specific MCQA performance.
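The two hyperparameter strategies named above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `answer_fn` is a hypothetical callable that runs the RAG pipeline with a given chunk number and returns the predicted option with its confidence, and the fallback used when no configuration reaches the threshold is an assumption.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical result type: (predicted option, confidence in [0, 1]).
Answer = Tuple[str, float]


def threshold_select(
    answer_fn: Callable[[int], Answer],
    chunk_numbers: Sequence[int],
    threshold: float,
) -> Tuple[str, float, int]:
    """Try chunk numbers in order; stop at the first confident answer."""
    if not chunk_numbers:
        raise ValueError("chunk_numbers must not be empty")
    best = None
    for k in chunk_numbers:
        option, conf = answer_fn(k)
        # Remember the most confident attempt seen so far.
        if best is None or conf > best[1]:
            best = (option, conf, k)
        if conf >= threshold:
            return option, conf, k
    # Assumption: if nothing clears the threshold, fall back to the
    # most confident attempt rather than leaving the question unanswered.
    return best


def best_probability_select(
    answer_fn: Callable[[int], Answer],
    chunk_numbers: Sequence[int],
) -> Tuple[str, float, int]:
    """Run every chunk number and keep the highest-confidence answer."""
    if not chunk_numbers:
        raise ValueError("chunk_numbers must not be empty")
    candidates = [(*answer_fn(k), k) for k in chunk_numbers]
    return max(candidates, key=lambda c: c[1])
```

Under these assumptions, the Threshold strategy can stop early and save model calls, whereas Best Probability always evaluates every configuration before choosing.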

Abstract

Large Language Models (LLMs) have garnered significant attention for their impressive general-purpose capabilities. For applications requiring intricate domain knowledge, Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into LLMs. However, existing RAG research has not fully addressed the challenges of Multiple Choice Question Answering (MCQA) in telecommunications, particularly in terms of retrieval quality and mitigating hallucinations. To tackle these challenges, we propose a novel first token probability guided RAG framework. This framework leverages confidence scores to optimize key hyperparameters, such as chunk number and chunk window size, while dynamically adjusting the context. Our method starts by retrieving the most relevant chunks and generating a single token as the potential answer. The probabilities of all options are then normalized to serve as confidence scores, which guide the dynamic adjustment of the context. By iteratively optimizing the hyperparameters based on these confidence scores, we can continuously improve RAG performance. We conducted experiments to validate the effectiveness of our framework, demonstrating its potential to enhance accuracy in domain-specific MCQA tasks.
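The normalization step described in the abstract can be illustrated with a short sketch. It assumes the model exposes log-probabilities for candidate first tokens as a dictionary and that options are labeled "A" to "D"; both the input format and the labels are hypothetical choices made for illustration, not details taken from the paper.

```python
import math
from typing import Dict, Sequence


def option_confidences(
    first_token_logprobs: Dict[str, float],
    options: Sequence[str] = ("A", "B", "C", "D"),
) -> Dict[str, float]:
    """Renormalize first-token probabilities over the answer options only."""
    # Options the model did not score are treated as probability 0.
    raw = {
        o: math.exp(first_token_logprobs[o]) if o in first_token_logprobs else 0.0
        for o in options
    }
    total = sum(raw.values())
    if total == 0.0:
        # No option token was scored: fall back to a uniform distribution.
        return {o: 1.0 / len(options) for o in options}
    return {o: p / total for o, p in raw.items()}


# Example: probability mass on a non-option token ("The") is discarded.
logprobs = {"A": math.log(0.6), "B": math.log(0.2), "The": math.log(0.2)}
conf = option_confidences(logprobs)
predicted = max(conf, key=conf.get)
```

In this sketch, the confidence of the predicted option is its share of the probability mass assigned to option tokens, so tokens outside the option set do not dilute the score.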

Paper Structure (20 sections, 4 figures, 2 tables, 1 algorithm)

Introduction
Methodology
Language models
Domain-Specific Knowledge Embedding via RAG
Chunk Size
Chunk Window Size
Embedding Models
Indexing Strategy
Top-K Chunk Number
Prompt Engineering
First Token Probability
Hyperparameters Optimization
Threshold Method
Best Probability Method
Experiments
...and 5 more sections

Figures (4)

Figure 1: Illustration of the proposed first token probability guided RAG framework.
Figure 2: Accuracy versus the number of top K chunks included in the prompt, comparing the performance of three different embedding models.
Figure 3: Probability distributions of correct and wrong predictions with RAG. The x-axis denotes the first token probability, and the y-axis shows the normalized density of these predictions.
Figure 4: Number of completed questions and corresponding accuracy as a function of chunk number. The rightmost bar represents the results where the highest probability answer is selected from all chunk numbers combined.
