TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network

Nouf Alabbasi, Omar Erak, Omar Alhussein, Ismail Lotfi, Sami Muhaidat, Merouane Debbah

TL;DR

TeleOracle is a telecom-specialized retrieval-augmented generation (RAG) system built on the Phi-2 small language model (SLM). It performs on par with much larger LLMs while achieving a higher faithfulness score, indicating closer adherence to the retrieved context.

Abstract

The telecommunications industry's rapid evolution demands intelligent systems capable of managing complex networks and adapting to emerging technologies. While large language models (LLMs) show promise in addressing these challenges, their deployment in telecom environments faces significant constraints due to edge device limitations and inconsistent documentation. To bridge this gap, we present TeleOracle, a telecom-specialized retrieval-augmented generation (RAG) system built on the Phi-2 small language model (SLM). To improve context retrieval, TeleOracle employs a two-stage retriever that incorporates semantic chunking and hybrid keyword and semantic search. Additionally, we expand the context window during inference to enhance the model's performance on open-ended queries. We also employ low-rank adaptation (LoRA) for efficient fine-tuning. A thorough analysis of the model's performance indicates that our RAG framework is effective in aligning Phi-2 to the telecom domain in a downstream question and answer (QnA) task, achieving a 30% improvement in accuracy over the base Phi-2 model and reaching an overall accuracy of 81.20%. Notably, we show that our model not only performs on par with much larger LLMs but also achieves a higher faithfulness score, indicating closer adherence to the retrieved context.
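
To make the idea of hybrid keyword and semantic search concrete, the following is a minimal sketch and not the authors' implementation. It assumes reciprocal rank fusion as the way to merge the two rankings (the abstract does not name a fusion method), uses term overlap as a stand-in for a keyword scorer such as BM25, and uses a toy character-trigram embedding (`toy_embed`, hypothetical) in place of a learned embedding model. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def toy_embed(text):
    """Hypothetical stand-in for a learned embedding: character trigram counts."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_rank(query, chunks):
    """Rank chunk indices by shared query terms (simplified stand-in for BM25)."""
    q_terms = set(tokenize(query))
    scores = [len(q_terms & set(tokenize(c))) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: -scores[i])


def semantic_rank(query, chunks, embed):
    """Rank chunk indices by cosine similarity between query and chunk embeddings."""
    q_vec = embed(query)
    scores = [cosine(q_vec, embed(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: -scores[i])


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each chunk earns 1 / (k + position) from every ranking."""
    fused = Counter()
    for ranking in rankings:
        for position, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + position)
    return sorted(fused, key=lambda i: -fused[i])


def hybrid_search(query, chunks, top_k=3, embed=toy_embed):
    """Return the indices of the top_k chunks after fusing both rankings."""
    rankings = [keyword_rank(query, chunks), semantic_rank(query, chunks, embed)]
    return reciprocal_rank_fusion(rankings)[:top_k]
```

Keyword matching tends to help with exact identifiers such as work item codes, which embeddings can blur, while the semantic ranking covers paraphrased questions. A second stage could then rescore the fused candidates with a more expensive model; that step is omitted here.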

Paper Structure

This paper contains 16 sections, 3 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Overview of TeleOracle architecture with semantic chunking, two-stage retrieval process, extended context support at inference time, and fine-tuned Phi-2 SLM integration with LoRA.
  • Figure 2: Comparison of fixed-size chunking and semantic chunking applied to an excerpt from a 3GPP document.
  • Figure 3: Comparison between vector search and hybrid search retrievers. The retrieved text for the question "What does the SA5 Work Item 'CH14-V8' focus on in the VoLTE roaming architecture?" is shown. The answer to the question is highlighted in blue.
  • Figure 4: Bi-Encoder vs. Cross-Encoder architectures. In the Bi-Encoder, query and chunk are encoded separately, and cosine similarity is used to compare embeddings. The Cross-Encoder jointly encodes the query and chunk, producing a classification score.
  • Figure 5: Example of how language models lose important context when input sequences exceed their fixed context window. As the sequence length grows, the model forgets key tokens like "SIP" and generates less relevant outputs, such as "hardware," due to its inability to retain words beyond the context window.
  • ...and 4 more figures
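
The contrast in Figure 2 between fixed-size and semantic chunking can be illustrated with a short sketch. This is not the paper's chunking procedure: it assumes a chunk boundary is placed wherever adjacent sentences fall below a similarity threshold, and it uses word-level Jaccard overlap as a stand-in for embedding similarity. The function names and the threshold value are hypothetical.

```python
import re


def fixed_size_chunks(text, size):
    """Cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Word-overlap similarity, used here in place of embedding similarity."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def semantic_chunks(text, similarity=jaccard, threshold=0.1):
    """Group consecutive sentences; start a new chunk when similarity drops."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for previous, current in zip(sentences, sentences[1:]):
        if similarity(previous, current) >= threshold:
            groups[-1].append(current)  # same topic: extend the current chunk
        else:
            groups.append([current])    # topic shift: open a new chunk
    return [" ".join(group) for group in groups]
```

Fixed-size chunks preserve every character but may split a sentence or a definition across two chunks, whereas the semantic variant keeps related sentences together at the cost of uneven chunk lengths.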