Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards

Omar Erak, Nouf Alabbasi, Omar Alhussein, Ismail Lotfi, Amr Hussein, Sami Muhaidat, Merouane Debbah

TL;DR

This work addresses the challenge that large language models struggle with telecom standards by deploying a fine-tuned Phi-2 RAG system as an on-edge oracle for 3GPP documents. It introduces forward-looking semantic chunking, a cross-encoder re-ranker, SelfExtend context expansion, and LoRA-based fine-tuning to enable efficient, multi-context information processing on resource-constrained devices. The approach achieves competitive accuracy, approaching the performance of much larger models like GPT-4o while maintaining edge-deployable efficiency, and demonstrates meaningful gains from re-ranking and extended context. Overall, it provides a reusable framework for agentic telecom tasks and outlines avenues for future improvements in embedding, structured data handling, and broader standardization tasks, with open-source potential to foster collaboration.

Abstract

Recent studies show that large language models (LLMs) struggle with technical standards in telecommunications. We propose a fine-tuned retrieval-augmented generation (RAG) system based on the Phi-2 small language model (SLM) to serve as an oracle for communication networks. Our system uses forward-looking semantic chunking to adaptively determine parsing breakpoints based on embedding similarity, enabling effective processing of diverse document formats. To handle the challenge of multiple similar contexts in technical standards, we employ a re-ranking algorithm to prioritize the most relevant retrieved chunks. Recognizing the limitations of Phi-2's small context window, we implement a recent technique, SelfExtend, to expand the context window during inference, which not only boosts performance but also accommodates a wider range of user queries and design requirements, from customers to specialized technicians. For fine-tuning, we use the low-rank adaptation (LoRA) technique to improve computational efficiency during training and enable effective fine-tuning on small datasets. Our comprehensive experiments demonstrate substantial improvements over existing question-answering approaches in the telecom domain, achieving performance that exceeds larger language models such as GPT-4 (which is about 880 times larger in size). This work presents a novel approach to leveraging SLMs for communication networks, offering a balance of efficiency and performance, and can serve as a foundation towards agentic language models for networks.
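The idea of choosing chunk breakpoints from embedding similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the `threshold` value is arbitrary, and the rule "compare each sentence with the one that follows it" is only one plausible reading of "forward-looking". All function names and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk wherever a sentence is dissimilar to the next one."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for current, following in zip(sentences, sentences[1:]):
        if cosine(embed(current), embed(following)) < threshold:
            chunks.append([following])    # similarity dropped: breakpoint here
        else:
            chunks[-1].append(following)  # still on topic: extend current chunk
    return chunks


# Illustrative toy sentences, not taken from a 3GPP document.
sentences = [
    "The AMF handles registration requests from the UE.",
    "The AMF also handles mobility management for the UE.",
    "LDPC codes are used on NR data channels.",
    "Polar codes are used on NR control channels.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

Unlike fixed-size chunking, the breakpoint here falls where the topic changes, so the two core-network sentences and the two channel-coding sentences end up in separate chunks regardless of their length.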

Paper Structure (14 sections, 2 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Overview of proposed RAG architecture with semantic chunking, extended context support, and fine-tuned Phi-2 SLM integration for 3GPP document processing.
  • Figure 2: Comparison of fixed-size chunking and semantic chunking applied to an excerpt from a 3GPP document.
  • Figure 3: Prompt structure that includes retrieved context and instructions.
  • Figure 4: Schematic illustration of the Low-Rank Adaptation (LoRA) technique for efficient fine-tuning of neural networks (e.g., language models) with low-rank matrices.
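To make the low-rank idea behind Figure 4 concrete, the following is a minimal pure-Python sketch of a LoRA-style layer, assuming the commonly used formulation in which a frozen weight matrix W is augmented by a scaled product B·A of two small trainable matrices. The class name `LoRALinear`, the toy 8×8 weight, and the values of `rank` and `alpha` are illustrative assumptions, not settings reported in the paper.

```python
import random


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight        # frozen pretrained weights (never updated)
        self.scale = alpha / rank   # common LoRA scaling convention
        # A starts random and B starts at zero, so B @ A is zero before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A), the weight actually applied to inputs."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def trainable_params(self):
        """Count only the entries of A and B; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d = 8
W = [[float(i == j) for j in range(d)] for i in range(d)]  # toy 8x8 weight
layer = LoRALinear(W, rank=2)
print(layer.trainable_params(), "trainable vs", d * d, "frozen parameters")
```

Because B is initialized to zero, the adapted layer starts out identical to the pretrained one, and only the entries of A and B would be updated during fine-tuning. The saving grows with layer size, since the trainable count scales with rank × (d_in + d_out) rather than d_in × d_out.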