SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Fernando Koch, Walid Saad, Holger Boche

TL;DR

This work demonstrates that fine-tuning LLMs on telecom data can degrade safety alignment under both supervised fine-tuning (SFT) and continual pre-training (CPT). To measure this, it introduces TeleHarm, a telecom-specific red-teaming benchmark, and evaluates three lightweight safety realignment methods (SafeInstruct, SafeLoRA, and SafeMERGE) across multiple models and telecom datasets. The results show that while SFT and CPT can increase harmful outputs, these defenses restore safety with minimal loss in telecom task performance, yielding SafeCOMM models. The findings highlight the importance of explicit safety-focused instruction tuning and post-hoc realignment for telecom-tuned LLMs, with practical implications for deploying safer telecom AI systems.

Abstract

Fine-tuning large language models (LLMs) on telecom datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue by fine-tuning LLMs on three representative telecom datasets and show that safety degrades even for light telecom domain adaptation. To this end, we introduce TeleHarm, the first telecom-specific red-teaming benchmark, which we use alongside the established Direct-Harm and HexPhi datasets to systematically assess harmful behavior. We further extend our analysis to publicly available TeleLLMs that were continually pre-trained on large telecom corpora, revealing that safety alignment is severely lacking, primarily due to the omission of safety-focused instruction tuning. To address these issues, we evaluate three realignment defenses: SafeInstruct, SafeLoRA, and SafeMERGE. We show that, across all settings, the proposed defenses can effectively restore safety without compromising telecom task performance, leading to Safe teleCOMMunication (SafeCOMM) models. Our work serves as both a diagnostic study and a practical guide for safety realignment in telecom-tuned LLMs, underscoring the need for safety-aware instruction and fine-tuning in the telecom domain.

Paper Structure

This paper contains 20 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: SFT and CPT with telecom data can compromise safety alignment unless safety considerations are explicitly included throughout the training.
  • Figure 2: Top five unsafe categories (from Llama-Guard’s 14 classes, S1–S14) for Llama-3-8B-Tele-it compared to the safe Llama-3-8B-Instruct counterpart.
  • Figure 3: Per-token KL divergence between telecom-tuned and unaligned Llama-3.1-8B models on unsafe TeleHarm prompts. Alignment appears shallow, affecting mainly the initial prefix tokens (e.g., "I cannot", "I apologize"), while later tokens remain close to the unaligned base. SafeMERGE matches the instruct model, showing that safety can be restored from the first tokens.
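The per-token KL divergence mentioned in the Figure 3 caption can be illustrated with a minimal sketch. It assumes that both models are scored on the same token sequence and that each produces one row of next-token logits per position. The function names (`log_softmax`, `per_token_kl`), the toy logits, and the direction of the divergence (first model relative to second) are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def per_token_kl(
    logits_p: Sequence[Sequence[float]],
    logits_q: Sequence[Sequence[float]],
) -> List[float]:
    """Return KL(P_t || Q_t) for every token position t.

    logits_p and logits_q hold one row of next-token logits per position,
    produced by two models on the same input sequence (assumed setup).
    """
    if len(logits_p) != len(logits_q):
        raise ValueError("both models must be scored on the same positions")
    kls = []
    for row_p, row_q in zip(logits_p, logits_q):
        lp = log_softmax(row_p)
        lq = log_softmax(row_q)
        # KL(P || Q) = sum_v P(v) * (log P(v) - log Q(v))
        kls.append(sum(math.exp(a) * (a - b) for a, b in zip(lp, lq)))
    return kls


# Toy example (hypothetical numbers): the two models disagree only on the
# first position, mimicking a difference confined to the prefix tokens.
tuned = [[2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
base = [[0.0, 2.0, 0.0], [0.5, 0.5, 0.5]]
print(per_token_kl(tuned, base))
```

In this toy case the divergence is positive at the first position and zero at the second, which is the qualitative pattern the caption describes as shallow alignment. In practice the logits would come from forward passes of the two models over the same prompt and response.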