LongSafety: Enhance Safety for Long-Context LLMs

Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Qipeng Guo, Linyang Li, Chenkun Tan, Yang Gao, Pengyu Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, Xuanjing Huang

TL;DR

The paper tackles safety gaps in long-context LLMs by introducing LongSafety, a dedicated 17k-sample dataset (avg length 40.9k tokens) across 8 tasks, and LongSafetyBench, a 1k-sample, 10-task long-context safety benchmark (avg length 41.9k). It proposes three long-context safety scenarios (query harmful, partially harmful, fully harmful) and three scalable data-construction pipelines to generate training and evaluation data, followed by demonstrations across multiple LLMs. Empirical results show that fine-tuning with LongSafety improves long-context safety (and short-context safety) without sacrificing general capabilities, and that long-context safety cannot be achieved by simply combining short-context safety data with generic long-context alignment. The work also provides a leaderboard, analyzes generalization to longer contexts and OOD tasks, and discusses data efficiency and limitations, offering a practical path toward safer long-context LLM deployment.

Abstract

Recent advancements in model architectures and length extrapolation techniques have significantly extended the context length of large language models (LLMs), paving the way for their application in increasingly complex tasks. However, despite the growing capabilities of long-context LLMs, the safety issues in long-context scenarios remain underexplored. While safety alignment in short context has been widely studied, the safety concerns of long-context LLMs have not been adequately addressed. In this work, we introduce LongSafety, a comprehensive safety alignment dataset for long-context LLMs, containing 10 tasks and 17k samples, with an average length of 40.9k tokens. Our experiments demonstrate that training with LongSafety can enhance long-context safety performance while enhancing short-context safety and preserving general capabilities. Furthermore, we demonstrate that long-context safety does not equal long-context alignment with short-context safety data and LongSafety has generalizing capabilities in context length and long-context safety scenarios.

Paper Structure

This paper contains 65 sections, 4 figures, 14 tables.

Figures (4)

  • Figure 1: Test results on LongSafetyBench. LLMs fine-tuned with our LongSafety (LS) dataset show better safety performance in long-context scenarios. The test context length is set to 32k.
  • Figure 2: Three long-context safety scenarios, query harmful context, partially harmful context, and fully harmful context, with our corresponding data construction pipelines, and a sample in LongSafetyBench with four choices representing four possible LLM behaviors in long-context safety scenarios.
  • Figure 3: The left two panels show the task distributions of LongSafety and LongSafetyBench, respectively. Green stands for query harmful, blue for partially harmful, and orange for fully harmful. The right panel shows the length distribution of LongSafety and LongSafetyBench. The Y-axis stands for context length and the X-axis for proportion.
  • Figure 4: Safety performance on long and short context throughout the training process of LLaMA3.1-8B-Instruct and Qwen2.5-7B-Instruct fine-tuned with our proposed LongSafety dataset.