LongSafety: Enhance Safety for Long-Context LLMs

Mianqiu Huang; Xiaoran Liu; Shaojun Zhou; Mozhi Zhang; Qipeng Guo; Linyang Li; Chenkun Tan; Yang Gao; Pengyu Wang; Linlin Li; Qun Liu; Yaqian Zhou; Xipeng Qiu; Xuanjing Huang

TL;DR

The paper tackles safety gaps in long-context LLMs by introducing LongSafety, a dedicated 17k-sample dataset (avg length 40.9k tokens) across 8 tasks, and LongSafetyBench, a 1k-sample, 10-task long-context safety benchmark (avg length 41.9k tokens). It defines three long-context safety scenarios (query harmful, partially harmful, fully harmful) and three scalable data-construction pipelines for generating training and evaluation data, and evaluates the results across multiple LLMs. Empirically, fine-tuning with LongSafety improves both long-context and short-context safety without sacrificing general capabilities, and long-context safety cannot be achieved by simply combining short-context safety data with generic long-context alignment. The work also provides a leaderboard, analyzes generalization to longer contexts and out-of-distribution (OOD) tasks, and discusses data efficiency and limitations, offering a practical path toward safer long-context LLM deployment.

Abstract

Recent advancements in model architectures and length extrapolation techniques have significantly extended the context length of large language models (LLMs), paving the way for their application in increasingly complex tasks. However, despite the growing capabilities of long-context LLMs, the safety issues in long-context scenarios remain underexplored. While safety alignment in short context has been widely studied, the safety concerns of long-context LLMs have not been adequately addressed. In this work, we introduce LongSafety, a comprehensive safety alignment dataset for long-context LLMs, containing 10 tasks and 17k samples, with an average length of 40.9k tokens. Our experiments demonstrate that training with LongSafety can enhance long-context safety performance while also improving short-context safety and preserving general capabilities. Furthermore, we demonstrate that long-context safety is not equivalent to long-context alignment combined with short-context safety data, and that LongSafety generalizes across context lengths and long-context safety scenarios.
