LongSafety: Evaluating Long-Context Safety of Large Language Models

Yida Lu; Jiale Cheng; Zhexin Zhang; Shiyao Cui; Cunxiang Wang; Xiaotao Gu; Yuxiao Dong; Jie Tang; Hongning Wang; Minlie Huang

TL;DR

LongSafety addresses the gap in safety evaluation for open-ended long-context tasks with a benchmark of 1,543 long-context instances (average length about 5,424 words) spanning 7 safety issues and 6 task types. It pairs the benchmark with a multi-agent safety evaluator (risk analyzer, context summarizer, safety judge) that reaches 92% accuracy on a test set and is used to judge the responses of 16 LLMs. Most models score below 55% on the long-context safety rate (SR_long), and strong short-context safety does not guarantee long-context safety; risks tied to generation tasks and sensitive topics are especially challenging. The work also shows that relevant contextual content and longer inputs exacerbate safety risks, and it provides data, metrics, and methodology to guide future improvements in long-context safety, including scalable data collection and specialized evaluators.

Abstract

As Large Language Models (LLMs) continue to advance in understanding and generating long sequences, long contexts have introduced new safety concerns. However, the safety of LLMs in long-context tasks remains under-explored, leaving a significant gap in both the evaluation and the improvement of their safety. To address this, we introduce LongSafety, the first comprehensive benchmark specifically designed to evaluate LLM safety in open-ended long-context tasks. LongSafety encompasses 7 categories of safety issues and 6 user-oriented long-context tasks, with a total of 1,543 test cases averaging 5,424 words per context. Our evaluation of 16 representative LLMs reveals significant safety vulnerabilities, with most models achieving safety rates below 55%. Our findings also indicate that strong safety performance in short-context scenarios does not necessarily correlate with safety in long-context tasks, emphasizing the unique challenges and urgency of improving long-context safety. Moreover, through extensive analysis, we identify the safety issues and task types that are most challenging for long-context models. Furthermore, we find that relevant context and extended input sequences can exacerbate safety risks in long-context scenarios, highlighting the critical need for ongoing attention to long-context safety challenges. Our code and data are available at https://github.com/thu-coai/LongSafety.
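
To make the reported safety rates concrete, the following is a minimal sketch of how such a rate could be aggregated from per-response judgments. It assumes the safety rate is the percentage of responses that the evaluator labels safe; the exact definition used in the paper is not stated in this summary, and the function name `safety_rate` and the example labels are hypothetical, not taken from the LongSafety code base.

```python
from typing import Iterable


def safety_rate(judgments: Iterable[bool]) -> float:
    """Return the percentage of responses judged safe.

    Assumption: each judgment is True when the evaluator labels the
    model response as safe, and False otherwise.
    """
    labels = list(judgments)
    if not labels:
        raise ValueError("no judgments to aggregate")
    # sum() counts the True values because bool is a subclass of int.
    return 100.0 * sum(labels) / len(labels)


# Hypothetical labels for four responses from one model.
example_labels = [True, False, True, True]
print(safety_rate(example_labels))  # 75.0
```

Under this assumed definition, a model whose rate falls below 55 would be among the majority of evaluated models that the abstract describes.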
