A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models

Noa Linder; Meirav Segal; Omer Antverg; Gil Gekker; Tomer Fichman; Omri Bodenheimer; Edan Maor; Omer Nevo

TL;DR

This paper introduces a content-based framework for designing and auditing cyber refusal policies that makes offense-defense tradeoffs explicit, and demonstrates that this content-grounded approach resolves inconsistencies in current frontier model behavior and allows organizations to construct tunable, risk-aware refusal policies.

Abstract

Large language models and LLM-based agents are increasingly used for cybersecurity tasks that are inherently dual-use. Existing approaches to refusal, spanning academic policy frameworks and commercially deployed systems, often rely on broad topic-based bans or offense-focused taxonomies. As a result, they can yield inconsistent decisions, over-restrict legitimate defenders, and prove brittle under obfuscation or request segmentation. We argue that effective refusal requires explicitly modeling the trade-off between offensive risk and defensive benefit, rather than relying solely on intent or offensive classification. In this paper, we introduce a content-based framework for designing and auditing cyber refusal policies that makes offense-defense tradeoffs explicit. The framework characterizes requests along five dimensions (Offensive Action Contribution, Offensive Risk, Technical Complexity, Defensive Benefit, and Expected Frequency for Legitimate Users), each grounded in the technical substance of the request rather than stated intent. We demonstrate that this content-grounded approach resolves inconsistencies in current frontier model behavior and allows organizations to construct tunable, risk-aware refusal policies.
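
To make the idea of a tunable, content-based policy concrete, the following is a minimal sketch of how the five dimensions might be represented and combined. It is not the paper's scoring method: the 0-4 ordinal scale, the grouping of dimensions into an offense side and a defense side, the additive scoring, and the names `RequestScores`, `RefusalPolicy`, and `refuse_above` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestScores:
    """One request scored on the five dimensions.

    Assumption: each dimension is an integer on a 0-4 ordinal scale.
    """
    offensive_action_contribution: int
    offensive_risk: int
    technical_complexity: int
    defensive_benefit: int
    expected_frequency_legit: int


@dataclass(frozen=True)
class RefusalPolicy:
    """Hypothetical tunable policy; weights and threshold are illustrative."""
    offense_weight: float = 1.0
    defense_weight: float = 1.0
    refuse_above: float = 0.0  # raise to make the policy more permissive

    def offensive_utility(self, s: RequestScores) -> int:
        # Assumed grouping: these three dimensions count toward offense.
        return (s.offensive_action_contribution
                + s.offensive_risk
                + s.technical_complexity)

    def defensive_value(self, s: RequestScores) -> int:
        # Assumed grouping: these two dimensions count toward defense.
        return s.defensive_benefit + s.expected_frequency_legit

    def should_refuse(self, s: RequestScores) -> bool:
        # Refuse when weighted offense exceeds weighted defense by more
        # than the configured margin.
        margin = (self.offense_weight * self.offensive_utility(s)
                  - self.defense_weight * self.defensive_value(s))
        return margin > self.refuse_above


risky = RequestScores(3, 3, 2, 1, 1)      # offense-heavy request
defensive = RequestScores(1, 1, 1, 4, 4)  # defense-heavy request

strict = RefusalPolicy()
permissive = RefusalPolicy(refuse_above=10.0)
```

Under these assumptions, `strict.should_refuse(risky)` is true while `permissive.should_refuse(risky)` is false, so a single threshold moves the same scored request from refused to allowed. The decision depends only on the content scores and never on stated intent.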

Paper Structure (26 sections, 5 figures, 3 tables)

Introduction
Balancing Defensive Value Against Offensive Utility
Methodology
Quantifying Offensive Utility
Quantifying Defensive Value
The Framework
Offensive Action Contribution
Offensive Risk
Technical Complexity
Defensive Benefit
Expected Frequency for Legitimate Users
Framework Application
Related Work
Cybersecurity frameworks
Limitations and Future Work
...and 11 more sections

Figures (5)

Figure 1: Refused Prompt vs Complied Prompt. The <credentials> and <path> placeholders were filled with real values during testing.
Figure 2: Five-parameter framework for evaluating cybersecurity requests.
Figure 3: 2-parameter (highly restrictive) policy.
Figure 4: More permissive policy.
Figure 5: Less permissive policy.
