Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models

Đorđe Klisura, Joseph Khoury, Ashish Kundu, Ram Krishnan, Anthony Rios

TL;DR

The paper addresses how to enforce RBAC-style access control in large language models that generate SQL queries, introducing a unified framework and RBAC-augmented datasets to evaluate role-conditioned refusals. It compares three enforcement strategies—direct prompting, a two-step generator–verifier pipeline, and LoRA-based fine-tuning—across multiple model families. The key finding is that explicit verification improves refusal precision and reduces false permits, while fine-tuning enhances utility by internalizing permission reasoning; longer, more complex policies degrade reliability across all methods. The work demonstrates the value of combining reasoning with structured access checks and releases the RBAC-augmented datasets and code to support broader evaluation and deployment in data-sensitive environments.

Abstract

Access control is a cornerstone of secure computing, yet large language models often blur role boundaries by producing unrestricted responses. We study role-conditioned refusals, focusing on the LLM's ability to adhere to access control policies by answering when authorized and refusing when not. To evaluate this behavior, we created a novel dataset that extends the Spider and BIRD text-to-SQL datasets with realistic PostgreSQL role-based policies at the table and column levels. We compare three designs: (i) zero- or few-shot prompting, (ii) a two-step generator–verifier pipeline that checks SQL against policy, and (iii) LoRA fine-tuned models that learn permission awareness directly. Across multiple model families, explicit verification (the two-step framework) improves refusal precision and lowers false permits. At the same time, fine-tuning achieves a stronger balance between safety and utility (i.e., when execution accuracy is also considered). Longer and more complex policies consistently reduce the reliability of all systems. We release RBAC-augmented datasets and code.
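To make the permit-or-refuse decision concrete, the following is a minimal sketch of a table- and column-level check. It is not the paper's verifier, which is an LLM: the `RolePolicy` structure, the `decide` function, the example schema, and the assumption that the referenced (table, column) pairs have already been extracted from the generated SQL are all hypothetical and chosen only for illustration. The two roles mirror the clinician and office clerk example used in the paper's first figure.

```python
from dataclasses import dataclass


@dataclass
class RolePolicy:
    """Hypothetical policy: maps each readable table to its readable columns."""
    readable: dict


def decide(policy, referenced):
    """Return "PERMIT" only if every referenced (table, column) pair is readable.

    `referenced` is assumed to be extracted from the generated SQL beforehand;
    SQL parsing is deliberately out of scope for this sketch.
    """
    for table, column in referenced:
        allowed_columns = policy.readable.get(table)
        if allowed_columns is None or column not in allowed_columns:
            return "REFUSE"
    return "PERMIT"


# Hypothetical roles and schema, for illustration only.
clinician = RolePolicy({"patients": {"id", "name", "diagnosis"}})
clerk = RolePolicy({"patients": {"id", "name"}})

# Columns touched by a query such as: SELECT name, diagnosis FROM patients
query_columns = [("patients", "name"), ("patients", "diagnosis")]

clinician_decision = decide(clinician, query_columns)
clerk_decision = decide(clerk, query_columns)
```

Under these assumptions the same query is permitted for the clinician and refused for the clerk, because the clerk's policy does not include the `diagnosis` column.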

Paper Structure

This paper contains 38 sections, 2 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Illustration of role-conditioned refusals. An authorized clinician receives the correct response, while an office clerk is denied access to the same query.
  • Figure 2: Overview of our access-control evaluation framework for LLMs. We extend Spider and BIRD with role-based policies defining four levels of visibility, from full to minimal access. Models are then tested under three setups: a single-step decision, a two-step generator–verifier pipeline, and a fine-tuned permission-aware model.
  • Figure 3: Effect of access-policy length on refusal performance (BIRD). Bars show average $F_1$ scores across three experimental settings: Setting 1 (GPT few-shot), Setting 2 (GPT$\rightarrow$GPT zero-shot), and Setting 3 (fine-tuned Mistral).
  • Figure 4: Verifier-swap ablation (Setting 2). Precision–recall trade-off on the Spider dataset.
  • Figure 5: Failure example where the single-step model (Setting 1) incorrectly returns a query accessing restricted data, while the verifier in Setting 2 correctly identifies the violation and denies it.
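
The refusal metrics named in the abstract and figure captions (refusal precision, false permits, $F_1$) can be sketched as follows. This is a hedged illustration, not the paper's evaluation code: it assumes the standard definitions with "REFUSE" treated as the positive class, and the `refusal_metrics` function and the label strings are hypothetical names.

```python
def refusal_metrics(gold, pred):
    """Compute refusal precision, recall, F1, and the false-permit count.

    Assumes "REFUSE" is the positive class. A false permit is a case where
    the gold label is "REFUSE" but the system answered ("PERMIT").
    """
    pairs = list(zip(gold, pred))
    true_refusals = sum(g == "REFUSE" and p == "REFUSE" for g, p in pairs)
    false_refusals = sum(g == "PERMIT" and p == "REFUSE" for g, p in pairs)
    false_permits = sum(g == "REFUSE" and p == "PERMIT" for g, p in pairs)

    predicted_refusals = true_refusals + false_refusals
    gold_refusals = true_refusals + false_permits
    precision = true_refusals / predicted_refusals if predicted_refusals else 0.0
    recall = true_refusals / gold_refusals if gold_refusals else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_permits": false_permits,
    }


# Toy labels, invented for illustration; not results from the paper.
gold = ["REFUSE", "REFUSE", "REFUSE", "PERMIT", "PERMIT"]
pred = ["REFUSE", "REFUSE", "PERMIT", "REFUSE", "PERMIT"]
metrics = refusal_metrics(gold, pred)
```

In this toy example there are two correct refusals, one unnecessary refusal, and one false permit, so precision, recall, and $F_1$ all equal 2/3. A failure like the one described for Figure 5, where restricted data is returned, would count as a false permit under these definitions.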