Chain-of-Authorization: Internalizing Authorization into Large Language Models via Reasoning Trajectories

Yang Li; Yule Liu; Xinlei He; Youjian Zhao; Qi Li; Ke Xu

Abstract

Large Language Models (LLMs) have become core cognitive components in modern artificial intelligence (AI) systems, combining internal knowledge with external context to perform complex tasks. However, LLMs typically treat all accessible data indiscriminately and lack inherent awareness of knowledge ownership and access boundaries. This deficiency heightens the risks of sensitive data leakage and adversarial manipulation, potentially enabling unauthorized system access and severe security crises. Existing protection strategies rely on rigid, uniform defenses that prevent dynamic authorization: structural isolation methods face scalability bottlenecks, while prompt guidance methods struggle with fine-grained permission distinctions. Here, we propose the Chain-of-Authorization (CoA) framework, a secure training and reasoning paradigm that internalizes authorization logic into LLMs' core capabilities. Unlike passive external defenses, CoA restructures the model's information flow: it embeds permission context in the input and requires the model to generate an explicit authorization reasoning trajectory, comprising resource review, identity resolution, and decision-making stages, before the final response. Through supervised fine-tuning on data covering various authorization states, CoA integrates policy execution with task responses, making authorization a causal prerequisite for substantive responses. Extensive evaluations show that CoA not only maintains comparable utility in authorized scenarios but also overcomes the cognitive confusion that arises when permissions mismatch. It exhibits high rejection rates against various unauthorized and adversarial access attempts. This mechanism leverages LLMs' reasoning capabilities to perform dynamic authorization, using natural language understanding as a proactive security mechanism for deploying reliable LLMs in modern AI systems.
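The idea that authorization is a causal prerequisite for a substantive response can be illustrated with an explicit, rule-based policy function. The sketch below is not the authors' implementation: CoA learns this behavior through fine-tuning rather than hard-coding it. The decision labels match, mismatch, and no permission follow the examples in Figure 2; the public branch, the names `decide` and `respond`, and the permission strings are hypothetical.

```python
from enum import Enum
from typing import Callable, FrozenSet, Optional, Tuple


class Decision(Enum):
    PUBLIC = "public"                # assumed: the prompt touches no protected knowledge
    MATCH = "match"                  # user holds the required permission
    MISMATCH = "mismatch"            # user holds permissions, but not the required one
    NO_PERMISSION = "no permission"  # assumed: user holds no permission at all


def decide(required: Optional[str], user_perms: FrozenSet[str]) -> Decision:
    """Hypothetical reference policy for the decision stage of the trajectory."""
    if required is None:
        return Decision.PUBLIC
    if required in user_perms:
        return Decision.MATCH
    if not user_perms:
        return Decision.NO_PERMISSION
    return Decision.MISMATCH


def respond(
    required: Optional[str],
    user_perms: FrozenSet[str],
    answer_fn: Callable[[], str],
    refusal: str = "Request denied: missing the required permission.",
) -> Tuple[Decision, str]:
    """Return (decision, text); answer_fn runs only after an authorizing decision."""
    decision = decide(required, user_perms)
    if decision in (Decision.PUBLIC, Decision.MATCH):
        return decision, answer_fn()
    return decision, refusal
```

For example, a user holding only a hypothetical `hr:read` permission who asks about knowledge labeled `finance:read` receives `Decision.MISMATCH` and the refusal string, and `answer_fn` is never called. Calling the answer function lazily mirrors the ordering CoA aims for (decide first, then answer); in CoA that ordering is learned by the model rather than enforced by code.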

Paper Structure (24 sections, 11 equations, 8 figures, 4 tables)

Main
Internalizing Authorization via Reasoning Trajectories
Formalizing Authorization and the Challenge of Permission Mismatch
Internalizing Policy Function via Chain-of-Authorization
Mechanistic Design: Permission-Aware Input-Output Reformulation
Learning Paradigm: Internalizing Authorization via Supervised Fine-tuning
Evaluation
Model Utility on Authorized States
Authorization under Different Permissions
Robustness Against Adversarial Prompts
Visualization: What does CoA do?
Causal Analysis of CoA Trajectories
Ablation Study
Discussion
Methods
...and 9 more sections

Figures (8)

Figure 1: Overview of the Chain-of-Authorization (CoA) framework. The upper half illustrates the mechanistic design, in which permission labels are injected into the input, forcing the model to generate a structured authorization trajectory before the final response. The lower half depicts the learning paradigm, which fine-tunes the LLM with synthesized data reflecting three authorization states, ensuring the authorization policy is embedded within the LLM.
Figure 2: Chain-of-Authorization template for internal knowledge authorization. Here, [Prompt Permission] identifies the specific permission required to access the internal knowledge relevant to the prompt. [User Permission] retrieves the actual authorization associated with the user. [Decision] signifies the logical conclusion (e.g., match, mismatch or no permission). [Response] returns the final output generated for the user, which is conditioned on the preceding authorization decision.
Figure 3: Accuracy across five datasets under mismatched and public authorization states, where each circle represents the average accuracy of a method on a specific group of datasets across three backbone models, and the line indicates its variance.
Figure 4: Refusal rate across five datasets under mismatched and public authorization states, where each circle represents the average refusal rate of a method on a specific group of datasets across three backbone models, and the line indicates its variance.
Figure 5: Visualization of hidden states across different authorization scenarios.
...and 3 more figures
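Figure 2's template fixes the order of the fields in the authorization trajectory: prompt permission, user permission, decision, then response. A minimal sketch of serializing a labeled example in that order (for instance, as a supervised fine-tuning target) and of parsing a generated trajectory back into fields is shown below. The one-field-per-line layout, the assumption that the first three fields fit on one line, and the names `CoARecord`, `render`, and `parse` are hypothetical; the paper's exact serialization is not given here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoARecord:
    prompt_permission: str  # permission required by the knowledge the prompt touches
    user_permission: str    # permission actually associated with the user
    decision: str           # e.g. "match", "mismatch", or "no permission"
    response: str           # final output, conditioned on the decision


# Field tags in the order given by Figure 2; the line-based layout is assumed.
_TAGS = ("Prompt Permission", "User Permission", "Decision", "Response")

_PATTERN = re.compile(
    r"\[Prompt Permission\] (.*)\n"  # first three fields are assumed single-line
    r"\[User Permission\] (.*)\n"
    r"\[Decision\] (.*)\n"
    r"\[Response\] ([\s\S]*)"        # the response may span several lines
)


def render(rec: CoARecord) -> str:
    """Serialize a record into a trajectory string, one tagged field per line."""
    values = (rec.prompt_permission, rec.user_permission, rec.decision, rec.response)
    return "\n".join(f"[{tag}] {value}" for tag, value in zip(_TAGS, values))


def parse(text: str) -> Optional[CoARecord]:
    """Recover the fields from a generated trajectory; None if it is malformed."""
    match = _PATTERN.fullmatch(text.strip())
    return CoARecord(*match.groups()) if match else None
```

Parsing is convenient at evaluation time, because the decision field can be read directly from a model's output instead of being inferred from the free-text response.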
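Figures 3 and 4 report, for each method and dataset group, an average over three backbone models together with its variance. The sketch below shows that aggregation under two assumptions that the captions do not state: all responses of one backbone on a dataset group are pooled into a single rate before averaging, and the variance is the population variance across backbones. The backbone names and outcome flags are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def rate(flags: List[bool]) -> float:
    """Fraction of True flags (refusals for Figure 4, correct answers for Figure 3)."""
    if not flags:
        raise ValueError("need at least one response")
    return sum(flags) / len(flags)


def summarize(per_backbone: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Mean and population variance of the per-backbone rates."""
    rates = [rate(flags) for flags in per_backbone.values()]
    return mean(rates), pvariance(rates)


# Hypothetical backbone names and outcomes, for illustration only.
example = {
    "backbone_a": [True, True, False, True],   # rate 0.75
    "backbone_b": [True, True, True, True],    # rate 1.0
    "backbone_c": [True, False, False, True],  # rate 0.5
}
avg, var = summarize(example)  # avg == 0.75, var is about 0.0417
```

Using `statistics.pvariance` treats the three backbones as the whole population of interest; if the spread is meant to estimate variability over a larger pool of models, `statistics.variance` (the sample variance) would be the alternative.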
