Extending the Formalism and Theoretical Foundations of Cryptography to AI

Federico Villa, F. Betül Durak, Tadayoshi Kohno, Tapdig Maharramli, Franziska Roesner

TL;DR

This work develops a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game framework that captures completeness (in the absence of an adversary) and adversarial robustness, and unifies confidentiality, integrity, and availability within a single model.

Abstract

Recent progress in (Large) Language Models (LMs) has enabled the development of autonomous LM-based agents capable of executing complex tasks with minimal supervision. These agents have started to be integrated into systems with significant autonomy and authority, and the security community has been studying the security of such agents. One emerging direction to mitigate security risks is to constrain agent behaviours via access control and permissioning mechanisms. Existing permissioning proposals, however, remain difficult to compare due to the absence of a shared formal foundation. This work provides such a foundation. We first systematize the landscape by constructing an attack taxonomy tailored to language models, the computational primitives of agentic systems. We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game framework that captures completeness (in the absence of an adversary) and adversarial robustness. Our security game unifies confidentiality, integrity, and availability within a single model. Using this framework, we show that existing approaches to confidentiality of training data fundamentally conflict with completeness. Finally, we formalize a modular decomposition of helpfulness and harmlessness objectives and prove its soundness, in order to enable principled reasoning about the security of agentic system designs. Our studies suggest that, to design a secure system with measurable security, one might want to use a modular approach that breaks the problem into sub-problems and lets the composition of the modules complete the design. Our studies show that this natural approach, together with the relevant formalism, is needed to prove security reductions.

Paper Structure

This paper contains 47 sections, 1 theorem, 4 equations, 8 figures, 2 tables, and 7 algorithms.

Key Result

Theorem 1

If BAIO and CAIO are complete and $\textsf{ATK}$-$\psi$-secure AIOracles for $\Psi = \Psi_1$ and $\Psi = \Psi_2$ and for $\phi'_1$ and $\phi_2$ respectively, then the dual construction is a complete and $\textsf{ATK}$-$\psi$-secure AIOracle for $\Psi = \Psi_1 \vee \Psi_2$ and for $\phi = \phi_1 \wedge \phi_

Figures (8)

  • Figure 1: Data and algorithm pipeline in LMs. In AIOracle, Finetune and Reinforce are merged with Learn into LEARN. Iterated application of Infer is represented by the INFER algorithm, where the first output becomes result.
  • Figure 2: Objectives–Security–Source mapping of AIOracle.
  • Figure 3: An overview of LM attack categories grouped by the model's phases. Arrows denote progressive specialization in the type of attacks.
  • Figure 4: DPD$(n)$
  • Figure 5: Security_Game$(\textsf{ATK}, \textsf{source})$
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1