Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Furkan Mumcu; Yasin Yilmaz

TL;DR

The paper introduces Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. Under mild conditions, AAJR admits a strictly larger policy class than global constraints, implying a weakly smaller approximation gap and reduced nominal performance degradation.

Abstract

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
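The contrast between global and direction-aligned sensitivity control can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-layer tanh policy, the squared-error loss used to define the ascent direction, and all function names are hypothetical. The sketch only shows that sensitivity measured along a single adversarial ascent direction is never larger than the global spectral-norm bound, which is why penalizing the former is the less restrictive choice.

```python
import numpy as np

def policy(W, x):
    """Toy policy (hypothetical): a single tanh layer mapping state x to an action."""
    return np.tanh(W @ x)

def policy_jacobian(W, x):
    """Analytic Jacobian of tanh(W x) with respect to x."""
    pre = W @ x
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def ascent_direction(W, x, target):
    """Normalized input-gradient of the assumed loss 0.5 * ||policy(x) - target||^2."""
    J = policy_jacobian(W, x)
    grad = J.T @ (policy(W, x) - target)
    return grad / (np.linalg.norm(grad) + 1e-12)

def directional_penalty(W, x, u):
    """Squared sensitivity ||J u||^2 along one direction u."""
    return float(np.sum((policy_jacobian(W, x) @ u) ** 2))

def global_penalty(W, x):
    """Squared spectral norm of J: the worst case over all unit directions."""
    return float(np.linalg.norm(policy_jacobian(W, x), ord=2) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
target = np.zeros(3)

u = ascent_direction(W, x, target)
d_pen = directional_penalty(W, x, u)
g_pen = global_penalty(W, x)

# The directional term is bounded above by the global term for any direction u
# with norm at most one.
print("directional:", d_pen, "global:", g_pen)
```

In a training loop, a term like `directional_penalty` would be added to the outer objective in place of `global_penalty`; how the paper weights or schedules such a term is not stated in this excerpt.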

Paper Structure

This paper contains 23 sections, 5 theorems, and 47 equations.

Introduction
Preliminaries
Notation and Multi-Agent Objective
Minimax Optimization and GDA Dynamics
Lipschitz Continuity and the Policy Jacobian
Problem Formulation
Stability Requirement and Global Sensitivity Control
Globally Constrained Policy Class
Price of Robustness
Core bottleneck.
Standing Assumptions
Directional Jacobian Constraints for Structural Agentic Robustness
Adversarial Ascent Trajectories and Directional Sensitivity
Adaptive Hypothesis Class and Price-of-Robustness Implications
A Practical Surrogate: Adversarially-Aligned Jacobian Regularization
...and 8 more sections

Key Result

Proposition 1

If $\pi\in \mathcal{F}_\gamma$ and $\gamma \le \gamma_{\mathrm{adv}}$, then $\pi \in \mathcal{F}_{\mathrm{ad}}(\gamma_{\mathrm{adv}})$.
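A one-line argument suggests why this holds, under definitions that are assumed here because the excerpt does not state them: take $\mathcal{F}_\gamma$ to be the policies with $\sup_x \|J_\pi(x)\|_2 \le \gamma$, and $\mathcal{F}_{\mathrm{ad}}(\gamma_{\mathrm{adv}})$ to be the policies with $\|J_\pi(x_t)\,u_t\|_2 \le \gamma_{\mathrm{adv}}$ at every point $x_t$ and unit direction $u_t$ on an adversarial ascent trajectory. The symbols $J_\pi$, $x_t$, and $u_t$ are illustrative notation, not necessarily the paper's.

```latex
\|J_\pi(x_t)\,u_t\|_2
  \;\le\; \|J_\pi(x_t)\|_2 \,\|u_t\|_2
  \;\le\; \gamma \cdot 1
  \;\le\; \gamma_{\mathrm{adv}} .
```

The first inequality is the definition of the operator norm, the second uses the global bound together with $\|u_t\|_2 = 1$, and the third is the hypothesis $\gamma \le \gamma_{\mathrm{adv}}$.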

Theorems & Definitions (9)

Proposition 1: Global constraints imply directional constraints
Theorem 1: Class inclusion and strict expansion
proof
Corollary 1: Price-of-robustness ordering
proof
Theorem 2: Trajectory-wise effective directional smoothness
proof
Theorem 3: Stability of PGA under Directional Jacobian Control
proof
