Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Furkan Mumcu, Yasin Yilmaz

TL;DR

The paper introduces Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity only along adversarial ascent directions. Under mild conditions, AAJR admits a strictly larger policy class than global Jacobian constraints, which implies a weakly smaller approximation gap and less degradation of nominal performance.

Abstract

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
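The contrast between a global Jacobian bound and a directional one can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the `policy` function, the choice of adversarial direction, the finite-difference estimator, and the hinge-style penalty are all assumptions made for illustration.

```python
import math

def policy(x):
    # Hypothetical toy policy: very sensitive in coordinate 0, mildly in coordinate 1.
    return [3.0 * math.tanh(x[0]), 0.5 * x[1]]

def directional_gain(f, x, u, h=1e-6):
    """Central finite-difference estimate of ||J_f(x) u|| for the normalized direction u."""
    norm_u = math.sqrt(sum(c * c for c in u))
    u = [c / norm_u for c in u]
    x_plus = [a + h * b for a, b in zip(x, u)]
    x_minus = [a - h * b for a, b in zip(x, u)]
    jvp = [(p - m) / (2.0 * h) for p, m in zip(f(x_plus), f(x_minus))]
    return math.sqrt(sum(c * c for c in jvp))

def directional_penalty(f, x, u, gamma_adv):
    """Assumed hinge penalty: zero while the gain along u stays within gamma_adv."""
    return max(0.0, directional_gain(f, x, u) - gamma_adv) ** 2

x = [0.0, 0.0]
adv_dir = [0.0, 1.0]    # assumed adversarial ascent direction at x
other_dir = [1.0, 0.0]  # a direction the adversary is assumed not to use

gain_adv = directional_gain(policy, x, adv_dir)      # sensitivity the adversary can exploit
gain_other = directional_gain(policy, x, other_dir)  # sensitivity a global bound would also restrict
penalty = directional_penalty(policy, x, adv_dir, gamma_adv=1.0)

print(f"gain along adversarial direction: {gain_adv:.3f}")
print(f"gain along other direction:       {gain_other:.3f}")
print(f"directional penalty (gamma_adv=1): {penalty:.3f}")
```

In this toy setting the policy would violate a global bound of 1, because its gain along `other_dir` is about 3, yet it incurs no directional penalty because its gain along the assumed adversarial direction is about 0.5. This is the kind of policy a directional constraint admits and a global one excludes.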

Paper Structure (23 sections, 5 theorems, 47 equations)

This paper contains 23 sections, 5 theorems, 47 equations.

Key Result

Proposition 1

If $\pi\in \mathcal{F}_\gamma$ and $\gamma \le \gamma_{\mathrm{adv}}$, then $\pi \in \mathcal{F}_{\mathrm{ad}}(\gamma_{\mathrm{adv}})$.
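The reasoning behind this inclusion can be sketched in one line. The class definitions are not reproduced in this summary, so the following assumes that $\mathcal{F}_\gamma$ bounds the operator norm of the policy Jacobian $J_\pi(x)$ by $\gamma$ at every input $x$, and that $\mathcal{F}_{\mathrm{ad}}(\gamma_{\mathrm{adv}})$ bounds $\|J_\pi(x)\,u\|_2$ by $\gamma_{\mathrm{adv}}$ for every unit-norm adversarial ascent direction $u$. The paper's exact definitions may differ, for example by restricting $x$ to points on an optimization trajectory. Under these assumptions, for any such $x$ and $u$:

```latex
\begin{align*}
\|J_\pi(x)\,u\|_2
  &\le \|J_\pi(x)\|_2 \,\|u\|_2
     && \text{(definition of the operator norm)} \\
  &= \|J_\pi(x)\|_2
     && \text{(since } \|u\|_2 = 1\text{)} \\
  &\le \gamma
     && \text{(since } \pi \in \mathcal{F}_\gamma\text{)} \\
  &\le \gamma_{\mathrm{adv}}
     && \text{(by hypothesis).}
\end{align*}
```

The bound holds for every direction, so it holds in particular for the adversarial ones. The converse generally fails, because a directional bound leaves sensitivity in non-adversarial directions unconstrained.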

Theorems & Definitions (9)

  • Proposition 1: Global constraints imply directional constraints
  • Theorem 1: Class inclusion and strict expansion
  • proof
  • Corollary 1: Price-of-robustness ordering
  • proof
  • Theorem 2: Trajectory-wise effective directional smoothness
  • proof
  • Theorem 3: Stability of PGA under Directional Jacobian Control
  • proof