PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Zhuotong Chen; Zihu Wang; Yifan Yang; Qianxiao Li; Zheng Zhang

TL;DR

This work introduces a PID control-based self-healing scheme that improves the robustness of pre-trained large language models against perturbations encountered during online inference. The LLM is modeled as a discrete dynamical system, and running losses for the state, its accumulation, and its derivative are designed via embedding manifolds, which yields a corrective control at each layer. The approach admits analytic solutions under linearity and orthogonality assumptions (Assumptions 1 and 2) and comes with a theoretical error bound showing that perturbation components in the orthogonal complement of the embedding subspace decay. For practical deployment, the embedding subspaces are constructed with a Tucker-based method. Empirically, the method improves robustness on the SNLI, MNLI, and ANLI datasets with modest inference-time overhead and compares favorably with baseline adversarial training, indicating that a low-cost, feedback-driven defense is practical for trustworthy NLP.
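The decay of perturbations in the orthogonal complement can be illustrated with a minimal sketch. It assumes a linear embedding subspace with an orthonormal basis V and a P-only running loss of the form (1/2)||(I - V V^T)(x + u)||^2 + (c/2)||u||^2, whose minimizer is u = -(I - V V^T) x / (1 + c). The loss form, the regularization weight `c`, the toy basis, and all function names are illustrative assumptions, not the paper's implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(x, basis):
    """Project x onto span(basis); the basis vectors are assumed orthonormal."""
    out = [0.0] * len(x)
    for v in basis:
        coeff = dot(x, v)
        out = [o + coeff * vi for o, vi in zip(out, v)]
    return out


def p_control(x, basis, c):
    """Closed-form control u = -(x - Px) / (1 + c) for the assumed quadratic loss."""
    px = project(x, basis)
    return [-(xi - pi) / (1.0 + c) for xi, pi in zip(x, px)]


# Hypothetical 2-D embedding subspace inside a 3-D state space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [2.0, -1.0, 0.5]   # the last coordinate lies in the orthogonal complement
c = 0.25               # assumed control regularization weight

u = p_control(x, basis, c)
x_ctrl = [xi + ui for xi, ui in zip(x, u)]

# The in-subspace part is untouched; the orthogonal part shrinks by c / (1 + c).
shrink = abs(x_ctrl[2]) / abs(x[2])
```

Under these assumptions the component inside the subspace passes through unchanged, while the orthogonal component is scaled by c / (1 + c) < 1 at each controlled layer, which is the mechanism behind the decay.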

Abstract

Despite the effectiveness of deep neural networks in numerous natural language processing applications, recent findings have exposed the vulnerability of these language models when minor perturbations are introduced. While appearing semantically indistinguishable to humans, these perturbations can significantly reduce the performance of well-trained language models, raising concerns about the reliability of deploying them in safety-critical situations. In this work, we construct a computationally efficient self-healing process to correct undesired model behavior during online inference when perturbations are applied to input data. This is formulated as a trajectory optimization problem in which the internal states of the neural network layers are automatically corrected using a PID (Proportional-Integral-Derivative) control mechanism. The P controller targets immediate state adjustments, while the I and D controllers consider past states and future dynamical trends, respectively. We leverage the geometrical properties of the training data to design effective linear PID controllers. This approach reduces the computational cost of the full PID control to that of using the P controller alone. Further, we introduce an analytical method for approximating the optimal control solutions, enhancing the real-time inference capabilities of this controlled system. Moreover, we conduct a theoretical error analysis of the analytic solution in a simplified setting. The proposed PID control-based self-healing is a low-cost framework that improves the robustness of pre-trained large language models, whether standard or robustly trained, against a wide range of perturbations. A detailed implementation can be found at: https://github.com/zhuotongchen/PID-Control-Based-Self-Healing-to-Improve-the-Robustness-of-Large-Language-Models.
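To make the roles of the three terms concrete, here is a minimal textbook discrete-time PID sketch applied layer by layer to a toy scalar system. It is not the paper's controller: the layer map x_{t+1} = a (x_t + u_t), the gains, and the availability of a clean reference trajectory are all assumptions made for illustration, whereas the paper derives its targets from the geometry of the training data.

```python
class DiscretePID:
    """Textbook PID: u = kp * e + ki * sum(e) + kd * (e - e_prev)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error                      # I: accumulated past error
        if self.prev_error is None:
            derivative = 0.0                        # no trend at the first layer
        else:
            derivative = error - self.prev_error    # D: trend of the error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def run_layers(x0, reference, a, controller=None):
    """Propagate x through toy layers x_{t+1} = a * (x_t + u_t)."""
    x = x0
    for ref_t in reference[:-1]:
        u = controller.step(ref_t - x) if controller else 0.0
        x = a * (x + u)
    return x


a, num_layers = 1.1, 6                               # assumed toy dynamics
reference = [a ** t for t in range(num_layers + 1)]  # clean trajectory from x0 = 1.0
x0_perturbed = 1.3                                   # perturbed input

final_open = run_layers(x0_perturbed, reference, a)
final_pid = run_layers(x0_perturbed, reference, a, DiscretePID(0.5, 0.1, 0.05))

err_open = abs(final_open - reference[-1])
err_pid = abs(final_pid - reference[-1])
```

In this toy setting the uncontrolled deviation grows with depth because a > 1, while the controlled trajectory ends closer to the clean reference. The gains here are hand-picked and would need tuning in any real use.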

Paper Structure

This paper contains 40 sections, 11 theorems, 70 equations, 2 figures, 9 tables, 1 algorithm.

Introduction
Background on PID Control
The PID Control-Based Self-Healing Framework for Large Language Models
PID Control Design via Embedding Manifolds
An Analytic Solution for Fast Inference
An analytic solution under Assumption $1$.
An analytic solution under Assumption $2$.
Theoretical Error Analysis
Additional Details for Constructing PID Control
Numerical Experiments
Experimental Setup
Evaluation methods:
Baseline methods:
PID control implementation details:
Threat model:
...and 25 more sections

Key Result

Proposition 0

For the control objective considered in the paper, the optimal value function, parametrized as $V(\mathbf{x}_t) = \mathbf{x}_t^\top \mathbf{P}_t \mathbf{x}_t$, satisfies a Riccati equation, which in turn determines the optimal control solution. Here $\mathbf{Q}_t = \mathbf{Q}_t^P + \mathbf{Q}_t^I + \mathbf{Q}_t^D$.
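For orientation, the sketch below runs a textbook finite-horizon, discrete-time LQR backward recursion in the scalar case, using the same quadratic value-function parametrization V(x_t) = p_t x_t^2. The dynamics x_{t+1} = a x_t + b u_t, the cost sum of q x_t^2 + r u_t^2, the numeric weights, and the function names are assumptions for illustration; the paper's own Riccati equation and control law may take a different form.

```python
def lqr_backward(a, b, q, r, horizon):
    """Scalar Riccati recursion; returns value coefficients p[0..T] and gains k[0..T-1]."""
    p = [0.0] * (horizon + 1)
    k = [0.0] * horizon
    p[horizon] = q                                   # terminal cost q * x_T^2
    for t in range(horizon - 1, -1, -1):
        k[t] = (a * b * p[t + 1]) / (r + b * b * p[t + 1])
        p[t] = q + a * a * p[t + 1] - a * b * p[t + 1] * k[t]
    return p, k


def rollout_cost(a, b, q, r, x0, gains):
    """Total cost of applying the feedback law u_t = -k_t * x_t from x0."""
    x, cost = x0, 0.0
    for k_t in gains:
        u = -k_t * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x


# Assumed weights: the state weight is the sum of P, I, and D contributions.
q_p, q_i, q_d = 1.0, 0.5, 0.25
q = q_p + q_i + q_d
a, b, r, horizon, x0 = 1.1, 1.0, 1.0, 5, 0.7

p, k = lqr_backward(a, b, q, r, horizon)
optimal_cost = rollout_cost(a, b, q, r, x0, k)
zero_control_cost = rollout_cost(a, b, q, r, x0, [0.0] * horizon)
```

In this setup the cost achieved by the feedback gains matches p[0] * x0^2, which is what the quadratic value-function parametrization predicts, and it is lower than the cost of applying no control.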

Figures (2)

Figure 1: The structure of a feed-forward deep neural network (highlighted in blue) and of the proposed PID control method (highlighted in red).
Figure 2: (a) and (b) are radar plots that summarize the DistilBERT and RoBERTa-Large results, respectively, from the SNLI results table. (c) and (d) are radar plots that summarize the DistilBERT and RoBERTa-Large results, respectively, from the MNLI results table.

Theorems & Definitions (19)

Proposition 0
Remark 1
Proposition 1
Theorem 2
Proposition 2
proof
Lemma 3
proof
Proposition 3
Lemma 4
...and 9 more
