PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Zhuotong Chen, Zihu Wang, Yifan Yang, Qianxiao Li, Zheng Zhang

TL;DR

This work introduces a PID control-based self-healing scheme that improves the robustness of pre-trained large language models against perturbations encountered during online inference. The approach models an LLM as a discrete dynamical system and designs running losses for the state, accumulation, and derivative terms via embedding manifolds, which yields a corrective control at each layer. Under linearity and orthogonality assumptions (Assumptions 1 and 2), it admits analytic solutions and a theoretical error bound showing that perturbation components in the orthogonal complement decay. For practical deployment, the embedding subspaces are constructed with a Tucker-based method. Empirically, the method improves robustness on the SNLI, MNLI, and ANLI datasets with modest inference-time overhead and compares favorably with baseline adversarial training, supporting the practical viability of a low-cost, feedback-driven defense for trustworthy NLP.

Abstract

Despite the effectiveness of deep neural networks in numerous natural language processing applications, recent findings have exposed the vulnerability of these language models when minor perturbations are introduced. While appearing semantically indistinguishable to humans, these perturbations can significantly reduce the performance of well-trained language models, raising concerns about the reliability of deploying them in safety-critical situations. In this work, we construct a computationally efficient self-healing process to correct undesired model behavior during online inference when perturbations are applied to input data. This is formulated as a trajectory optimization problem in which the internal states of the neural network layers are automatically corrected using a PID (Proportional-Integral-Derivative) control mechanism. The P controller targets immediate state adjustments, while the I and D controllers consider past states and future dynamical trends, respectively. We leverage the geometrical properties of the training data to design effective linear PID controllers. This approach reduces the computational cost of the full PID control to that of using the P controller alone. Further, we introduce an analytical method for approximating the optimal control solutions, enhancing the real-time inference capabilities of this controlled system. Moreover, we conduct a theoretical error analysis of the analytic solution in a simplified setting. The proposed PID control-based self-healing is a low-cost framework that improves the robustness of pre-trained large language models, whether standard or robustly trained, against a wide range of perturbations. A detailed implementation can be found at: https://github.com/zhuotongchen/PID-Control-Based-Self-Healing-to-Improve-the-Robustness-of-Large-Language-Models.
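
As a rough illustration of how a proportional correction of an internal state can have a closed form, the sketch below applies a one-step P control to a single hidden-state vector. It is not the authors' implementation. It assumes that the geometry of the training data is summarized by a linear subspace with orthonormal basis `V`, and that the control `u` minimizes `||(I - V V^T)(x + u)||^2 + c * ||u||^2`. The names `p_control`, `basis`, and the weight `c` are hypothetical.

```python
import numpy as np

def p_control(x, basis, c=0.1):
    """One-step proportional correction of a hidden state (illustrative only).

    Minimizes ||(I - V V^T)(x + u)||^2 + c * ||u||^2 over u, where the
    columns of V (`basis`) are assumed orthonormal. Because I - V V^T is a
    projection, the minimizer is u = -(1 / (1 + c)) * (I - V V^T) x.
    """
    residual = x - basis @ (basis.T @ x)  # component of x outside the subspace
    return -residual / (1.0 + c)

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((8, 3)))  # hypothetical orthonormal basis
clean = V @ rng.standard_normal(3)                # a state lying in the subspace
perturbed = clean + 0.5 * rng.standard_normal(8)  # perturbed state

corrected = perturbed + p_control(perturbed, V, c=0.1)

# Distance from the subspace before and after the correction.
off_before = np.linalg.norm(perturbed - V @ (V.T @ perturbed))
off_after = np.linalg.norm(corrected - V @ (V.T @ corrected))
```

Under these assumptions the in-subspace component of the state is left unchanged, and the off-subspace component shrinks by the factor `c / (1 + c)`, so a smaller `c` gives a stronger correction.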

Paper Structure

This paper contains 40 sections, 11 theorems, 70 equations, 2 figures, 9 tables, 1 algorithm.

Key Result

Proposition 0

Consider the following objective function: [equation not shown]. The optimal value function, parametrized as $V(\mathbf{x}_t) = \mathbf{x}_t^\top \mathbf{P}_t \mathbf{x}_t$, satisfies the Riccati equation: [equation not shown]. The optimal control solution is given by [equation not shown], where $\mathbf{Q}_t = \mathbf{Q}_t^P + \mathbf{Q}_t^I + \mathbf{Q}_t^D$.
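
The objective, Riccati equation, and control law of this proposition do not appear in the text above, so the sketch below does not reproduce them. It shows the textbook backward Riccati recursion for a generic linear-quadratic problem, assuming dynamics $\mathbf{x}_{t+1} = \mathbf{A}_t \mathbf{x}_t + \mathbf{B}_t \mathbf{u}_t$ and cost $\sum_t (\mathbf{x}_t^\top \mathbf{Q}_t \mathbf{x}_t + \mathbf{u}_t^\top \mathbf{R}_t \mathbf{u}_t)$ plus a terminal term. The dynamics, the matrices `A`, `B`, `R`, and `Q_final`, and the random data are all assumptions. Only the quadratic value function and the sum $\mathbf{Q}_t = \mathbf{Q}_t^P + \mathbf{Q}_t^I + \mathbf{Q}_t^D$ come from the proposition, and the paper's exact formulation may differ.

```python
import numpy as np

def riccati_backward(A, B, Q, R, Q_final):
    """Textbook backward Riccati recursion (generic LQR, not the paper's exact form).

    Dynamics: x_{t+1} = A[t] x_t + B[t] u_t
    Cost:     sum_t (x_t^T Q[t] x_t + u_t^T R[t] u_t) + x_T^T Q_final x_T
    Returns value matrices P[0..T] and feedback gains K[0..T-1], with u_t = -K[t] x_t.
    """
    T = len(A)
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Q_final
    for t in reversed(range(T)):
        BtP = B[t].T @ P[t + 1]
        # K_t = (R_t + B_t^T P_{t+1} B_t)^{-1} B_t^T P_{t+1} A_t
        K[t] = np.linalg.solve(R[t] + BtP @ B[t], BtP @ A[t])
        # P_t = Q_t + A_t^T P_{t+1} (A_t - B_t K_t)
        P[t] = Q[t] + A[t].T @ P[t + 1] @ (A[t] - B[t] @ K[t])
    return P, K

rng = np.random.default_rng(0)
n, T = 4, 3

def random_psd():
    """Random positive semidefinite matrix, standing in for a running-loss weight."""
    M = rng.standard_normal((n, n))
    return M @ M.T

A = [rng.standard_normal((n, n)) for _ in range(T)]   # hypothetical linear layers
B = [np.eye(n) for _ in range(T)]
Q = [random_psd() + random_psd() + random_psd() for _ in range(T)]  # Q^P + Q^I + Q^D
R = [np.eye(n) for _ in range(T)]
P, K = riccati_backward(A, B, Q, R, random_psd())

# Roll out the closed-loop system and accumulate the cost actually incurred.
x0 = rng.standard_normal(n)
x, cost = x0.copy(), 0.0
for t in range(T):
    u = -K[t] @ x
    cost += x @ Q[t] @ x + u @ R[t] @ u
    x = A[t] @ x + B[t] @ u
cost += x @ P[T] @ x  # terminal cost, since P[T] equals Q_final
```

In this generic setting the accumulated closed-loop cost matches `x0 @ P[0] @ x0`, which is what it means for the quadratic form to be the optimal value function.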

Figures (2)

  • Figure 1: The structure of a feed-forward deep neural network (highlighted in blue) and of the proposed PID control method (highlighted in red).
  • Figure 2: (a) and (b) are radar plots that summarize the DistilBERT and RoBERTa-Large results, respectively, from the SNLI measurement table. (c) and (d) are radar plots that summarize the DistilBERT and RoBERTa-Large results, respectively, from the MNLI measurement table.

Theorems & Definitions (19)

  • Proposition 0
  • Remark 1
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • proof
  • Lemma 3
  • proof
  • Proposition 3
  • Lemma 4
  • ...and 9 more