Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments

Paulius Rauba, Nabeel Seedat, Krzysztof Kacprzyk, Mihaela van der Schaar

TL;DR

This paper introduces a theoretical framework for self-healing systems and builds an agentic self-healing solution, H-LLM. H-LLM uses large language models (LLMs) for two tasks: self-diagnosis, by reasoning about the structure underlying the data generating process (DGP), and self-adaptation, by proposing and evaluating corrective actions.

Abstract

Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process (DGP). Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature. By choosing from a pre-defined set of actions, such methods implicitly assume that the causes of model degradation are irrelevant to what actions should be taken, limiting their ability to select appropriate adaptations. In this paper, we propose an alternative paradigm to overcome these limitations, called self-healing machine learning (SHML). Unlike previous approaches, SHML autonomously diagnoses the reason for degradation and proposes diagnosis-based corrective actions. We formalize SHML as an optimization problem over a space of adaptation actions to minimize the expected risk under the shifted DGP. We introduce a theoretical framework for self-healing systems and build an agentic self-healing solution, H-LLM, which uses large language models to perform self-diagnosis by reasoning about the structure underlying the DGP and self-adaptation by proposing and evaluating corrective actions. Empirically, we analyze different components of H-LLM to understand why and when it works, demonstrating the potential of self-healing ML.
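The optimization problem described in the abstract can be written as follows. The notation is assumed for illustration and may not match the paper: $\mathcal{A}$ is the space of adaptation actions, $f_a$ is the model after applying action $a$, $\ell$ is a loss function, and $p'$ is the shifted DGP. Because $p'$ is unknown in practice, the second line replaces the expectation with an average over $n$ labelled samples drawn after the shift.

```latex
\begin{align*}
a^* &\in \arg\min_{a \in \mathcal{A}} \; \mathbb{E}_{(x, y) \sim p'}\left[\ell\left(f_a(x), y\right)\right] \\
\hat{a} &\in \arg\min_{a \in \mathcal{A}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\left(f_a(x_i), y_i\right), \qquad (x_i, y_i) \sim p'
\end{align*}
```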

Paper Structure

This paper contains 51 sections, 3 theorems, 22 equations, 11 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

Under the paper's independence assumption, the optimal diagnosis $\zeta^*$ has zero entropy, i.e., $\mathbb{H}(\zeta^*) = 0$.
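To make Proposition 1 concrete, the sketch below assumes that a diagnosis is a probability distribution over a finite set of candidate causes of degradation and that its certainty is measured with Shannon entropy. The paper's Definition 1 may differ in detail. A diagnosis that puts all its mass on a single cause has zero entropy, which is the property Proposition 1 attributes to the optimal diagnosis $\zeta^*$. The function name `diagnosis_entropy` is hypothetical.

```python
import math
from typing import Sequence


def diagnosis_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a diagnosis over candidate causes of degradation."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("diagnosis probabilities must sum to 1")
    # H = sum_k p_k * log(1 / p_k); zero-probability causes contribute 0 (0 * log 0 = 0).
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)


# Certain diagnosis: all mass on one cause (e.g. "feature 3 is corrupted").
certain = [0.0, 0.0, 1.0, 0.0]
# Uninformative diagnosis: uniform over four candidate causes.
uncertain = [0.25, 0.25, 0.25, 0.25]

print(diagnosis_entropy(certain))    # 0.0
print(diagnosis_entropy(uncertain))  # log(4), about 1.386
```

Any diagnosis that spreads probability over more than one cause has strictly positive entropy, so zero entropy singles out the one-hot case.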

Figures (11)

  • Figure 1: Different adaptation strategies $a_1, \ldots, a_4$ might result in different performance after an environment change.
  • Figure 2: Our work introduces self-healing machine learning. A healing mechanism $\mathcal{H}$ interacts with a deployed model $f$. $\mathcal{H}$ contains four components: monitoring, diagnosis, adaptation, and testing. The overall goal of SHML is to find optimal adaptation actions to maximize the predictive performance of a model $f$.
  • Figure 3: The self-healing mechanism $\mathcal{H}$ modulates the function $f$ through four stages. The chosen adaptation action $a$ is applied to $f$ at the next time step.
  • Figure 4: Lower drift detection thresholds can benefit SHML.
  • Figure 5: KL divergence between the estimated and the true probabilities of which variables are corrupted, based on outlier factors and corruption coefficients. Lower ($\downarrow$) is better.
  • ...and 6 more figures
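The four stages in Figures 2 and 3 (monitoring, diagnosis, adaptation, testing) can be illustrated with a deliberately tiny Python sketch. Everything in it is a hypothetical stand-in: a one-dimensional regression model, a squared-error loss, a hand-written rule in place of H-LLM's LLM-based diagnosis, and function names (`monitor`, `diagnose`, `adapt`, `select_action`, `heal`) that do not come from the paper. The `threshold` argument plays the role of the drift-detection threshold varied in Figure 4.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]
Batch = List[Tuple[float, float]]


def mse(model: Model, batch: Batch) -> float:
    return sum((model(x) - y) ** 2 for x, y in batch) / len(batch)


def monitor(model: Model, batch: Batch, baseline: float, threshold: float) -> bool:
    # Stage 1 (monitoring): flag drift when the loss exceeds the baseline by more than `threshold`.
    return mse(model, batch) - baseline > threshold


def diagnose(model: Model, batch: Batch) -> Dict[str, float]:
    # Stage 2 (diagnosis): toy rule standing in for H-LLM's reasoning.
    # It only checks whether the residuals look like a constant label offset.
    residuals = [y - model(x) for x, y in batch]
    offset = sum(residuals) / len(residuals)
    constant = max(residuals) - min(residuals) < 1e-6
    return {"label_offset": offset, "offset_is_constant": float(constant)}


def adapt(model: Model, diagnosis: Dict[str, float]) -> Dict[str, Model]:
    # Stage 3 (adaptation): propose candidate actions conditioned on the diagnosis.
    candidates: Dict[str, Model] = {"no_op": model}
    if diagnosis["offset_is_constant"]:
        c = diagnosis["label_offset"]
        candidates["add_offset"] = lambda x, m=model, c=c: m(x) + c
    return candidates


def select_action(candidates: Dict[str, Model], batch: Batch) -> str:
    # Stage 4 (testing): keep the candidate with the lowest loss on recent data.
    return min(candidates, key=lambda name: mse(candidates[name], batch))


def heal(model: Model, batch: Batch, baseline: float, threshold: float) -> Tuple[str, Model]:
    if not monitor(model, batch, baseline, threshold):
        return "no_op", model
    candidates = adapt(model, diagnose(model, batch))
    best = select_action(candidates, batch)
    return best, candidates[best]


# Deployed model f(x) = 2x; after the shift the labels are y = 2x + 3.
f = lambda x: 2 * x
shifted = [(x, 2 * x + 3) for x in range(5)]
action, healed = heal(f, shifted, baseline=0.0, threshold=0.5)
print(action, healed(10))  # add_offset 23.0
```

Because the diagnosis identifies the residuals as a constant offset, the adaptation stage proposes a targeted correction instead of choosing from a fixed, reason-agnostic menu, which is the contrast the abstract draws with concept drift adaptation.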

Theorems & Definitions (8)

  • Definition 1: Certainty of the Diagnosis
  • Definition 2: Optimal Diagnosis
  • Proposition 1
  • Proposition 2: Existence of Optimal Diagnosis
  • Proof
  • Definition 3: Backtesting Window
  • Proposition 3
  • Proof
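Definition 3 is not reproduced on this page. As an assumption made only for illustration, the sketch below treats a backtesting window as the `size` most recent labelled observations and scores each candidate adaptation action by its empirical risk on that window. `BacktestWindow` and the candidate names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Model = Callable[[float], float]


class BacktestWindow:
    """Hypothetical helper: the `size` most recent labelled points, used to score candidate actions."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=...) evicts the oldest point once the window is full.
        self.buffer: Deque[Tuple[float, float]] = deque(maxlen=size)

    def add(self, x: float, y: float) -> None:
        self.buffer.append((x, y))

    def risk(self, model: Model) -> float:
        # Empirical squared-error risk of `model` over the window.
        if not self.buffer:
            raise ValueError("backtesting window is empty")
        return sum((model(x) - y) ** 2 for x, y in self.buffer) / len(self.buffer)

    def best(self, candidates: Dict[str, Model]) -> str:
        # Lowest windowed risk wins; ties go to the first candidate in insertion order.
        return min(candidates, key=lambda name: self.risk(candidates[name]))


# Three points before a shift (y = x), three after it (y = x + 1).
stream = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 6)]
candidates = {"keep": lambda x: x, "shift_up": lambda x: x + 1}

window = BacktestWindow(size=3)
for x, y in stream:
    window.add(x, y)
print(window.best(candidates))  # shift_up
```

The window length matters. With `size=6` the window straddles the shift, so both candidates score 0.5, and `min` breaks the tie in favor of `keep`, the first candidate in insertion order.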