Expert-Driven Monitoring of Operational ML Models

Joran Leest; Claudia Raibulet; Ilias Gerostathopoulos; Patricia Lago

TL;DR

This work addresses the challenge of maintaining reliable ML performance under concept drift by advocating Expert Monitoring, which embeds domain expertise into drift detection and response. It proposes a scenario-based framework in which ML engineers elicit drift-inducing events from domain experts, formalize them as scenarios in a standardized format, and apply Bayesian model comparison to identify which scenario best explains observed drift, potentially triggering automated mitigations. The key contributions include a structured elicitation workflow, a Bayesian interpretation of drift scenarios in terms of posterior probabilities $P(M|D)$ and Bayes factors, and a pathway toward centralized, on-call governance of drift responses. The approach aims to reduce alert fatigue, improve interpretability of drift causes, and strengthen MLOps practices by combining human expertise with probabilistic reasoning for operational ML systems.
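The scenario-identification step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each scenario is a Gaussian belief about the shifted mean of a single feature, that observation noise is Gaussian with a known standard deviation, and that all names (`log_marginal_likelihood`, `posterior_over_scenarios`, `bayes_factor`, the scenario labels, and the data) are hypothetical. Only the Bayes factor threshold of 5 is taken from the Figure 4 caption.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_marginal_likelihood(observations, prior_mean, prior_sd, noise_sd):
    """log P(D | M) for i.i.d. Gaussian data whose unknown mean has a
    Gaussian prior N(prior_mean, prior_sd^2); the mean is integrated out."""
    n = len(observations)
    xbar = sum(observations) / n
    ss = sum((x - xbar) ** 2 for x in observations)
    noise_var = noise_sd ** 2
    # Terms that do not depend on the scenario (they cancel in Bayes factors).
    shared = (-0.5 * n * math.log(2 * math.pi * noise_var)
              - ss / (2 * noise_var)
              + 0.5 * math.log(2 * math.pi * noise_var / n))
    # Under the scenario, the sample mean is N(prior_mean, prior_sd^2 + noise_var / n).
    return shared + log_normal_pdf(xbar, prior_mean, prior_sd ** 2 + noise_var / n)


def posterior_over_scenarios(observations, scenarios, noise_sd, priors=None):
    """P(M | D) for each scenario; a uniform prior over scenarios is assumed by default."""
    if priors is None:
        priors = {name: 1.0 / len(scenarios) for name in scenarios}
    log_joint = {
        name: log_marginal_likelihood(observations, mean, sd, noise_sd)
        + math.log(priors[name])
        for name, (mean, sd) in scenarios.items()
    }
    peak = max(log_joint.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - peak) for name, v in log_joint.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def bayes_factor(observations, scenario, baseline, noise_sd):
    """Ratio P(D | scenario) / P(D | baseline)."""
    return math.exp(
        log_marginal_likelihood(observations, scenario[0], scenario[1], noise_sd)
        - log_marginal_likelihood(observations, baseline[0], baseline[1], noise_sd)
    )


# Hypothetical scenarios: (expected feature mean, expert uncertainty as a standard deviation).
scenarios = {
    "no_drift": (0.0, 0.1),
    "price_increase": (2.0, 0.5),
    "service_outage": (-3.0, 0.5),
}
observed = [1.8, 2.1, 2.3, 1.9, 2.0]  # made-up standardized feature values

posteriors = posterior_over_scenarios(observed, scenarios, noise_sd=1.0)
best = max(posteriors, key=posteriors.get)
bf = bayes_factor(observed, scenarios[best], scenarios["no_drift"], noise_sd=1.0)
should_alert = best != "no_drift" and bf > 5  # threshold of 5, as in the Figure 4 caption
```

In this sketch an alert or automated response would be considered only when the best-supported scenario beats the no-drift baseline by the chosen Bayes factor threshold; wider expert uncertainty (a larger `prior_sd`) spreads a scenario's predictions and lowers its marginal likelihood for any specific observation.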

Abstract

We propose Expert Monitoring, an approach that leverages domain expertise to enhance the detection and mitigation of concept drift in machine learning (ML) models. Our approach supports practitioners by consolidating domain expertise related to concept drift-inducing events, making this expertise accessible to on-call personnel, and enabling automatic adaptability with expert oversight.

Paper Structure (18 sections, 2 equations, 4 figures)

Introduction
Navigating the Darkness: Concept Drift in Practice
The Practical Challenges of Concept Drift Detection -- Where the Shadows Lie
Concept Drift Detection Without Labeled Data
Deciphering the Nature of Concept Drift Post-Detection
Domain Expertise -- A Light in Dark Places, When All Other Lights Go Out
Addressing Concept Drift with Domain Expertise
Why Is It Hard to Rely on Domain Expertise?
Approach
Scenario Specification
Expert Knowledge Elicitation
Retrospective Analysis
Scenario Specification Format
Scenario Identification
Bayesian Model Comparison
...and 3 more sections

Figures (4)

Figure 1: Data drift in a customer churn prediction model.
Figure 2: An illustrative case for customer churn prediction, showing expert assessments for three cases of feature drift.
Figure 3: A visual depiction of our approach, Expert Monitoring. In step (A), ML engineers consolidate domain expertise within the organization using a standardized format. In step (B), upon detecting feature drift, scenarios are evaluated using Bayesian model comparison. Afterwards, the ML engineer is informed about potential causes, or an automated response is triggered.
Figure 4: Detection accuracy vs. estimate uncertainty and error (in proportion relative to actual parameter values) on simulated scenarios, with a Bayes factor threshold of 5.
