Model Monitoring in the Absence of Labeled Data via Feature Attributions Distributions

Carlos Mougan

TL;DR

This work develops an unsupervised framework for monitoring deployed ML models when labeled data is unavailable, by leveraging distributions of feature attributions (notably SHAP and LIME) to study both AI alignment and performance monitoring under distribution shift. It formalizes Equal Treatment, arguing that independence between explanation distributions and protected attributes provides a stricter, more philosophically grounded fairness notion than traditional Demographic Parity, and introduces the Equal Treatment Inspector, a classifier-two-sample-test-based tool. The thesis introduces the concept of explanation shifts to capture how model explanations change under distribution shifts and integrates an explainable-uncertainty approach to identify drivers of model deterioration. It delivers open-source software (explanationspace, skshift) and extensive empirical validation on synthetic and real tabular data (e.g., ACS, StackOverflow), demonstrating that explanation-based monitoring can detect shifts and diagnose fairness concerns more sensitively than input- or output-based metrics. The work concludes with reflections on limitations, reliability of explanations, and future directions for extending this framework to broader domains and real-world applications, highlighting the ethical implications of aligning ML systems with liberal and Kantian fairness ideals.

Abstract

Model monitoring involves analyzing AI algorithms once they have been deployed and detecting changes in their behaviour. This thesis explores machine learning (ML) model monitoring before the predictions impact real-world decisions or users. This step is characterized by one particular condition: the absence of labelled data at test time, which makes it challenging, and often impossible, to calculate performance metrics. The thesis is structured around two main themes: (i) AI alignment, measuring whether AI models behave in a manner consistent with human values, and (ii) performance monitoring, measuring whether the models achieve specific accuracy goals. A common methodology unifies all its sections: it explores feature attribution distributions for both monitoring dimensions. The theoretical properties of these feature attribution explanations can then be exploited to derive guarantees and insights for model monitoring.

Paper Structure

This paper contains 161 sections, 6 theorems, 29 equations, 31 figures, 20 tables.

Introduction to Model Monitoring
AI Alignment and Feature Attributions Challenges
Performance Monitoring with Feature Attributions Distributions
Reproducibility and Software Contribution Objectives
Summary Requirements for Model Monitoring in the Absence of Labeled Data via Feature Attributions Distributions
Overview of the Thesis
Foundations
Mathematical Notation
Feature Attribution Explanations
Shapley Values
Efficiency.
Uninformativeness.
Linear Models and IID Data
LIME: Local Interpretable Model-Agnostic Explanations
AI Alignment
...and 146 more sections

Key Result

Lemma 3.5

Consider a linear model $f_\beta(x) = \beta_0 + \sum_j \beta_j \cdot x_j$. Let $Z$ be the $i$-th feature, i.e. $Z = X_i$, and let $\mathcal{D}_X$ be such that $\mathit{distinct}(\mathcal{D}_X, i)>1$. If the features in $X$ are independent, then $\mathcal{S}(f_\beta, \mathcal{D}_X \ldots$
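The setting of the lemma (a linear model with independent features) can be illustrated numerically. The sketch below is not taken from the thesis: it assumes an interventional value function over a background sample, and the names `shapley_values`, `value_fn`, and `background` are hypothetical. Under that assumption, exact Shapley values of a linear model reduce to the closed form $\beta_j (x_j - \bar{x}_j)$, so the attribution of feature $j$ depends only on feature $j$.

```python
import itertools
import math
import random


def shapley_values(value_fn, n_features):
    """Exact Shapley values obtained by enumerating every coalition."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size)
                      * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for coalition in itertools.combinations(others, size):
                base = set(coalition)
                phi[j] += weight * (value_fn(base | {j}) - value_fn(base))
    return phi


def linear_model(beta0, beta, x):
    return beta0 + sum(b * v for b, v in zip(beta, x))


random.seed(0)
beta0, beta = 0.5, [2.0, -1.0, 3.0]  # illustrative coefficients
# Background sample with independently drawn features.
background = [[random.gauss(0.0, 1.0) for _ in beta] for _ in range(200)]
means = [sum(row[j] for row in background) / len(background)
         for j in range(len(beta))]
x = [1.0, 2.0, -0.5]  # instance to explain


def value_fn(coalition):
    # Assumed interventional value function: features in the coalition are
    # fixed to x, the others are averaged over the background sample.
    total = 0.0
    for row in background:
        mixed = [x[j] if j in coalition else row[j] for j in range(len(beta))]
        total += linear_model(beta0, beta, mixed)
    return total / len(background)


exact = shapley_values(value_fn, len(beta))
closed_form = [b * (xj - m) for b, xj, m in zip(beta, x, means)]
```

Comparing `exact` with `closed_form` shows agreement up to floating-point error, and the attributions sum to the prediction at `x` minus the average prediction over `background` (the efficiency property listed in the outline).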

Figures (31)

Figure 1: Model monitoring occurs within the AI system life cycle. During model development, task requirements and data are collected, models are trained and evaluated, and parameters are refined. Finally, the model is assessed against benchmarks and metrics using a hold-out dataset.
Figure 2: Life cycle of monitoring a supervised learning binary classifier. Once a model is deployed, it encounters new, unlabelled data, prompting the need to observe and analyse its behaviour to detect potential issues before serving the predictions in the real world. In some cases, after serving the model to users, the acquisition of ground truth data allows for further evaluation. Leveraging these results and the available ground truth data, the original model can undergo retraining.
Figure 3: Equal Treatment Inspector workflow. The model $f_\theta$ is learned based on training data, $\mathcal{D} = \{(x_i,y_i)\}$, and outputs the explanations $\mathcal{S}(f_\theta,\mathcal{D}_X)$. The Classifier Two-Sample Test receives the explanations to predict the protected attribute, $Z$. The AUC of the two-sample test classifier $g_\psi$ decides for or against equal treatment. We can interpret the driver for unequal treatment on $g_\psi$ with explainable AI techniques.
Figure 4: Comparing the power (the higher, the better) of C2ST based on AUC with Brunner-Munzel test (AUC Test BM) vs Accuracy vs AUC with an asymptotic normal approximation of the Wilcoxon–Mann–Whitney statistic (AUC Test A). Upper: balanced groups ($P(Z=1)=0.5$). Lower: unbalanced groups ($P(Z=1)=0.2$).
Figure 5: Coefficient of $g_{\psi}$ over $\gamma$ for synthetic datasets in two experimental scenarios.
...and 26 more figures
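
The workflow in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation and not the API of `explanationspace`: the model $f$ is linear, its explanations use the closed form $\beta_j (x_j - \bar{x}_j)$, the inspector $g_\psi$ is a hand-written logistic regression, and the AUC is computed as a normalised Mann-Whitney statistic. All function names and the synthetic data are hypothetical, and no significance test (such as the Brunner-Munzel test named in Figure 4) is performed.

```python
import math
import random


def auc(scores, labels):
    """AUC as the normalised Mann-Whitney U statistic (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def explain_linear(beta, rows):
    """Closed-form attributions beta_j * (x_j - mean_j) of a linear model."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(beta))]
    return [[b * (v - m) for b, v, m in zip(beta, r, means)] for r in rows]


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Plain batch gradient descent for logistic regression (stand-in for g_psi)."""
    w, b = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for r, y in zip(rows, labels):
            logit = b + sum(wi * v for wi, v in zip(w, r))
            p = 1.0 / (1.0 + math.exp(-logit))
            grad_b += p - y
            for j, v in enumerate(r):
                grad_w[j] += (p - y) * v
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def inspector_auc(beta, rows, z):
    """Fit g_psi on explanations of the first half, score AUC on the second."""
    expl = explain_linear(beta, rows)
    half = len(rows) // 2
    w, b = fit_logistic(expl[:half], z[:half])
    scores = [b + sum(wi * v for wi, v in zip(w, e)) for e in expl[half:]]
    return auc(scores, z[half:])


random.seed(0)
# Synthetic data: feature 0 is shifted by the protected attribute z, feature 1 is not.
z = [random.randint(0, 1) for _ in range(2000)]
rows = [[random.gauss(zi, 1.0), random.gauss(0.0, 1.0)] for zi in z]

auc_uses_proxy = inspector_auc([1.0, 1.0], rows, z)     # model relies on feature 0
auc_ignores_proxy = inspector_auc([0.0, 1.0], rows, z)  # model ignores feature 0
```

In this toy setup, `auc_uses_proxy` lands well above 0.5 because the explanations carry information about $Z$, while `auc_ignores_proxy` stays close to 0.5. Deciding for or against equal treatment would additionally require a statistical test on the AUC, which this sketch omits.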

Theorems & Definitions (38)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 2.6: Univariate data shift
Definition 2.7: Covariate data shift
Definition 2.8: Predictions Shift
Definition 2.9: Concept Shift
Definition 3.1
...and 28 more
