Interpretable Generalized Additive Models for Datasets with Missing Values

Hayden McTavish, Jon Donnelly, Margo Seltzer, Cynthia Rudin

TL;DR

This work tackles interpretability for datasets with missing values by introducing M-GAM, a sparse generalized additive model that directly incorporates missingness indicators and their interactions. By leveraging $\ell_0$ regularization, M-GAM maintains sparsity and interpretability while achieving competitive or superior accuracy relative to impute-then-predict methods, especially under informative MAR missingness. The model is also significantly faster than multiple imputation pipelines and remains effective on real-world data. Overall, M-GAM provides a transparent, scalable approach to predictive modeling with missing data, with code and reproducible experiments available.

Abstract

Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse, generalized, additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through $\ell_0$ regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naive inclusion of indicator variables.
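
The feature construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `augment_with_missingness`, the zero-filling of missing entries, and the choice of interaction form (a missingness indicator multiplied by another feature's value) are assumptions made for illustration only. The sparse fitting step with $\ell_0$ regularization is not shown.

```python
import numpy as np


def augment_with_missingness(X):
    """Build an augmented design matrix (illustrative only, hypothetical helper).

    Columns, in order:
      1. the d original features, with missing entries zero-filled,
      2. d missingness indicators (1.0 where the feature is NaN),
      3. d*(d-1) interactions: indicator of feature j times value of feature k.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    missing = np.isnan(X).astype(float)        # missing[i, j] == 1.0 if X[i, j] is NaN
    filled = np.where(np.isnan(X), 0.0, X)     # assumption: zero-fill missing entries

    columns = [filled[:, j] for j in range(d)]
    names = [f"x{j}" for j in range(d)]

    # One indicator per feature.
    for j in range(d):
        columns.append(missing[:, j])
        names.append(f"miss{j}")

    # Interaction terms: how feature k's contribution shifts when feature j is missing.
    for j in range(d):
        for k in range(d):
            if j != k:
                columns.append(missing[:, j] * filled[:, k])
                names.append(f"miss{j}*x{k}")

    return np.column_stack(columns), names
```

With $d$ features this produces $d + d + d(d-1)$ columns, which grows quadratically in $d$. That growth is the "potentially large number of additional terms" the abstract refers to, and it is why a sparsity-inducing penalty is needed to keep only a small subset of them.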

Paper Structure

This paper contains 33 sections, 6 theorems, 44 equations, 19 figures, 1 table.

Key Result

Proposition 3.1

Let $I:(\mathbb{R} \cup \text{NA})^{d} \to \mathbb{R}^{d}$ be an oracle imputation function that replaces all missing values in a vector with the correct non-missing entry. For a random variable $X \in \mathbb{R}^d$, let $f_1(X):=\mathbf{1}_{[\mathbb{E}[Y|I(X)]>0.5]}$ be the Bayes' optimal model usi

Figures (19)

  • Figure 1: A comparison of how GAMs that use underlying imputation (middle row) and M-GAMs (bottom row) behave when a feature is missing. Top: When no data are missing, the overall output logit for both models is the sum of three univariate shape functions. Middle: When $X_3$ is missing, it is imputed as $X_3 = X_1 + 2X_2$, producing a 3D shape function that is difficult to understand. Bottom: M-GAM uses simple adjustments to existing univariate shape curves when $X_3$ is missing (using the green curves instead of the light blue ones), making its reasoning process simple to follow. If the data had more than three dimensions, we would not be able to visualize the model with imputation, but M-GAM would still be easily visualized.
  • Figure 2: A generalized additive model (GAM) for the Explainable ML Challenge competition data, with missingness incorporated. This model handles missingness interpretably by explicitly providing alternative shape functions when a variable is missing. For example, in this model the shape function for variable 2 is adjusted when variable 3 is missing, and the shape function for variable 3 is removed. This model achieves comparable performance to convoluted black box approaches (such as random forests and/or MICE), but provides global interpretability (the entire model can easily be inspected) and local interpretability (the shape functions applied for a given sample can be easily visualized). An expanded version of this figure with variable names can be found in Appendix Figure \ref{fig:app-expanded_fig_2}. Shape functions in the right section are shared across all missing variable combinations. The type of missingness is indicated in parentheses next to the missing variable. Section \ref{sec:app_extra_viz} visualizes additional M-GAMs.
  • Figure 3: Sparsity of M-GAM when synthetic MAR missingness is added to up to $25\%$ (left column) and $50\%$ (right column) of entries in FICO (top row) and Breast Cancer (bottom row). We compare to several alternatives for GAMs with missing data: ensembling 10 GAMs fit on multiple imputation (for MIWAE, MICE, and MissForest), 0-value imputation ("GAM"), mean-value imputation ("GAM w/ MVI"), and selective addition of missingness indicators ("SMIM"). The number of non-zero coefficients for multiple imputation cannot be evaluated because the models depend on both the GAM coefficients and the underlying imputation mechanisms, resulting in high dimensional shape functions as in Figure \ref{fig:impu_breaks_gam}. Error bars report standard error over 10 train-test splits.
  • Figure 4: Test performance of three models at various levels of sparsity on the unaltered FICO and Breast Cancer datasets, with the same baselines as in Figure \ref{fig:synthetic-sparsity-acc}.
  • Figure 5: Runtime of different methods on Breast Cancer, FICO, MIMIC, and Pharyngitis. For each imputation method, we report the total time required to impute missing data and fit the best performing impute-then-predict classifier for that dataset and imputation method. M-GAM (Ind) is an M-GAM with indicators and M-GAM (Int) is an M-GAM with indicators and interaction terms. Error bars report standard error of total runtime over 10 train-test splits.
  • ...and 14 more figures
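
The contrast described in the Figure 1 caption can be written out as a short worked equation. This is a sketch based only on that caption: the symbols $f_j$, $\tilde{f}_j$, and $c$ are notation introduced here for illustration and are not taken from the paper.

$$
\begin{aligned}
\text{No missing data:}\quad & \text{logit} = f_1(X_1) + f_2(X_2) + f_3(X_3) \\
\text{Impute } X_3 = X_1 + 2X_2\text{:}\quad & \text{logit} = f_1(X_1) + f_2(X_2) + f_3(X_1 + 2X_2) \\
\text{M-GAM, } X_3 \text{ missing:}\quad & \text{logit} = \tilde{f}_1(X_1) + \tilde{f}_2(X_2) + c
\end{aligned}
$$

In the imputed case the third term depends on $X_1$ and $X_2$ jointly, so the model is no longer a sum of univariate shape functions. In the M-GAM case every term is still univariate: $\tilde{f}_1$ and $\tilde{f}_2$ stand for the adjusted shape curves used when $X_3$ is missing, and $c$ is an assumed constant contribution from the missingness indicator for $X_3$.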

Theorems & Definitions (12)

  • Proposition 3.1
  • Corollary 3.2
  • Definition 3.3
  • Theorem 3.4
  • Proposition A.1
  • proof
  • proof
  • Corollary B.1
  • proof
  • Theorem C.1
  • ...and 2 more