Interpretable Generalized Additive Models for Datasets with Missing Values

Hayden McTavish; Jon Donnelly; Margo Seltzer; Cynthia Rudin

TL;DR

This work tackles interpretability for datasets with missing values by introducing M-GAM, a sparse generalized additive model that directly incorporates missingness indicators and their interactions. By leveraging $\ell_0$ regularization, M-GAM maintains sparsity and interpretability while achieving competitive or superior accuracy relative to impute-then-predict methods, especially under informative MAR missingness. The model is also significantly faster than multiple imputation pipelines and remains effective on real-world data. Overall, M-GAM provides a transparent, scalable approach to predictive modeling with missing data, with code and reproducible experiments available.

Abstract

Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse generalized additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through $\ell_0$ regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naive inclusion of indicator variables.
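
To make the sparsity concern concrete, here is a minimal sketch of how a feature matrix might be augmented with missingness indicators and missingness interaction terms. It is an illustration under stated assumptions, not the authors' implementation: the function name `augment_with_missingness`, the choice to encode a missing value as 0, and the particular interaction form (one feature's value gated by another feature's missingness) are all hypothetical. The $\ell_0$-regularized fitting step is not shown.

```python
import numpy as np

def augment_with_missingness(X):
    """Return (features, names) with missingness indicators and interactions.

    X is a 2-D float array in which np.nan marks a missing value.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    missing = np.isnan(X)               # missing[i, j] is True when x_ij is absent
    filled = np.where(missing, 0.0, X)  # assumption: a missing value contributes 0

    columns, names = [], []
    # Original features, with missing entries zeroed out.
    for j in range(d):
        columns.append(filled[:, j])
        names.append(f"x{j}")
    # One missingness indicator per feature.
    for j in range(d):
        columns.append(missing[:, j].astype(float))
        names.append(f"miss{j}")
    # Interactions: the value of feature k, active only when feature j is missing.
    for j in range(d):
        for k in range(d):
            if j != k:
                columns.append(filled[:, k] * missing[:, j])
                names.append(f"x{k}*miss{j}")
    return np.column_stack(columns), names

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])
features, names = augment_with_missingness(X)
print(names)
print(features)
```

With d original features, this encoding yields d + d + d(d - 1) columns, so the number of candidate terms grows quadratically. That growth is the reason the abstract pairs the indicator and interaction terms with a sparsity-inducing penalty.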
