Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

Henry Salgado, Meagan R. Kendall, Martine Ceberio

TL;DR

The paper addresses the interpretability gap by proposing a simple, data-driven baseline for assessing model-data alignment, using a framing inspired by Rubin's Potential Outcomes Framework. It computes a feature-wise standardized mean difference (SMD) between the two outcome groups, ranks features by how strongly they separate those groups in the data, and then compares this data-derived ranking with model explanations such as SHAP values and decision-tree importances. Experiments on the Titanic and Pima Diabetes datasets show moderate to strong agreement between the data-driven rankings and the model explanations, suggesting that models often learn patterns consistent with the structure of the data. The approach is computationally efficient and gives practitioners a model-agnostic diagnostic. Its main limitation is that it currently covers only binary classification; multi-class and regression tasks require extensions.
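
For orientation, one common form of the standardized mean difference is written out below. This is a sketch of the textbook definition and may differ in detail from the paper's own equations. The symbols are assumptions introduced here: \bar{x}_{j,1} and \bar{x}_{j,0} are the means of feature j in the two outcome groups, and s_{j,1}^2 and s_{j,0}^2 are the corresponding sample variances.

```latex
\mathrm{SMD}_j \;=\; \frac{\bar{x}_{j,1} - \bar{x}_{j,0}}
                          {\sqrt{\tfrac{1}{2}\left(s_{j,1}^{2} + s_{j,0}^{2}\right)}}
```

Under this form, features would be ranked by |SMD_j|, so a larger absolute value means the feature separates the two outcome groups more strongly.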

Abstract

In this work, we propose a simple and computationally efficient framework for evaluating whether machine learning models align with the structure of the data they learn from; that is, whether the model says what the data says. Unlike existing interpretability methods that focus exclusively on explaining model behavior, our approach establishes a baseline derived directly from the data itself. Drawing inspiration from Rubin's Potential Outcomes Framework, we quantify how strongly each feature separates the two outcome groups in a binary classification task, moving beyond traditional descriptive statistics to estimate each feature's effect on the outcome. By comparing these data-derived feature rankings with model-based explanations, we provide practitioners with an interpretable and model-agnostic method for assessing model-data alignment.
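
The comparison described above can be illustrated with a minimal sketch. It assumes a pooled-standard-deviation form of the standardized mean difference and uses Spearman rank correlation as the agreement measure; both are assumptions made for illustration, not necessarily the paper's exact choices. The function names, the toy data, and the "model importances" vector are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def smd_per_feature(rows, labels):
    """Absolute standardized mean difference of each feature between
    the label-1 and label-0 groups (pooled-SD form, an assumption)."""
    scores = []
    for j in range(len(rows[0])):
        g1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        g0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        pooled_sd = sqrt((variance(g1) + variance(g0)) / 2)
        diff = mean(g1) - mean(g0)
        # A feature that is constant in both groups cannot separate them.
        scores.append(abs(diff) / pooled_sd if pooled_sd > 0 else 0.0)
    return scores


def average_ranks(scores):
    """Rank 1 = largest score; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors.
    Undefined (division by zero) if either vector is entirely tied."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b)


# Toy data (hypothetical): 6 samples, 3 features, binary outcome.
rows = [
    [1.0, 5.0, 0.2], [2.0, 5.5, 0.1], [1.5, 4.5, 0.3],  # outcome 0
    [6.0, 5.5, 0.2], [7.0, 6.0, 0.1], [6.5, 5.0, 0.3],  # outcome 1
]
labels = [0, 0, 0, 1, 1, 1]

data_scores = smd_per_feature(rows, labels)
# Stand-in for per-feature model explanations, e.g. mean |SHAP| values.
model_importances = [0.6, 0.1, 0.3]

agreement = spearman(data_scores, model_importances)
print("SMD scores:", data_scores)
print("Rank agreement (Spearman):", agreement)
```

In practice the `model_importances` vector would come from an explanation method such as SHAP or a tree's feature importances. A correlation near 1 would suggest that the model relies on the same features that separate the outcome groups in the data, while a low or negative value would flag a mismatch worth inspecting.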

Paper Structure

This paper contains 20 sections, 3 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Rank comparison scatter plots for the Titanic dataset. Points close to the diagonal indicate agreement between the feature ranking methods.
  • Figure 2: Rank comparison scatter plots for the Diabetes dataset. Both comparisons show strong diagonal clustering, indicating high alignment between SMD-derived feature rankings and model-based importance measures.