Towards Explainable Artificial Intelligence (XAI): A Data Mining Perspective

Haoyi Xiong, Xuhong Li, Xiaofei Zhang, Jiamin Chen, Xinhao Sun, Yuchen Li, Zeyi Sun, Mengnan Du

TL;DR

This paper reframes explainable AI (XAI) through a data-mining lens, organizing XAI methods by three purposes (interpretations of models, influences of training data, and domain-oriented insights) and mapping them onto a four-stage data-mining workflow (data acquisition, preparation, modeling, results reporting). It provides a comprehensive taxonomy of techniques across modalities (images, text, tabular) and data artifacts (training data, logs, prototypes, activations), detailing concrete methods such as LIME, SHAP, influence functions, TracIn, ProtoPNet, TCAV, and counterfactuals. It also discusses data valuation and anomaly detection as pivotal lenses for understanding model decisions, and highlights societal and scientific applications of XAI, including fairness, ethics, accountability, and interdisciplinary discovery. The paper identifies key limitations (data quality, scaling, evaluation frameworks) and outlines future directions toward scalable, trustworthy, and human-centered AI grounded in data-centric explainability.

Abstract

Given the complexity and lack of transparency in deep neural networks (DNNs), extensive efforts have been made to make these systems more interpretable or to explain their behaviors in accessible terms. Unlike most reviews, which focus on algorithmic and model-centric perspectives, this work takes a "data-centric" view, examining how data collection, processing, and analysis contribute to explainable AI (XAI). We group existing work into three categories according to purpose: interpretations of deep models, covering feature attributions and reasoning processes that correlate data points with model outputs; influences of training data, examining the impact of training data nuances, such as data valuation and sample anomalies, on decision-making processes; and insights of domain knowledge, discovering latent patterns and fostering new knowledge from data and models to advance social values and scientific discovery. Specifically, we distill XAI methodologies into data mining operations on training and testing data across modalities, such as images, text, and tabular data, as well as on training logs, checkpoints, models, and other DNN behavior descriptors. In this way, our study offers a comprehensive, data-centric examination of XAI through the lens of data mining methods and applications.

Paper Structure

This paper contains 54 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Overview of Explainable AI as a Data Mining Approach for Interpretations, Influences and Insights
  • Figure 2: Taxonomy of research in Explainable Artificial Intelligence (XAI) from a Data Mining Perspective: Interpretation of Deep Models, Influences of Training Samples, and Insights of Domain Knowledge.
  • Figure 3: Visualization of Commonly-used Feature Attribution Methods with Vision and NLP Models: (a)–(d) the ViT-base model and derivatives fine-tuned for bird classification (Wah et al., 2011); (e) a BERT model fine-tuned on IMDb movie reviews (Maas et al., 2011).
  • Figure 4: An Example of Proxy Explainable Models with Global and Local Surrogates for Global and Local Interpretations
  • Figure 5: Visualizing feature importance and logic of reasoning with tree/forest-based surrogates
  • ...and 3 more figures