Towards Explainable Artificial Intelligence (XAI): A Data Mining Perspective
Haoyi Xiong, Xuhong Li, Xiaofei Zhang, Jiamin Chen, Xinhao Sun, Yuchen Li, Zeyi Sun, Mengnan Du
TL;DR
This paper reframes explainable AI (XAI) through a data-mining lens, organizing XAI methods by three purposes (interpretations of models, influences of training data, and domain-oriented insights) and mapping them onto a four-stage data-mining workflow (data acquisition, preparation, modeling, results reporting). It provides a comprehensive taxonomy of techniques across modalities (images, text, tabular) and data artifacts (training data, logs, prototypes, activations), detailing concrete methods such as LIME, SHAP, influence functions, TracIn, ProtoPNet, TCAV, and counterfactuals. It also discusses data valuation and anomaly detection as pivotal lenses for understanding model decisions, and highlights societal and scientific applications of XAI, including fairness, ethics, accountability, and interdisciplinary discovery. The paper identifies key limitations (data quality, scaling, evaluation frameworks) and outlines future directions toward scalable, trustworthy, and human-centered AI grounded in data-centric explainability.
Abstract
Given the complexity and lack of transparency of deep neural networks (DNNs), extensive efforts have been made to make these systems more interpretable or to explain their behaviors in accessible terms. Unlike most reviews, which focus on algorithmic and model-centric perspectives, this work takes a "data-centric" view, examining how data collection, processing, and analysis contribute to explainable AI (XAI). We group existing work into three categories according to purpose: interpretations of deep models, referring to feature attributions and reasoning processes that correlate data points with model outputs; influences of training data, examining the impact of training data nuances, such as data valuation and sample anomalies, on decision-making processes; and insights of domain knowledge, discovering latent patterns and fostering new knowledge from data and models to advance social values and scientific discovery. Specifically, we distill XAI methodologies into data mining operations on training and testing data across modalities, such as images, text, and tabular data, as well as on training logs, checkpoints, models, and other DNN behavior descriptors. In this way, our study offers a comprehensive, data-centric examination of XAI through the lens of data mining methods and applications.
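To make the idea of treating training checkpoints as minable data more concrete, the following is a minimal, illustrative sketch of a TracIn-style influence score. It is not the paper's implementation. It assumes a toy linear model trained by plain SGD with squared loss, a single fixed learning rate, and one checkpoint saved at the start of each epoch. All function names and the three-point dataset are hypothetical. The score for a training point and a test point is the sum, over checkpoints, of the learning rate times the dot product of their loss gradients.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def grad_squared_loss(w: Vector, x: Vector, y: float) -> Vector:
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def train_with_checkpoints(data: List[Example], lr: float, epochs: int) -> List[Vector]:
    """Run plain SGD and return the weights saved at the start of each epoch."""
    w = [0.0] * len(data[0][0])
    checkpoints = []
    for _ in range(epochs):
        checkpoints.append(list(w))  # copy, so later updates do not alter it
        for x, y in data:
            g = grad_squared_loss(w, x, y)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return checkpoints


def tracin_score(checkpoints: List[Vector], lr: float,
                 train_point: Example, test_point: Example) -> float:
    """Sum over checkpoints of lr * <grad(train_point), grad(test_point)>."""
    x_tr, y_tr = train_point
    x_te, y_te = test_point
    return sum(
        lr * dot(grad_squared_loss(w, x_tr, y_tr), grad_squared_loss(w, x_te, y_te))
        for w in checkpoints
    )


# Hypothetical toy data: the third point has the same features as the first
# but a contradictory target, imitating a mislabeled sample.
train_data: List[Example] = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 0.0], -2.0),
]
test_point: Example = ([1.0, 0.0], 2.0)

LR = 0.1
checkpoints = train_with_checkpoints(train_data, lr=LR, epochs=3)
scores = [tracin_score(checkpoints, LR, p, test_point) for p in train_data]
for point, score in zip(train_data, scores):
    print(point, round(score, 4))
```

In this toy setting, the training point that matches the test point receives a positive score (a proponent), the point with orthogonal features receives a score of zero, and the contradictory point receives a negative score (an opponent). Ranking training samples by such scores is one simple way to surface candidate anomalies or mislabeled data from training artifacts alone.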
