Personalized Interpretable Classification
Zengyou He, Pengju Li, Yifan Tang, Lianyu Hu, Mudi Jiang, Yan Liu
TL;DR
This work introduces personalized interpretable classification, a new problem in which a single, easily interpretable rule is constructed for each test sample. It formalizes the objective as maximizing a precision-recall based score $A(r,D)$ while minimizing rule length, and proposes two algorithms: PIC, a greedy per-sample rule learner, and fPIC, a faster approximate variant based on preprocessing. Empirical results show that PIC and fPIC achieve accuracy competitive with state-of-the-art interpretable methods and produce concise per-sample rules, with notable gains in interpretability on diverse datasets and in a real-world breast cancer metastasis application. The study highlights the tradeoffs among accuracy, interpretability, and computation, offers practical guidance on when to use each method, and suggests directions for scaling to larger, more complex datasets.
Abstract
How to interpret a data mining model has received much attention recently, because people may distrust a black-box predictive model if they do not understand how it works. Hence, a model is more trustworthy if it can transparently illustrate how it reaches its decisions. Although many rule-based interpretable classification algorithms have been proposed, none of the existing solutions can directly construct an interpretable model that provides a personalized prediction for each individual test sample. In this paper, we take a first step toward formally introducing personalized interpretable classification to the literature as a new data mining problem. In addition to formulating this new problem, we present a greedy algorithm called PIC (Personalized Interpretable Classifier) that identifies a personalized rule for each individual test sample. To improve running efficiency, we also present a fast approximate algorithm called fPIC. To demonstrate the necessity, feasibility, and advantages of such a personalized interpretable classification method, we conduct a series of empirical studies on real data sets. The experimental results show that: (1) the new problem formulation enables us to find interesting rules for test samples that may be missed by existing non-personalized classifiers; (2) our algorithms achieve the same level of predictive accuracy as state-of-the-art (SOTA) interpretable classifiers; and (3) on a real data set for predicting breast cancer metastasis, such personalized interpretable classifiers can outperform SOTA methods in terms of both accuracy and interpretability.
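As a rough illustration of greedy per-sample rule learning, the Python sketch below grows one conjunctive rule for a single test sample. It rests on several assumptions not stated above. Features are discrete, and candidate conditions are drawn from the test sample's own attribute values, so the rule always covers that sample. A separate rule is grown for each class and the best one is kept. The F1 score stands in for $A(r,D)$, whose exact form is not given here. Shorter rules are favored by stopping as soon as no condition strictly improves the score, and by breaking ties toward fewer conditions. The names `covers`, `rule_score`, `grow_rule`, and `personalized_rule` are hypothetical; this is a sketch of the general idea, not the PIC algorithm itself.

```python
from typing import Dict, Optional, Sequence, Tuple

Sample = Dict[str, str]             # attribute name -> discretized value
Rule = Tuple[Tuple[str, str], ...]  # conjunction of (attribute, value) conditions


def covers(rule: Rule, x: Sample) -> bool:
    """True if sample x satisfies every condition in the rule (empty rule covers all)."""
    return all(x.get(a) == v for a, v in rule)


def rule_score(rule: Rule, label: str, X: Sequence[Sample], y: Sequence[str]) -> float:
    """Hypothetical stand-in for A(r, D): F1 of 'rule -> label' on training data (X, y)."""
    covered = [yi for xi, yi in zip(X, y) if covers(rule, xi)]
    tp = sum(1 for yi in covered if yi == label)
    if tp == 0:
        return 0.0
    precision = tp / len(covered)
    recall = tp / sum(1 for yi in y if yi == label)
    return 2 * precision * recall / (precision + recall)


def grow_rule(x: Sample, label: str, X: Sequence[Sample], y: Sequence[str],
              max_len: int = 5) -> Tuple[Rule, float]:
    """Greedily add conditions taken from x itself, so the rule always covers x."""
    rule: Rule = ()
    best = rule_score(rule, label, X, y)
    while len(rule) < max_len:
        used = {a for a, _ in rule}
        candidates = [rule + ((a, v),) for a, v in sorted(x.items()) if a not in used]
        if not candidates:
            break
        score, cand = max(((rule_score(r, label, X, y), r) for r in candidates),
                          key=lambda t: t[0])
        if score <= best:  # no strict gain: stop early, which keeps the rule short
            break
        rule, best = cand, score
    return rule, best


def personalized_rule(x: Sample, X: Sequence[Sample], y: Sequence[str],
                      max_len: int = 5) -> Tuple[Rule, str, float]:
    """Return (rule, predicted label, score) for one test sample x."""
    best: Optional[Tuple[Rule, str, float]] = None
    for label in sorted(set(y)):
        rule, score = grow_rule(x, label, X, y, max_len)
        # Prefer a higher score; on ties, prefer the shorter rule.
        if best is None or score > best[2] or (score == best[2] and len(rule) < len(best[0])):
            best = (rule, label, score)
    assert best is not None, "training labels must be non-empty"
    return best
```

For example, take four training samples: {size: large, shape: irregular} → malignant, {size: large, shape: round} → benign, {size: small, shape: irregular} → malignant, and {size: small, shape: round} → benign. Calling `personalized_rule` on the first sample returns the one-condition rule shape = irregular, predicting malignant with score 1.0. Adding size = large would lower recall without raising precision, so the greedy search stops there.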
