MISS: Multiclass Interpretable Scoring Systems
Michal K. Grzeszczyk, Tomasz Trzciński, Arkadiusz Sitek
TL;DR
MISS tackles the need for interpretable multiclass classification by learning a single, sparse scoring system with per-class integer coefficients via mixed-integer programming. It minimizes cross-entropy loss to obtain good AUC and calibration while penalizing the number of features used with an $\ell_{0}$ term, and it reports an optimality gap to certify solution quality. Key innovations include Recursive Feature Aggregation and algorithmic improvements that reduce dimensionality and tighten bounds, enabling competitive performance on binary and multiclass datasets with well-calibrated probabilities. The approach yields easy-to-use scoring systems suitable for domains demanding transparency, such as healthcare and criminal justice, and is complemented by open-source tooling.
Abstract
In this work, we present a novel machine-learning approach for constructing Multiclass Interpretable Scoring Systems (MISS): a fully data-driven methodology for generating single, sparse, and user-friendly scoring systems for multiclass classification problems. Scoring systems are commonly used as decision support models in healthcare, criminal justice, and other domains where interpretability of predictions and ease of use are crucial. Prior methods for data-driven scoring, such as SLIM (Supersparse Linear Integer Model), were limited to binary classification tasks, and extensions to multiclass domains were primarily accomplished via one-versus-all-type techniques. The scores produced by our method can be easily transformed into class probabilities via the softmax function. We demonstrate techniques for dimensionality reduction and heuristics that enhance training efficiency and decrease the optimality gap, a measure that can certify the optimality of the model. Our approach has been extensively evaluated on datasets from various domains, and the results indicate that it is competitive with other machine learning models in terms of classification performance metrics and provides well-calibrated class probabilities.
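
To make the scoring-to-probability step concrete, here is a minimal Python sketch of how a multiclass scoring system with per-class integer points could be applied to one sample and converted to class probabilities with softmax. The class names, feature names, and point values are invented for illustration and do not come from the paper. The helpers `class_scores` and `softmax` are likewise hypothetical and are not part of the MISS tooling.

```python
import math

# Hypothetical 3-class scoring system: integer points per class and feature.
# Every name and value here is an illustrative assumption, not a learned model.
POINTS = {
    "low":    {"bias": 2,  "age_over_60": -1, "smoker": -2},
    "medium": {"bias": 0,  "age_over_60": 1,  "smoker": 0},
    "high":   {"bias": -2, "age_over_60": 2,  "smoker": 3},
}


def class_scores(features):
    """Sum the bias and the points of every active binary feature, per class."""
    return {
        cls: pts["bias"] + sum(pts[name] for name, active in features.items() if active)
        for cls, pts in POINTS.items()
    }


def softmax(scores):
    """Convert per-class scores to probabilities (max-shifted for stability)."""
    top = max(scores.values())
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


patient = {"age_over_60": True, "smoker": True}
scores = class_scores(patient)
probs = softmax(scores)
print(scores)
print({cls: round(p, 3) for cls, p in probs.items()})
```

Because the coefficients are small integers, the per-class totals can be added up by hand from a printed score sheet, and only the final softmax step needs a calculator or a lookup table.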
