
MISS: Multiclass Interpretable Scoring Systems

Michal K. Grzeszczyk, Tomasz Trzciński, Arkadiusz Sitek

TL;DR

MISS addresses the need for interpretable multiclass classification by learning a single, sparse scoring system with per-class integer coefficients via mixed-integer programming. It minimizes a cross-entropy loss with an $\ell_{0}$ penalty on feature usage, aiming for good discrimination (AUC) and calibration, and it reports an optimality gap that certifies solution quality. Key innovations include Recursive Feature Aggregation and algorithmic improvements that reduce dimensionality and tighten bounds. With these, MISS reaches competitive performance on binary and multiclass datasets and produces well-calibrated probabilities. The resulting scoring systems are easy to use in domains that demand transparency, such as healthcare and criminal justice, and the method is complemented by open-source tooling.
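To make the objective described above concrete, the following is a minimal sketch of how a cross-entropy loss with an $\ell_{0}$-style penalty could be evaluated for a fixed set of integer coefficients. It is not the authors' implementation: the function names, the rule that a feature counts as "used" when any class assigns it a non-zero coefficient, and the relative form of the optimality gap are all assumptions made for illustration. A mixed-integer solver would search over the coefficients; this code only scores one candidate.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def penalized_cross_entropy(X, y, coef, intercepts, c0):
    """Mean cross-entropy plus c0 times the number of features in use.

    coef[k][j] is the integer coefficient of feature j for class k.
    Assumption: a feature is "in use" if any class gives it a non-zero
    coefficient (one possible reading of the l0 penalty).
    """
    n_classes = len(coef)
    n_features = len(coef[0])
    loss = 0.0
    for x, label in zip(X, y):
        scores = [
            intercepts[k] + sum(coef[k][j] * x[j] for j in range(n_features))
            for k in range(n_classes)
        ]
        loss -= log_softmax(scores)[label]
    loss /= len(X)
    used = sum(
        1
        for j in range(n_features)
        if any(coef[k][j] != 0 for k in range(n_classes))
    )
    return loss + c0 * used


def relative_gap(upper_bound, lower_bound):
    """Relative optimality gap, assuming the common (UB - LB) / UB convention."""
    return (upper_bound - lower_bound) / upper_bound
```

Under this convention, a gap of 0 means the best solution found matches the solver's lower bound, so the candidate is provably optimal for the stated objective.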

Abstract

In this work, we present a novel machine-learning approach for constructing Multiclass Interpretable Scoring Systems (MISS), a fully data-driven methodology for generating single, sparse, and user-friendly scoring systems for multiclass classification problems. Scoring systems are commonly utilized as decision support models in healthcare, criminal justice, and other domains where interpretability of predictions and ease of use are crucial. Prior methods for data-driven scoring, such as SLIM (Supersparse Linear Integer Model), were limited to binary classification tasks, and extensions to multiclass domains were primarily accomplished via one-versus-all-type techniques. The scores produced by our method can be easily transformed into class probabilities via the softmax function. We demonstrate techniques for dimensionality reduction and heuristics that enhance the training efficiency and decrease the optimality gap, a measure that can certify the optimality of the model. Our approach has been extensively evaluated on datasets from various domains, and the results indicate that it is competitive with other machine learning models in terms of classification performance metrics and provides well-calibrated class probabilities.
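The softmax step mentioned in the abstract can be sketched as follows. The scoring sheet, feature names, point values, and bias terms below are hypothetical and chosen only for illustration; they do not come from the paper. The sketch shows how per-class integer point totals might be summed and then mapped to class probabilities.

```python
import math

# Hypothetical scoring sheet: feature name -> integer points for classes 0, 1, 2.
SHEET = {
    "age_over_60": (0, 2, 3),
    "abnormal_lab": (-1, 1, 2),
}
# Hypothetical per-class bias (intercept) points.
BIAS = (1, 0, -2)


def class_scores(active_features):
    """Sum the points of every feature that is present, separately per class."""
    totals = list(BIAS)
    for name in active_features:
        for k, points in enumerate(SHEET[name]):
            totals[k] += points
    return totals


def class_probabilities(scores):
    """Softmax over per-class point totals (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = class_scores(["age_over_60"])
probs = class_probabilities(scores)
```

Because softmax depends only on differences between class totals, adding the same number of points to every class leaves the probabilities unchanged.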

Paper Structure (10 sections, 4 equations, 2 figures, 15 tables, 1 algorithm)

Figures (2)

  • Figure 1: Performance of MISS with $\Lambda^{max} \leq 10$ (left) and $R^{max} \leq 10$ (right) on the ph dataset.
  • Figure 2: Example of the qSOFA scoring system, which measures the risk of in-hospital mortality in patients with suspected sepsis. The scoring system uses three binary features: 1. altered mental status, assessed with the Glasgow Coma Scale (GCS); 2. respiratory rate (RR); 3. systolic blood pressure (SBP).
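
The binary scoring system referenced in Figure 2 can be illustrated with a short sketch. The thresholds below (GCS below 15, respiratory rate of at least 22 breaths per minute, systolic blood pressure of at most 100 mmHg) are the commonly published qSOFA criteria, not values taken from this page, and the function name is hypothetical. The code is for illustration only and is not clinical guidance.

```python
def qsofa_points(gcs, respiratory_rate, systolic_bp):
    """Return the qSOFA total (0 to 3): one point per criterion met.

    Thresholds follow the commonly published qSOFA criteria; they are an
    assumption here because the figure caption lists only the features.
    """
    altered_mentation = gcs < 15          # Glasgow Coma Scale below 15
    fast_breathing = respiratory_rate >= 22  # breaths per minute
    low_pressure = systolic_bp <= 100     # mmHg
    return int(altered_mentation) + int(fast_breathing) + int(low_pressure)
```

Each criterion contributes exactly one point, which is what makes the system easy to apply at the bedside. MISS generalizes this idea by learning the integer points, and one set of them per class, from data.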