Learning Interpretable Models Using Uncertainty Oracles

Abhishek Ghose; Balaraman Ravindran

TL;DR

The paper tackles the challenge of building small, human-interpretable models without sacrificing accuracy by learning a training distribution that favors compact models. Data is first projected to one dimension using the uncertainty scores of an uncertainty oracle; the training distribution over these scores is encoded as an Infinite Beta Mixture Model via a Dirichlet Process (DP), and the DP parameters are optimized with Bayesian Optimization to maximize held-out accuracy for various interpretable model families. The approach is model-agnostic, accommodates non-differentiable losses, and supports multi-component size definitions and oracles that operate in a different feature space from the interpretable model. It yields substantial improvements over baselines in many settings and is robust to the choice of model size and data representation. Practically, the method offers a flexible way to push the size-accuracy frontier for interpretable models across domains.
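As a rough illustration of the 1D projection step, the sketch below maps an oracle's predicted class probabilities to a single uncertainty score. The margin-based score and the function name `margin_uncertainty` are assumptions made for illustration; the text above does not state which uncertainty measure is used.

```python
def margin_uncertainty(probs):
    """Map a vector of class probabilities to an uncertainty score in [0, 1].

    Assumption: uncertainty is 1 minus the margin between the two most
    probable classes, so 0 means fully confident and 1 means a tie.
    """
    if len(probs) < 2:
        raise ValueError("need probabilities for at least two classes")
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


# Hypothetical oracle outputs for three instances of a 3-class problem.
oracle_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low uncertainty
    [0.50, 0.45, 0.05],  # near tie between two classes, high uncertainty
    [0.70, 0.20, 0.10],
]
scores = [margin_uncertainty(p) for p in oracle_probs]
```

Each instance is now represented by one number regardless of the original feature dimensionality, which is what allows a distribution over the training data to be modeled in one dimension.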

Abstract

A desirable property of interpretable models is small size, so that they are easily understandable by humans. This leads to the following challenges: (a) small sizes typically imply diminished accuracy, and (b) bespoke levers provided by model families to restrict size, e.g., L1 regularization, might be insufficient to reach the desired size-accuracy trade-off. We address these challenges here. Earlier work has shown that learning the training distribution creates accurate small models. Our contribution is a new technique that exploits this idea. The training distribution is encoded as a Dirichlet Process to allow for a flexible number of modes that is learnable from the data. Its parameters are learned using Bayesian Optimization, a design choice that makes the technique applicable to non-differentiable loss functions. To avoid the challenges of high dimensionality, the data is first projected down to one dimension using the uncertainty scores of a separate probabilistic model, which we refer to as the uncertainty oracle. We show that this technique addresses the above challenges: (a) it arrests the reduction in accuracy that comes from shrinking a model (in some cases we observe $\sim 100\%$ improvement over baselines), and (b) it may be applied with no change across model families with different notions of size; results are shown for Decision Trees, Linear Probability models and Gradient Boosted Models. Additionally, we show that it (1) is more accurate than its predecessor, (2) requires only one hyperparameter to be set in practice, (3) accommodates a multivariate notion of model size, e.g., both the maximum depth of a tree and the number of trees in Gradient Boosted Models, and (4) works across different feature spaces between the uncertainty oracle and the interpretable model, e.g., a GRU might act as an oracle for a decision tree that ingests n-grams.
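The sampling idea can be sketched as follows, under stated assumptions: a truncated stick-breaking construction stands in for the Dirichlet Process, each mixture component is a Beta density over uncertainty scores, and training instances are drawn with probability proportional to the mixture density at their score. The function names, the truncation level, and the uniform prior over Beta shape parameters are hypothetical choices, not the authors' implementation. The outer Bayesian Optimization loop, which would tune values such as `alpha` against held-out accuracy, is omitted.

```python
import math
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking draw of Dirichlet Process mixture weights."""
    weights, remaining = [], 1.0
    for _ in range(n_components - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass, so the weights sum to 1
    return weights


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def sample_training_indices(scores, alpha, n_components, n_samples, seed=0):
    """Sample instance indices in proportion to a Beta-mixture density
    evaluated at each instance's uncertainty score."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_components, rng)
    # Assumption: shape parameters drawn uniformly from [0.5, 10].
    shapes = [(rng.uniform(0.5, 10), rng.uniform(0.5, 10)) for _ in weights]
    eps = 1e-6  # keep scores strictly inside (0, 1)
    density = []
    for s in scores:
        s = min(max(s, eps), 1 - eps)
        density.append(
            sum(w * beta_pdf(s, a, b) for w, (a, b) in zip(weights, shapes))
        )
    return rng.choices(range(len(scores)), weights=density, k=n_samples)


# Hypothetical uncertainty scores for six training instances.
example_scores = [0.05, 0.20, 0.50, 0.75, 0.90, 0.99]
chosen = sample_training_indices(example_scores, alpha=1.0, n_components=5, n_samples=4)
```

The sampled indices would then select the subset on which a size-constrained interpretable model is trained. Because only the sampling parameters are searched, this style of optimization needs no gradients from the downstream model.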
