Amazing Things Come From Having Many Good Models

Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

TL;DR

Real-world tabular data often admit many approximately-equally-good models, a phenomenon known as the Rashomon Effect, which challenges the notion of a single optimal solution. The paper argues for a paradigm shift toward exploring Rashomon sets—collections of good models—to achieve simpler yet accurate, fair, and interpretable solutions, especially under noise. It introduces algorithms (TreeFARMS, GAM Rashomon set, FasterRisk) and interactive tools (TimberTrek, GAMChanger) to enumerate and navigate these sets, and it develops concepts like stable variable importance via the Rashomon Importance Distribution (RID). The work demonstrates predictive multiplicity on the FICO dataset, discusses algorithm selection, and outlines policy implications, advocating a move toward constraint-aware, interactive modeling that can improve transparency and societal impact in high-stakes decisions.

Abstract

The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets, and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. We address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) uncertainty in predictions, fairness, and explanations, (4) reliable variable importance, (5) algorithm choice, specifically, providing advance knowledge of which algorithms might be suitable for a given problem, and (6) public policy. We also discuss a theory of when the Rashomon Effect occurs and why. Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.
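
To make the idea of "many equally good models" concrete, the following is a minimal sketch, not the paper's algorithms (TreeFARMS and the others are far more sophisticated). It assumes a Rashomon set can be taken as every model whose training error is within a tolerance `epsilon` of the best error found, enumerates one-feature threshold rules (decision stumps) on made-up toy data, and then lists the points on which the near-optimal models disagree. The data, the `epsilon` value, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def stump_predict(x, feature, threshold):
    """Predict 1 when the chosen feature exceeds the threshold, else 0."""
    return 1 if x[feature] > threshold else 0


def error_rate(X, y, feature, threshold):
    """Fraction of points that the stump (feature, threshold) misclassifies."""
    wrong = sum(stump_predict(x, feature, threshold) != label
                for x, label in zip(X, y))
    return wrong / len(y)


def rashomon_set(X, y, thresholds, epsilon):
    """Return (best_error, members): all stumps within epsilon of the best error.

    Assumption: "approximately equally good" means error <= best + epsilon.
    """
    n_features = len(X[0])
    scored = [((f, t), error_rate(X, y, f, t))
              for f, t in product(range(n_features), thresholds)]
    best = min(err for _, err in scored)
    members = [(model, err) for model, err in scored if err <= best + epsilon]
    return best, members


def conflicting_points(X, members):
    """Indices of points on which the models in the set do not all agree."""
    conflicted = []
    for i, x in enumerate(X):
        predictions = {stump_predict(x, f, t) for (f, t), _ in members}
        if len(predictions) > 1:
            conflicted.append(i)
    return conflicted


# Toy data, invented for this sketch: two correlated features, binary label.
X = [(0.1, 0.2), (0.3, 0.1), (0.4, 0.6), (0.6, 0.4), (0.7, 0.9), (0.9, 0.8)]
y = [0, 0, 0, 1, 1, 1]

best, members = rashomon_set(X, y, thresholds=[0.25, 0.5, 0.75], epsilon=0.2)
conflicted = conflicting_points(X, members)

for (feature, threshold), err in members:
    print(f"feature {feature} > {threshold}: error {err:.3f}")
print("points with conflicting predictions:", conflicted)
```

On this toy data, three of the six candidate stumps fall inside the set, and they use different features. Two of the six points receive conflicting predictions from models of nearly the same accuracy, which is a small-scale picture of the predictive uncertainty the abstract refers to in item (3).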

Paper Structure (12 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Illustration showing that for hypothesis spaces with good approximating properties, larger Rashomon sets tend to contain multiple simpler models. For every model in the more complex space (blue shaded region), there exists a $\delta$-close model from the simpler space (orange dots). In this illustration, the Rashomon set contains at least four simpler models, which is its $2\delta$-packing number, where blue dots correspond to the centers of the balls in the packing.
  • Figure 2: Decision tree: train and test accuracy are both approximately 72%, which is comparable to the best black-box algorithms (deep learning and boosted decision trees). This tree has 7 leaves and was obtained in 8.1 seconds by the GOSDT algorithm [lin2020generalized, mctavish2022fast].
  • Figure 3: Sparse GAM: the user gets a score for each of the 11 features in this model, from the FastSparse algorithm [liu2022fast]. The sum of scores translates into a risk of defaulting on a loan. The model was obtained in 3.15 seconds. Its test AUC and accuracy are 0.790 and 0.723. The cross-validation AUC and accuracy of FastSparse are $0.791 \pm 0.010$ and $72.4\% \pm 1.2\%$.
  • Figure 4: Example sparse decision trees in the Rashomon set of the COMPAS dataset [LarsonMaKiAn16], found by TreeFARMS.
  • Figure 5: Interactive tools.
  • ...and 1 more figure