Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation

Hsiang Hsu, Guihong Li, Shaohan Hu, Chun-Fu Chen

TL;DR

This work addresses predictive multiplicity, which arises when many near-optimal models (the Rashomon set) disagree on individual predictions. It introduces a dropout-based framework for efficiently exploring near-optimal neural network models, and it establishes theoretical links between dropout hyperparameters and Rashomon-set loss deviations for linear models and feed-forward neural networks (FFNNs). The approach yields substantial runtime speedups over re-training and enables practical mitigation via dropout ensembles and model selection guided by multiplicity estimates. Empirical results across diverse domains show that dropout-based exploration consistently outperforms baselines in estimating multiplicity metrics while maintaining comparable accuracy, and they demonstrate its applicability to real-world tasks in finance, medicine, and computer vision. The framework offers a scalable, principled way to quantify and reduce predictive multiplicity, improving fairness and reliability when complex models are deployed.
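The exploration loop summarized above (sample dropout copies of a trained model, keep the copies whose loss stays within a tolerance, then aggregate them) can be sketched for a linear classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights `w_ref` are assumed to be already trained, the loss is the mean logistic loss with labels in {-1, +1}, dropout is inverted Bernoulli dropout applied directly to the weight vector (mirroring the notation $\mathbf{w}_D^* = \mathbf{D}_\mathbf{z}\mathbf{w}^*$ used in Proposition 1), and the tolerance `epsilon` is measured against the reference model's loss. All function names are hypothetical, and the FFNN case, where masks act on hidden units, is not shown.

```python
import math
import random

# Hypothetical sketch of dropout-based Rashomon set exploration for a
# linear classifier; not the paper's code.

def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def score(row, w):
    """Linear score x . w for one sample."""
    return sum(x * wj for x, wj in zip(row, w))

def logistic_loss(X, y, w):
    """Mean logistic loss for labels y in {-1, +1}."""
    return sum(softplus(-label * score(row, w)) for row, label in zip(X, y)) / len(y)

def dropout_weights(w, p, rng):
    """Assumed variant: inverted Bernoulli dropout on the weights. Each weight
    is zeroed with probability p; survivors are rescaled by 1 / (1 - p)."""
    return [wj / (1.0 - p) if rng.random() >= p else 0.0 for wj in w]

def explore_rashomon_set(X, y, w_ref, p, epsilon, num_samples, seed=0):
    """Sample dropout copies of w_ref and keep those whose loss exceeds the
    reference loss by at most epsilon (an empirical Rashomon set)."""
    rng = random.Random(seed)
    base = logistic_loss(X, y, w_ref)
    kept = []
    for _ in range(num_samples):
        w_d = dropout_weights(w_ref, p, rng)
        if logistic_loss(X, y, w_d) - base <= epsilon:
            kept.append(w_d)
    return kept

def predict(X, w):
    """Hard labels in {-1, +1} from the sign of the linear score."""
    return [1 if score(row, w) >= 0 else -1 for row in X]

def dropout_ensemble(X, models):
    """Majority vote over the kept models; ties are broken toward +1."""
    totals = [0] * len(X)
    for w in models:
        for i, label in enumerate(predict(X, w)):
            totals[i] += label
    return [1 if t >= 0 else -1 for t in totals]
```

Each kept weight vector is one member of the empirical Rashomon set. `dropout_ensemble` is a plain majority vote over those members, chosen here as one simple way to realize a dropout ensemble.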

Abstract

Predictive multiplicity refers to the phenomenon in which a classification task admits multiple competing models that achieve almost equally optimal performance yet produce conflicting outputs for individual samples. This raises significant concerns, as it can result in systemic exclusion, inexplicable discrimination, and unfairness in practical applications. Measuring and mitigating predictive multiplicity, however, is computationally challenging because it requires exploring all such almost-equally-optimal models, known as the Rashomon set, in potentially huge hypothesis spaces. To address this challenge, we propose a novel framework that uses dropout techniques to explore models in the Rashomon set. We provide rigorous theoretical derivations connecting the dropout parameters to properties of the Rashomon set, and we evaluate the framework empirically through extensive experiments. Numerical results show that our technique consistently outperforms baselines in the effectiveness of predictive multiplicity metric estimation, with runtime speedups ranging from $20\times$ to $5000\times$. With efficient Rashomon set exploration and metric estimation, predictive multiplicity can then be mitigated through dropout ensembles and model selection.
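Once predictions from a reference model and from the explored models are available, multiplicity metrics reduce to counting disagreements. The sketch below uses ambiguity (the fraction of samples whose reference prediction is flipped by at least one model in the set) and discrepancy (the largest fraction of samples flipped by any single model), two decision-based metrics commonly used in the predictive multiplicity literature. This summary does not list which metrics the experiments report, so this choice is an assumption, and the function names are hypothetical.

```python
from typing import Sequence

# Hypothetical helpers: estimate two decision-based multiplicity metrics
# from hard predictions of a reference model and of Rashomon-set members.

def ambiguity(reference: Sequence[int], rashomon_preds: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose reference label is flipped by at least one model."""
    n = len(reference)
    flipped = sum(
        1 for i in range(n)
        if any(preds[i] != reference[i] for preds in rashomon_preds)
    )
    return flipped / n

def discrepancy(reference: Sequence[int], rashomon_preds: Sequence[Sequence[int]]) -> float:
    """Largest fraction of samples flipped by any single model (0.0 if the set is empty)."""
    n = len(reference)
    return max(
        (sum(a != b for a, b in zip(preds, reference)) / n for preds in rashomon_preds),
        default=0.0,
    )

# Toy usage: three explored models evaluated on four samples.
reference = [0, 1, 1, 0]
rashomon_preds = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1]]
print(ambiguity(reference, rashomon_preds))    # 0.75
print(discrepancy(reference, rashomon_preds))  # 0.5
```

Ambiguity can only grow as more models are added to the set, while discrepancy is driven by the single most divergent model, so the two metrics respond differently to how thoroughly the Rashomon set is explored.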

Paper Structure (42 sections, 9 theorems, 69 equations, 27 figures, 9 tables)

Key Result

Proposition 1

Consider Bernoulli dropout with rate $p$, and denote the dropout weights by $\mathbf{w}_D^* = \mathbf{D}_\mathbf{z} \mathbf{w}^*$. The loss deviation $\epsilon = L_\textsf{SSE}(\mathbf{w}_D^*) - L_\textsf{SSE}(\mathbf{w}'^*)$ satisfies … Moreover, if the features of the data matrix $\mathbf{X}$ are linearly independent and normalized, i.e., $\mathbf{X}^\top\mathbf{X} = \mathbf{I}_d$, we have $\mathbb{E}\ldots$
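The right-hand side of Proposition 1 is not reproduced in this summary, so the sketch below does not check it. It instead verifies numerically a closely related standard identity, under three assumptions: $L_\textsf{SSE}(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$ with no normalization, dropout is inverted (each weight kept with probability $1-p$ and rescaled by $1/(1-p)$), and the deviation is measured against the un-dropped weights $\mathbf{w}$ rather than the paper's $\mathbf{w}'^*$. Writing $\boldsymbol{\delta} = \mathbf{D}_\mathbf{z}\mathbf{w} - \mathbf{w}$, the entries of $\boldsymbol{\delta}$ are independent with mean $0$ and variance $\frac{p}{1-p} w_j^2$, so the cross term $-2(\mathbf{y}-\mathbf{X}\mathbf{w})^\top\mathbf{X}\boldsymbol{\delta}$ vanishes in expectation and

$$\mathbb{E}_\mathbf{z}\big[L_\textsf{SSE}(\mathbf{D}_\mathbf{z}\mathbf{w}) - L_\textsf{SSE}(\mathbf{w})\big] = \mathbb{E}\big[\boldsymbol{\delta}^\top\mathbf{X}^\top\mathbf{X}\boldsymbol{\delta}\big] = \frac{p}{1-p}\sum_{j=1}^{d} (\mathbf{X}^\top\mathbf{X})_{jj}\, w_j^2,$$

which reduces to $\frac{p}{1-p}\|\mathbf{w}\|_2^2$ when $\mathbf{X}^\top\mathbf{X} = \mathbf{I}_d$. The helper names are hypothetical.

```python
import random

# Hypothetical Monte Carlo check of the expected SSE increase under
# inverted Bernoulli dropout on fixed linear-regression weights.

def sse(X, y, w):
    """Sum of squared errors ||y - Xw||^2, with X stored as a list of rows."""
    return sum((t - sum(x * wj for x, wj in zip(row, w))) ** 2
               for row, t in zip(X, y))

def inverted_bernoulli_dropout(w, p, rng):
    """Keep each weight with probability 1 - p and rescale it by 1 / (1 - p)."""
    return [wj / (1.0 - p) if rng.random() >= p else 0.0 for wj in w]

def expected_deviation(X, w, p):
    """Closed form p / (1 - p) * sum_j ||X[:, j]||^2 * w_j^2."""
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(len(w))]
    return p / (1.0 - p) * sum(c * wj ** 2 for c, wj in zip(col_sq, w))

def monte_carlo_deviation(X, y, w, p, num_models, seed=0):
    """Average SSE increase over num_models independent dropout masks."""
    rng = random.Random(seed)
    base = sse(X, y, w)
    total = 0.0
    for _ in range(num_models):
        total += sse(X, y, inverted_bernoulli_dropout(w, p, rng)) - base
    return total / num_models

if __name__ == "__main__":
    data_rng = random.Random(1)
    n, d, p = 40, 5, 0.2
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    w = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
    # Targets close to Xw keep the (zero-mean) cross term small.
    y = [sum(x * wj for x, wj in zip(row, w)) + data_rng.gauss(0.0, 0.1) for row in X]
    print("closed form:", expected_deviation(X, w, p))
    print("Monte Carlo:", monte_carlo_deviation(X, y, w, p, num_models=10000))
```

Replacing the Bernoulli mask with multiplicative Gaussian noise drawn from $\mathcal{N}(1, \sigma^2)$ gives the same expression with $\frac{p}{1-p}$ replaced by $\sigma^2$, so choosing $\sigma^2 = \frac{p}{1-p}$ matches the two dropout variants in expected SSE deviation.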

Figures (27)

  • Figure 1: Empirical check of Proposition 1 with 20k models and dimension $d \in \{10, 50, 100, 200\}$. As $d$ increases, the variance of the loss grows, while the mean still matches the theoretical expectation in Proposition 1.
  • Figure 2: Loss versus dropout parameters and the corresponding predictive multiplicity metrics of the baselines on the UCI datasets. Panels in a row share the same y-axis for the loss difference $\epsilon$, i.e., the Rashomon parameter in the definition of the empirical Rashomon set. Both Bernoulli and Gaussian dropout give higher multiplicity estimates than re-training under the same loss-deviation constraint; in other words, dropout is much more effective than re-training. Although AWP outperforms all the other methods, it is the most computationally expensive.
  • Figure 3: Human detection on the MS COCO dataset. The leftmost column shows the ground-truth bounding boxes, and the remaining columns show the bounding boxes found by 4 models in the dropout-based Rashomon set. Green values denote bounding-box confidences above 0.5, and red values denote the rest. The detectors exhibit predictive multiplicity in both bounding-box coverage and confidence.
  • Figure 4: Applications of the Rashomon set using dropout with the Adult Income dataset.
  • Figure E.5: Loss and accuracy deviations versus Bernoulli and Gaussian dropouts on the UCI datasets.
  • ...and 22 more figures

Theorems & Definitions (12)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Lemma A.1
  • Proof
  • Lemma A.2
  • Proof
  • Lemma A.3: Gaussian annulus theorem (Blum et al., 2020)
  • Lemma A.4
  • ...and 2 more