
Stability Evaluation via Distributional Perturbation Analysis

Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu

TL;DR

The paper addresses how to evaluate model stability under distributional perturbations that arise from data corruptions and sub-population shifts. It introduces an OT-based stability criterion defined as the minimal perturbation in the joint sample-density space needed to push risk above a threshold $r$, leveraging strong duality to obtain finite-dimensional reformulations. The framework supports different loss functions and divergence choices, yields both global and feature-wise stability measures, and is demonstrated on income, health coverage, and COVID-19 mortality tasks, revealing how robustness methods improve stability and how feature stability can uncover biases. The approach offers practical guidance for selecting robustness strategies and auditing fairness, with potential applicability to diverse model architectures and high-stakes decision-making contexts.
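A minimal formal sketch of the criterion described in the TL;DR, written as we read it from this summary. The notation is ours:

- $\hat{\mathbb{P}}$ is the empirical distribution of the observed pairs $(\hat{z},\hat{w})$, with $\hat{w}=1$.
- $\mathbb{M}_c$ is the OT discrepancy with moment constraints of Definition 1.
- $\mathfrak{R}(\beta, r)$ is the stability criterion.

The moment constraint $\mathbb{E}_\pi[W]=1$, which keeps the perturbed density normalized, is also our assumption rather than a quotation from the paper. The risk functional $\mathbb{E}_{\mathbb{Q}}[W\cdot \ell(\beta,Z)]$ is the one tracked in Figure 5.

```latex
\begin{aligned}
\mathfrak{R}(\beta, r) &:= \inf_{\mathbb{Q} \in \mathcal{P}(\mathcal{Z}\times\mathcal{W})} \; \mathbb{M}_c(\mathbb{Q}, \hat{\mathbb{P}})
  \quad \text{s.t.} \quad \mathbb{E}_{\mathbb{Q}}\big[W \cdot \ell(\beta, Z)\big] \ge r, \\
\mathbb{M}_c(\mathbb{Q}, \hat{\mathbb{P}}) &:= \inf_{\pi} \Big\{ \mathbb{E}_{\pi}\big[c\big((Z,W),(\hat{Z},\hat{W})\big)\big] \;:\;
  \pi_{(Z,W)} = \mathbb{Q},\; \pi_{(\hat{Z},\hat{W})} = \hat{\mathbb{P}},\; \mathbb{E}_{\pi}[W] = 1 \Big\}.
\end{aligned}
```

Under this reading, a larger $\mathfrak{R}(\beta, r)$ means a larger perturbation is needed before the risk reaches $r$, that is, a more stable model.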

Abstract

The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conceptually, our stability evaluation criterion is defined as the minimal perturbation required on our observed dataset to induce a prescribed deterioration in risk evaluation. In this paper, we use the optimal transport (OT) discrepancy with moment constraints on the (sample, density) space to quantify this perturbation. As a result, our stability evaluation criterion can address both data corruptions and sub-population shifts, the two most common types of distribution shifts in real-world scenarios. To realize these benefits in practice, we present a series of tractable convex formulations and computational methods tailored to different classes of loss functions. The key technical tool is the strong duality theorem established in this paper. Empirically, we validate the practical utility of our stability evaluation criterion across a range of real-world applications. These empirical studies show that the criterion can not only compare the stability of different learning models and features but also provide guidelines and strategies for further improving models.
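The abstract defines stability as the smallest perturbation of the observed data that drives the risk up to a prescribed level. Below is a minimal sketch of only the sub-population-shift half of that idea. It rests on these assumptions:

- Samples are only reweighted, never moved.
- The KL choice $\phi(t)=t\log t-t+1$ from Figure 5 is used.
- The input is an empirical sample of per-example losses.

The function name `kl_reweighting_stability` is hypothetical. This is not the paper's algorithm, which also handles data corruption through the OT cost.

```python
import numpy as np


def kl_reweighting_stability(losses, r, tol=1e-10, max_iter=200):
    """Smallest KL reweighting of an empirical sample that lifts its mean loss to r.

    Reweighting-only special case (samples are not moved): minimize
    mean(w * log(w)) subject to mean(w) == 1 and mean(w * losses) >= r.
    The minimizer is an exponential tilt, w_i proportional to exp(lam * losses_i).
    Returns (kl_value, weights).
    """
    losses = np.asarray(losses, dtype=float)
    if r <= losses.mean():
        # The observed sample already reaches the threshold: no perturbation needed.
        return 0.0, np.ones_like(losses)
    if r >= losses.max():
        raise ValueError("this sketch requires r below the largest observed loss")

    def tilt(lam):
        # Shift by the max loss before exponentiating to avoid overflow.
        e = np.exp(lam * (losses - losses.max()))
        return e / e.mean()  # rescale so that mean(w) == 1

    def tilted_risk(lam):
        return np.mean(tilt(lam) * losses)

    # The tilted risk is nondecreasing in lam (its derivative is a variance),
    # so bracket the root by doubling and then bisect.
    lo, hi = 0.0, 1.0
    while tilted_risk(hi) < r:
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if tilted_risk(mid) < r:
            lo = mid
        else:
            hi = mid
    w = tilt(hi)  # hi always satisfies tilted_risk(hi) >= r
    # Treat 0 * log(0) as 0 for weights that underflow to zero.
    kl = float(np.mean(w * np.log(np.where(w > 0, w, 1.0))))
    return kl, w
```

Consider a 0/1 loss with 10% errors and $r=0.5$, called as `kl_reweighting_stability(np.array([1.0] * 10 + [0.0] * 90), 0.5)`. The sketch upweights misclassified points to weight 5 and downweights the rest to 5/9. It returns $0.5\log 5 + 0.5\log(5/9)\approx 0.511$.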

Paper Structure

This paper contains 17 sections, 5 theorems, 50 equations, 9 figures, and 1 algorithm.

Key Result

theorem 1

Suppose that (i) the set $\mathcal{Z}\times \mathcal{W}$ is compact; (ii) $\ell(\beta,\cdot)$ is upper semi-continuous for all $\beta$; (iii) the cost function $c: (\mathcal{Z} \times \mathcal{W})^2 \rightarrow \mathbb{R}_+$ is continuous; and (iv) the risk level $r$ is less than the worst-case value where the surrogate function $\tilde{\ell}_{c}^{\alpha,h}(\beta,(\hat{z},\hat{w}))$ equals … for …
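Theorem 1's dual is stated for the general joint cost and is not reproduced in this summary. As a hedged illustration of the same kind of strong duality, consider the reweighting-only special case with $\phi_{\text{KL}}(t)=t\log t-t+1$ and empirical losses $\ell_i=\ell(\beta,\hat{z}_i)$. This case admits a one-dimensional dual; the derivation is a standard Lagrangian computation, not a result quoted from the paper. Under the moment constraint $\frac{1}{n}\sum_i w_i=1$, the averaged $\phi_{\text{KL}}$ reduces to $\frac{1}{n}\sum_i w_i\log w_i$, and

```latex
\begin{aligned}
\min_{w \ge 0}\Big\{ \tfrac{1}{n}\textstyle\sum_{i} w_i \log w_i \;:\; \tfrac{1}{n}\sum_i w_i = 1,\; \tfrac{1}{n}\sum_i w_i \ell_i \ge r \Big\}
  &= \max_{\lambda \ge 0}\Big\{ \lambda r - \log\Big(\tfrac{1}{n}\textstyle\sum_{i} e^{\lambda \ell_i}\Big) \Big\}, \\
w_i^\star &= \frac{e^{\lambda^\star \ell_i}}{\tfrac{1}{n}\sum_{j} e^{\lambda^\star \ell_j}}.
\end{aligned}
```

How the dual behaves depends on where $r$ falls:

- For $r$ strictly between $\frac{1}{n}\sum_i \ell_i$ and $\max_i \ell_i$, the maximizer $\lambda^\star$ is finite and the risk constraint is tight.
- For $r \le \frac{1}{n}\sum_i \ell_i$, the value is $0$ at $\lambda^\star = 0$.
- For $r > \max_i \ell_i$, the primal is infeasible.

The requirement that $r$ stay below the worst-case loss is analogous in spirit to condition (iv).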

Figures (9)

  • Figure 1: Data Distribution Projection
  • Figure 2: Visualizations of the original dataset and the most sensitive distribution $\mathbb{Q}^\star$ produced with the cross-entropy loss under different values of $\theta_1,\theta_2$. The original prediction error is $0.1$, and the risk threshold is $0.5$.
  • Figure 3: Visualizations of the original dataset and the most sensitive distribution $\mathbb{Q}^\star$ with the 0/1 loss under different values of $\theta_1,\theta_2$. The original prediction error rate is $1\%$, and the error-rate threshold $r$ is set to $30\%$.
  • Figure 4: Visualizations of the most sensitive distribution $\mathbb{Q}^\star$ with the 0/1 loss under different error-rate thresholds $r$. Here we set $\theta_1=1.0$ and $\theta_2=0.25$.
  • Figure 5: Convergence of $\mathbb{E}_{\mathbb{Q}^{(t)}}[W\cdot \ell(\beta,Z)]$ with respect to epoch $t$. (a) General nonlinear loss (cross-entropy) with $r=0.5$. (b) 0/1 loss with $r=30\%$. Here $\phi_{\text{KL}}$ denotes $\phi(t)=t\log t - t+1$, and $\phi_{\chi^2}$ denotes $\phi(t)=(t-1)^2$.
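Figures 3 and 4 use the 0/1 loss, and Figure 5 compares $\phi_{\text{KL}}$ with $\phi_{\chi^2}$. For intuition about how the choice of $\phi$ matters, here is a hedged sketch of the reweighting-only special case with a 0/1 loss, where the minimal reweighting has a closed form. The helper names are hypothetical. The figures also allow samples to move, so these numbers are not the ones behind them.

```python
import numpy as np


def phi_kl(t):
    """phi_KL(t) = t*log(t) - t + 1, using the convention 0*log(0) = 0."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)
    return t * np.log(safe) - t + 1.0


def phi_chi2(t):
    """phi_chi2(t) = (t - 1)**2."""
    return (np.asarray(t, dtype=float) - 1.0) ** 2


def zero_one_reweighting_cost(error_rate, r, phi):
    """Minimal averaged phi-divergence reweighting that raises a 0/1 error rate to r.

    Reweighting-only special case: misclassified points get weight r / p and
    correctly classified points get (1 - r) / (1 - p). By Jensen's inequality,
    equal weights within each group are optimal for any convex phi.
    """
    p = float(error_rate)
    if not 0.0 < p < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if r <= p:
        return 0.0  # the error rate already meets the threshold
    if r > 1.0:
        raise ValueError("an error rate cannot exceed 1")
    w_err, w_ok = r / p, (1.0 - r) / (1.0 - p)
    return float(p * phi(w_err) + (1.0 - p) * phi(w_ok))
```

Take an original error rate of $1\%$ and $r=30\%$, the setting of Figure 3. Then `zero_one_reweighting_cost(0.01, 0.30, phi_kl)` is about 0.78, while the $\phi_{\chi^2}$ version is about 8.49. The gap reflects how much more heavily $\phi_{\chi^2}$ penalizes the 30-fold upweighting of the rare misclassified points. In closed form, the two values are $r\log(r/p)+(1-r)\log\frac{1-r}{1-p}$ and $(r-p)^2/(p(1-p))$.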

Theorems & Definitions (16)

  • definition 1: OT discrepancy with moment constraints
  • remark 1
  • remark 2: Effect of $\theta_1$ and $\theta_2$
  • theorem 1: Strong duality for the primal problem
  • remark 3
  • proposition 1: Dual reformulations
  • remark 4: Structure of the most sensitive distribution
  • theorem 2: KL divergence
  • theorem 3: $\chi^2$ divergence
  • proof
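Remark 2 concerns the effect of $\theta_1$ and $\theta_2$. As a hedged illustration of how two such weights can trade off sample corruption against reweighting, the sketch below uses a hypothetical joint cost on the (sample, density) space:

$$c((z,w),(\hat{z},\hat{w})) = \theta_1\, w\, \lVert z-\hat{z}\rVert^2 + \theta_2\,(\phi(w)-\phi(\hat{w}))_+$$

The functional form, the squared Euclidean ground distance, and the name `joint_cost` are assumptions, not taken from this summary. Under this cost, a large $\theta_1$ makes moving samples expensive, pushing the analysis toward pure sub-population shift. A large $\theta_2$ makes reweighting expensive, pushing it toward pure data corruption.

```python
import numpy as np


def phi_kl(t):
    """phi_KL(t) = t*log(t) - t + 1 for a scalar t >= 0 (0*log 0 taken as 0)."""
    return t * np.log(t) - t + 1.0 if t > 0 else 1.0


def joint_cost(z, w, z_hat, w_hat, theta1, theta2, phi=phi_kl):
    """Hypothetical (sample, density) transport cost, not confirmed against the paper:

    theta1 * w * ||z - z_hat||^2  +  theta2 * max(phi(w) - phi(w_hat), 0)
    """
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    move = theta1 * w * float(np.sum((z - z_hat) ** 2))  # cost of corrupting the sample
    reweight = theta2 * max(phi(w) - phi(w_hat), 0.0)    # cost of raising the divergence
    return move + reweight
```

For example, `joint_cost([1.0, 2.0], 1.0, [0.0, 0.0], 1.0, theta1=2.0, theta2=5.0)` moves a point without reweighting it and costs $2 \cdot 1 \cdot 5 = 10$. In contrast, `joint_cost([0.0], 2.0, [0.0], 1.0, theta1=2.0, theta2=5.0)` doubles a weight without moving the point and costs $5\,(2\log 2 - 1)$.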