Distributionally Robust Safe Screening

Hiroyuki Hanada; Satoshi Akahane; Tatsuya Aoyama; Tomonari Tanaka; Yoshito Okura; Yu Inatsu; Noriaki Hashimoto; Taro Murayama; Lee Hanju; Shinya Kojima; Ichiro Takeuchi

TL;DR

This work tackles the problem of identifying redundant samples and features in supervised learning under covariate shift with unknown test distributions. It introduces Distributionally Robust Safe Screening (DRSS), a framework that extends safe screening to weighted empirical risk minimization where the weights may change within a predefined set, and provides tight guarantees via duality-gap bounds. The approach yields DRSS rules for both samples (DRSsS) and features (DRSfS), and develops concrete algorithms for typical ML setups (L1-loss L2-regularized and L2-loss L1-regularized SVMs), including a method to maximize convex quadratic forms over a hyperball. It also extends to deep learning by applying screening to the last layer, and validates the method through numerical experiments on synthetic and LIBSVM datasets as well as a DL example, demonstrating robust screening under weight perturbations. Overall, DRSS offers a practical tool for reducing computation and storage while maintaining performance under distributional uncertainty, with applicability ranging from classical convex models to deep learning.
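
To make the sample-screening idea concrete, here is a minimal sketch for the hinge-loss (L1-loss) L2-regularized SVM. It relies on two standard facts rather than on the paper's exact rule: a training sample whose margin $y_i \bm x_i^\top \bm\beta^*$ is strictly greater than 1 has zero loss and zero dual variable at the optimum, so removing it leaves $\bm\beta^*$ unchanged; and the minimum of a linear function $\bm a^\top\bm\beta$ over the ball $\|\bm\beta - \hat{\bm\beta}\|_2 \le r$ equals $\bm a^\top\hat{\bm\beta} - r\|\bm a\|_2$. The helper `screen_samples` and its radius `r_max`, assumed to bound the ball radius for every admissible weight vector, are hypothetical; how the paper obtains such a bound is not reproduced here.

```python
import numpy as np


def screen_samples(X, y, beta_hat, r_max):
    """Return a boolean mask of samples that can be safely removed.

    Hypothetical helper. Assumes that, for every admissible weight vector,
    the optimal beta* lies in the ball ||beta - beta_hat||_2 <= r_max.
    Sample i is screened when even its worst-case margin over that ball,
        min_{||beta - beta_hat|| <= r_max} y_i x_i^T beta,
    stays strictly above 1 (the flat part of the hinge loss).
    """
    a = y[:, None] * X                       # rows a_i = y_i * x_i
    # Minimum of a linear function over a ball: a_i^T beta_hat - r * ||a_i||_2.
    worst_margin = a @ beta_hat - r_max * np.linalg.norm(a, axis=1)
    return worst_margin > 1.0


X = np.array([[3.0, 0.0], [0.5, 0.2], [-2.5, 0.1]])
y = np.array([1.0, 1.0, -1.0])
beta_hat = np.array([1.0, 0.0])
print(screen_samples(X, y, beta_hat, r_max=0.5).tolist())  # -> [True, False, True]
```

A larger `r_max` (a wider range of admissible weights) can only shrink the set of screened samples, which is the price of robustness.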

Abstract

In this study, we propose a method, Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a distributionally robust (DR) covariate shift setting. This method combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples and features prior to model training. The core concept of the DRSS method involves reformulating the DR covariate shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, the DRSS method is capable of reliably identifying unnecessary samples and features under any future distribution within a specified range. We provide a theoretical guarantee of the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
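
The weighted-ERM reformulation can be illustrated with the usual importance-weighting view of covariate shift: if the test density of $\bm x$ differs from the training density, weighting each training loss by $w_i = p_{\rm test}(\bm x_i)/p_{\rm train}(\bm x_i)$ gives a weighted empirical risk. The sketch below uses a hypothetical helper `weighted_objective` with the hinge loss and $\frac{\lambda}{2}\|\bm\beta\|_2^2$, chosen only for illustration, to evaluate $P_{\bm w}(\bm\beta) = \sum_i w_i \ell(y_i\bm x_i^\top\bm\beta) + \rho(\bm\beta)$. It also checks that, for a fixed $\bm\beta$, the objective is affine in $\bm w$, a property that is convenient when reasoning about a whole set of weights rather than a single one.

```python
import numpy as np


def weighted_objective(beta, X, y, w, lam):
    """Weighted regularized empirical risk (hypothetical helper).

    P_w(beta) = sum_i w_i * max(0, 1 - y_i x_i^T beta) + (lam / 2) * ||beta||_2^2.
    The hinge loss and the L2 regularizer are illustrative choices.
    """
    hinge = np.maximum(0.0, 1.0 - y * (X @ beta))
    return float(w @ hinge + 0.5 * lam * beta @ beta)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.where(rng.normal(size=5) >= 0, 1.0, -1.0)
beta = rng.normal(size=3)
w1 = rng.uniform(0.5, 1.5, size=5)   # two illustrative weight vectors,
w2 = rng.uniform(0.5, 1.5, size=5)   # e.g. from two candidate test densities

# For a fixed beta, P_w(beta) is affine in w: a convex combination of weight
# vectors yields the same convex combination of objective values.
t = 0.3
mixed = weighted_objective(beta, X, y, t * w1 + (1 - t) * w2, lam=0.1)
combo = (t * weighted_objective(beta, X, y, w1, lam=0.1)
         + (1 - t) * weighted_objective(beta, X, y, w2, lam=0.1))
print(mixed, combo)
```

With all weights equal to 1 the function reduces to the ordinary (unweighted) regularized hinge-loss objective.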

Paper Structure

This paper contains 34 sections, 14 theorems, 57 equations, 6 figures, and 2 tables.

Introduction
Related Works
Preliminaries
Weighted Regularized Empirical Risk Minimization (Weighted RERM) for Linear Prediction
Sparsity-inducing Loss Functions and Regularization Functions
Distributionally Robust Safe Screening
(Non-DR) Safe Sample Screening
(Non-DR) Safe Feature Screening
Application to Distributionally Robust Setup
DRSS for Typical ML Setups
DRSsS for L1-loss L2-regularized SVM
DRSfS for L2-loss L1-regularized SVM
Maximizing Linear and Convex Quadratic Functions in Hyperball Constraint
Application to Deep Learning
Numerical Experiment
...and 19 more sections

Key Result

Lemma 3.1

Suppose that $\rho$ in $P_{\bm w}$ (and hence $P_{\bm w}$ itself) of the primal problem is $\kappa$-strongly convex. Then, for any $\hat{\bm\beta}\in\mathbb{R}^d$ and $\hat{\bm\alpha}\in\mathbb{R}^n$, we can assure $\bm\beta^{*(\bm w)}\in{\cal B}^{*(\bm w)}$ by taking
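
For context, lemmas of this form rest on the standard duality-gap argument sketched below. Here $D_{\bm w}$ denotes the dual objective of the weighted problem, and weak duality $D_{\bm w}(\hat{\bm\alpha}) \le P_{\bm w}(\bm\beta^{*(\bm w)})$ is assumed; both are notation and assumptions of this sketch rather than statements copied from the lemma.

```latex
\begin{aligned}
P_{\bm w}(\hat{\bm\beta}) - P_{\bm w}(\bm\beta^{*(\bm w)})
  &\ge \tfrac{\kappa}{2}\,\|\hat{\bm\beta} - \bm\beta^{*(\bm w)}\|_2^2
  && \text{($\kappa$-strong convexity, $\bm 0 \in \partial P_{\bm w}(\bm\beta^{*(\bm w)})$)}\\
P_{\bm w}(\bm\beta^{*(\bm w)}) &\ge D_{\bm w}(\hat{\bm\alpha})
  && \text{(weak duality)}\\
\Longrightarrow\quad
\|\bm\beta^{*(\bm w)} - \hat{\bm\beta}\|_2
  &\le \sqrt{\tfrac{2}{\kappa}\bigl(P_{\bm w}(\hat{\bm\beta}) - D_{\bm w}(\hat{\bm\alpha})\bigr)}
\end{aligned}
```

Under these assumptions, a ball centered at $\hat{\bm\beta}$ with this radius contains $\bm\beta^{*(\bm w)}$, and the radius shrinks to zero as the duality gap closes.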

Figures (6)

Figure 1: Schematic illustration of the proposed Distributionally Robust Safe Screening (DRSS) method. Panel A displays the training samples, each assigned equal weight, as indicated by the uniform size of the points. Panel B depicts various unknown test distributions, highlighting how the significance of training samples varies with different realizations of the test distribution. Panel C shows the outcomes of safe sample screening (SsS) across multiple realizations of test distributions. Finally, Panel D presents the results of the proposed DRSS method, demonstrating its capability to identify redundant samples regardless of the observed test distribution.
Figure 2: An example of the expression ${\cal T}(\nu)$ (black solid line) in the lemmas on maximizing a convex quadratic function in a hyperball constraint. Colored dashed lines denote the terms $(\xi_{e_k}/(\nu - \phi_{e_k}))^2$ of the summation. Within each interval $(\phi_{e_k}, \phi_{e_{k+1}})$ ($k\in[N-1]$), the function is convex.
Figure 3: Concept of how to apply SS for deep learning. SS is applied to the last layer for the final prediction.
Figure 4: Ratio of screened samples by DRSsS for dataset "sonar".
Figure 7: Ratios of screened samples by DRSsS.
...and 1 more figure
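
The function ${\cal T}(\nu)$ in Figure 2 has the form of the secular function that arises when a convex quadratic is maximized over a hyperball. The sketch below, a hypothetical NumPy helper `max_convex_quadratic_in_ball`, uses the standard trust-region-style argument for the generic case: it finds, by bisection, the unique root of ${\cal T}(\nu) = r^2$ to the right of the largest eigenvalue. Taking $\phi_k$ as the eigenvalues of the quadratic form's matrix and $\xi_k$ as the components of its linear term in that eigenbasis is an assumed correspondence with the figure's symbols. The ball is centered at the origin for simplicity (shift by the center otherwise), and the paper's own procedure, which according to the figure also examines the intervals between consecutive $\phi_{e_k}$, may differ.

```python
import numpy as np


def max_convex_quadratic_in_ball(A, b, r, iters=200):
    """Maximize f(x) = x^T A x + 2 b^T x subject to ||x||_2 <= r.

    Assumptions of this sketch: A is symmetric positive semidefinite, so f is
    convex and its maximum over the ball lies on the sphere ||x||_2 = r; and b
    has a nonzero component in the top eigenspace of A (the generic case; the
    so-called hard case is not handled). Writing A = Q diag(phi) Q^T and
    xi = Q^T b, the maximizer is x = Q (xi / (nu - phi)), where nu > max(phi)
    is the root of T(nu) = sum_k (xi_k / (nu - phi_k))^2 = r^2.
    """
    phi, Q = np.linalg.eigh(A)              # eigenvalues in ascending order
    xi = Q.T @ b
    top = np.isclose(phi, phi[-1])
    if np.linalg.norm(xi[top]) < 1e-12:
        raise ValueError("hard case: b has no component in the top eigenspace")

    def T(nu):
        return float(np.sum((xi / (nu - phi)) ** 2))

    # On (phi_max, inf), T is strictly decreasing, T -> inf as nu -> phi_max,
    # and T(phi_max + ||b|| / r) <= r^2, so the root is bracketed.
    lo, hi = phi[-1], phi[-1] + np.linalg.norm(b) / r
    for _ in range(iters):                  # plain bisection on T(nu) = r^2
        mid = 0.5 * (lo + hi)
        if T(mid) > r ** 2:
            lo = mid
        else:
            hi = mid
    x = Q @ (xi / (hi - phi))               # T(hi) <= r^2, so x is feasible
    return x, float(x @ A @ x + 2.0 * b @ x)


A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x, val = max_convex_quadratic_in_ball(A, b, r=1.0)
print(x, val)
```

Restricting the search to $\nu > \max_k \phi_k$ is what makes the root unique there: every term of ${\cal T}$ decreases on that half-line, so bisection needs no further safeguards.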

Theorems & Definitions (30)

Definition 2.1
Remark 2.2
Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.4
Definition 3.5: weight-changing safe screening (WCSS)
Definition 3.6: Distributionally robust safe screening (DRSS)
Theorem 3.7
Lemma 4.1
...and 20 more
