Safe Distributionally Robust Feature Selection under Covariate Shift

Hiroyuki Hanada; Satoshi Akahane; Noriaki Hashimoto; Shion Takeno; Ichiro Takeuchi

Abstract

In practical machine learning, the environments encountered during the model development and deployment phases often differ, especially when a model is used by many users in diverse settings. Learning models that maintain reliable performance across plausible deployment environments is known as distributionally robust (DR) learning. In this work, we study the problem of distributionally robust feature selection (DRFS), with a particular focus on sparse sensing applications motivated by industrial needs. In practical multi-sensor systems, a shared subset of sensors is typically selected prior to deployment based on performance evaluations using many available sensors. At deployment, individual users may further adapt or fine-tune models to their specific environments. When deployment environments differ from those anticipated during development, this strategy can result in systems lacking sensors required for optimal performance. To address this issue, we propose safe-DRFS, a novel approach that extends safe screening from conventional sparse modeling settings to a DR setting under covariate shift. Our method identifies a feature subset that encompasses all subsets that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees of no false feature elimination.
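
As background for the "safe screening from conventional sparse modeling settings" that the abstract refers to, the following is a minimal sketch of one well-known conventional rule, the gap safe sphere test for the Lasso. It is not the paper's safe-DRFS method: it assumes squared loss, uniform sample weights, and no distribution shift, and the function names (`gap_safe_screen`, `ista`) are hypothetical names chosen for illustration. The idea is that, given any approximate solution, the duality gap bounds how far the optimal dual point can be, so any feature whose correlation with every point in that ball stays below 1 is guaranteed to have a zero coefficient at the optimum.

```python
import numpy as np


def lasso_primal(X, y, beta, lam):
    # Primal objective: 0.5 * ||y - X beta||^2 + lam * ||beta||_1
    r = y - X @ beta
    return 0.5 * (r @ r) + lam * np.abs(beta).sum()


def lasso_dual(y, theta, lam):
    # Dual objective, valid for theta with ||X^T theta||_inf <= 1
    return 0.5 * (y @ y) - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)


def gap_safe_screen(X, y, beta, lam):
    """Return a boolean mask; True marks features that can be safely removed."""
    r = y - X @ beta
    # Rescale the residual so that the dual point is feasible.
    theta = r / max(lam, np.max(np.abs(X.T @ r)))
    # Clamp tiny negative values caused by floating-point rounding.
    gap = max(lasso_primal(X, y, beta, lam) - lasso_dual(y, theta, lam), 0.0)
    # The optimal dual point lies within this distance of theta.
    radius = np.sqrt(2.0 * gap) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0


def ista(X, y, lam, n_iter=500):
    # Plain proximal gradient descent (ISTA) for the Lasso, used only to
    # produce an approximate primal solution for the screening rule.
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))
    y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(50)
    lam = 0.5 * np.max(np.abs(X.T @ y))
    beta_rough = ista(X, y, lam, n_iter=100)
    removable = gap_safe_screen(X, y, beta_rough, lam)
    print("features safely removed:", int(removable.sum()), "of", X.shape[1])
```

The rule is conservative by design: a looser approximate solution gives a larger duality gap, hence a larger radius and fewer removed features, but never a wrongly removed one. The abstract's DR setting asks for the analogous guarantee to hold simultaneously over a range of input distribution shifts, which this sketch does not attempt.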

Paper Structure (25 sections, 10 theorems, 40 equations, 3 figures, 1 table)

Introduction
Distributionally Robust (DR) learning
Safe screening
Problem Setup
Development phase
Deployment phase
DR feature subset
Proposed Method: Safe Distributionally Robust Feature Selection (Safe-DRFS)
Sparseness Conditions
Safe Screening
Distributionally Robust Sparseness Conditions
Numerical Experiments
Conclusions and Future Works
Proofs for the Proposed Method
Lemmas to be Used
...and 10 more sections

Key Result

Theorem 3.1

We assume that the loss function $\ell(\cdot,\cdot)$ in (eq:weighted_erm) is closed and convex in its second argument, and is twice continuously differentiable with respect to that argument, with a Lipschitz continuous derivative with constant $\nu$. Additionally, we assume that the weights satisfy

Figures (3)

Figure 1: This figure illustrates the sparse sensing problem studied in this work, which consists of two phases: a development phase (top) and a deployment phase (bottom). In the development phase, a system developer selects a subset of sensors from a large pool of candidates and designs a system equipped with these sensors, after which the sensor set is fixed. In the deployment phase, the system is used by many end users operating in diverse environments with uncertainty, modeled as covariate shift, where the input distribution varies across users within a specified range. Each user adapts or fine-tunes a sparse model, such as a regression or classification model, using only the sensors available in the system. Under covariate shift, the optimal support of sparse models may differ across users, leading to different sensor requirements. The key challenge for the developer is therefore to select, during development, all sensors that may be required under plausible deployment environments while safely eliminating sensors that will never be used.
Figure 2: Ratio of removed features to the original features for four regression datasets. Horizontal axis: uncertainty level of distributional change $V$. Vertical axis: ratio of removed features.
Figure 3: Ratio of removed features to the original features for eight binary classification datasets. Horizontal axis: uncertainty level of distributional change $V$. Vertical axis: ratio of removed features.

Theorems & Definitions (13)

Theorem 3.1
Theorem 3.2
Lemma 1.1
Lemma 1.2: "Rearrangement inequality"
Lemma 1.3: Corollary 32.3.4 of Rockafellar (1970), Convex Analysis
Lemma 1.4
proof
Lemma 1.5
proof
Lemma 1.6
...and 3 more
