A Short Survey on Importance Weighting for Machine Learning

Masanari Kimura; Hideitsu Hino

TL;DR

The paper surveys the use of importance weighting to address distribution shift and other ML challenges by leveraging density ratios $p_{te}(\mathbf{x})/p_{tr}(\mathbf{x})$. It covers foundational tools such as density ratio estimation, and applies them across covariate shift, target shift, sample selection bias, subpopulation and feedback shifts, as well as domain adaptation, DRO, active learning, calibration, PU learning, label noise, fairness, and deep learning. The main contribution is organizing a wide range of methods under the importance weighting framework, highlighting both theoretical guarantees (e.g., unbiasedness under covariate shift) and practical robustness considerations (AIWERM, RIWERM, DIW). This unified view clarifies when and how density ratios drive effective weighting, guiding methodological choices and future research for robust, fair, and scalable ML systems.

Abstract

Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution according to some notion of the importance of each instance. The simplicity and usefulness of the idea have led to many applications. For example, it is known that supervised learning under distribution shift, that is, under an assumed difference between the training and test distributions, can be given statistically desirable guarantees by weighting training examples with the ratio of the test density to the training density. This survey summarizes the broad applications of importance weighting in machine learning and related research.
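The density-ratio weighting described in the abstract can be checked numerically. The following is a minimal sketch, not code from the survey: it assumes one-dimensional Gaussian training and test input densities that are known exactly, so the ratio $p_{te}(x)/p_{tr}(x)$ can be evaluated in closed form, and it assumes a fixed hypothesis `h` and a shared conditional $p(y \mid x)$ (covariate shift). The names `gaussian_pdf`, `importance_weighted_risk`, `sample_labels`, and `h` are illustrative.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weighted_risk(losses, weights):
    """Average of per-example losses, each scaled by its importance weight."""
    return float(np.mean(weights * losses))


rng = np.random.default_rng(0)
n = 200_000

# Assumed input densities: the test density is narrower than the training
# density, so the density ratio stays bounded.
TR_MEAN, TR_STD = 0.0, 1.0
TE_MEAN, TE_STD = 0.5, 0.5


def sample_labels(x):
    """Shared conditional p(y | x): covariate shift leaves this unchanged."""
    return np.sin(x) + rng.normal(0.0, 0.1, size=x.shape)


def h(x):
    """A fixed, deliberately imperfect hypothesis (illustrative only)."""
    return 0.8 * x


x_tr = rng.normal(TR_MEAN, TR_STD, size=n)
x_te = rng.normal(TE_MEAN, TE_STD, size=n)
loss_tr = (h(x_tr) - sample_labels(x_tr)) ** 2
loss_te = (h(x_te) - sample_labels(x_te)) ** 2

# Importance weights p_te(x) / p_tr(x), evaluated on the training inputs.
weights = gaussian_pdf(x_tr, TE_MEAN, TE_STD) / gaussian_pdf(x_tr, TR_MEAN, TR_STD)

unweighted_risk = float(np.mean(loss_tr))                   # estimates the training risk
weighted_risk = importance_weighted_risk(loss_tr, weights)  # estimates the test risk
test_risk = float(np.mean(loss_te))                         # Monte Carlo reference

print(unweighted_risk, weighted_risk, test_risk)
```

In this sketch the weighted training average tracks the test risk while the unweighted average does not. In practice the two densities are unknown, and the ratio has to be estimated from samples, which is the role of density ratio estimation in the survey.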

Paper Structure (23 sections, 2 theorems, 55 equations, 5 figures, 1 table)

This paper contains 23 sections, 2 theorems, 55 equations, 5 figures, 1 table.

Introduction
Preliminary
Density Ratio Estimation
Distribution Shift Adaptation
Covariate Shift
Target Shift
Sample Selection Bias
Subpopulation Shift
Feedback Shift
Domain Adaptation
Multi-Source Domain Adaptation
Partial Domain Adaptation
Open-Set Domain Adaptation
Universal Domain Adaptation
Distributionally Robust Optimization
...and 8 more sections

Key Result

Theorem 1

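If we assume that $P(s = 1 \mid x, y) = P(s = 1 \mid x)$, then under sample selection bias the expected loss on the unbiased distribution equals the expected loss on the selected examples ($s = 1$) reweighted by $P(s = 1) / P(s = 1 \mid x)$.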
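The standard correction associated with this result (zadrozny2004learning) weights each selected example by $P(s = 1) / P(s = 1 \mid x)$. The sketch below illustrates that reweighting on synthetic data. It is an assumption-laden toy, not the survey's code: the selection probability `p_select_given_x` is taken as known and depends on $x$ only, and the hypothesis `h`, the data-generating process, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Full (unbiased) population: x ~ Uniform(0, 1), y = 2x + noise.
x = rng.uniform(0.0, 1.0, size=n)
y = 2.0 * x + rng.normal(0.0, 0.1, size=n)


def h(x):
    """A fixed, deliberately misspecified hypothesis (illustrative only)."""
    return x


loss = (h(x) - y) ** 2

# Assumed selection mechanism: P(s = 1 | x, y) = P(s = 1 | x), increasing in x.
p_select_given_x = 0.1 + 0.8 * x
selected = rng.random(n) < p_select_given_x

# Empirical estimate of the marginal selection probability P(s = 1).
p_select = float(np.mean(selected))

# Weights P(s = 1) / P(s = 1 | x), applied to the selected examples only.
weights = p_select / p_select_given_x[selected]

population_risk = float(np.mean(loss))                      # needs the full population
biased_risk = float(np.mean(loss[selected]))                # naive estimate on selected data
corrected_risk = float(np.mean(weights * loss[selected]))   # reweighted estimate

print(population_risk, biased_risk, corrected_risk)
```

Because selection here favors large $x$, where this hypothesis has larger loss, the naive estimate overstates the population risk, while the reweighted estimate recovers it up to sampling noise. When $P(s = 1 \mid x)$ is not known it has to be estimated, for example with a probabilistic classifier trained to predict $s$ from $x$.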

Figures (5)

Figure 1: Applications of importance weighting.
Figure 2: Illustration of two distribution shifts. Covariate shift assumes a shift in the distribution of the input vector $\bm{x}$ and target shift assumes a shift in the distribution of the output $y$.
Figure 3: Illustration of different domain adaptation settings.
Figure 4: Left panel: reliability diagrams of two models. The left model is better calibrated than the right one. Right panel: illustrative plot of the focal loss. Increasing the focal loss parameter $\gamma$ downweights well-classified examples.
Figure 5: Left panel: decision boundaries over epochs of training. Right panel: the effect of early stopping with importance weighting. These panels are taken from Figures 1 and 5 of byrd2019effect and are reprinted with permission from the authors.
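The downweighting effect described in the Figure 4 caption can be reproduced with a few lines. This is a minimal sketch assuming the commonly used form of the focal loss, $-(1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class; the function name `focal_loss` and the example probabilities are illustrative.

```python
import numpy as np


def focal_loss(p_true, gamma):
    """Focal loss -(1 - p_t)^gamma * log(p_t) for true-class probabilities p_t."""
    p_true = np.clip(p_true, 1e-12, 1.0)  # guard against log(0)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)


# True-class probabilities, from barely correct to confidently correct.
p = np.array([0.6, 0.9, 0.99])

cross_entropy = focal_loss(p, gamma=0.0)  # gamma = 0 reduces to cross-entropy
focal = focal_loss(p, gamma=2.0)

# The ratio equals (1 - p_t)^gamma, i.e. the per-example weight that the
# focal loss applies on top of the cross-entropy.
ratio = focal / cross_entropy
print(ratio)
```

The ratio shrinks as the true-class probability grows, so confidently classified examples contribute less to the objective. This is the sense in which the focal loss acts as an example-dependent weighting of the cross-entropy.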

Theorems & Definitions (14)

Definition 1: Expected error
Definition 2: Empirical error
Definition 3: Covariate Shift shimodaira2000improving
Definition 4: Target Shift zhang2013domain
Definition 5: Sample Selection Bias
Theorem 1: zadrozny2004learning
Definition 6: Subpopulation Shift
Definition 7: Feedback Shift
Definition 8: Domain Adaptation
Definition 9: Open-Set Domain Adaptation
...and 4 more
