A Short Survey on Importance Weighting for Machine Learning

Masanari Kimura, Hideitsu Hino

TL;DR

The paper surveys the use of importance weighting to address distribution shift and other ML challenges by leveraging density ratios $p_{te}(\mathbf{x})/p_{tr}(\mathbf{x})$. It covers foundational tools such as density ratio estimation, and applies them across covariate shift, target shift, sample selection bias, subpopulation and feedback shifts, as well as domain adaptation, DRO, active learning, calibration, PU learning, label noise, fairness, and deep learning. The main contribution is organizing a wide range of methods under the importance weighting framework, highlighting both theoretical guarantees (e.g., unbiasedness under covariate shift) and practical robustness considerations (AIWERM, RIWERM, DIW). This unified view clarifies when and how density ratios drive effective weighting, guiding methodological choices and future research for robust, fair, and scalable ML systems.

Abstract

Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution according to the importance of each instance. The simplicity and usefulness of the idea have led to many applications. For example, when the training and test distributions differ under a stated assumption, a setting called distribution shift, supervised learning can retain statistically desirable properties if the training instances are weighted by the density ratio of the two distributions. This survey summarizes the broad applications of importance weighting in machine learning and related research.
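
The density-ratio weighting described above can be illustrated with a small numerical sketch. It assumes a one-dimensional covariate shift in which both densities are known Gaussians (training $N(0, 1)$, test $N(0.5, 1)$) and a fixed toy loss $\ell(x) = x^2$; the helper names `gaussian_pdf` and `importance_weights` are hypothetical. In practice the ratio $p_{te}(\mathbf{x})/p_{tr}(\mathbf{x})$ is unknown and has to be estimated, which is what density ratio estimation is for.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, train_params, test_params):
    """Density ratio p_te(x) / p_tr(x) for known Gaussian densities (an assumption)."""
    return gaussian_pdf(x, *test_params) / gaussian_pdf(x, *train_params)


rng = np.random.default_rng(0)
train_params = (0.0, 1.0)  # assumed training distribution N(0, 1)
test_params = (0.5, 1.0)   # assumed test distribution N(0.5, 1)

# Samples are drawn from the training distribution only.
x_train = rng.normal(*train_params, size=200_000)
losses = x_train ** 2      # toy per-sample loss of a fixed model

weights = importance_weights(x_train, train_params, test_params)

unweighted = float(np.mean(losses))         # estimates the training risk
weighted = float(np.mean(weights * losses)) # estimates the test risk

print(f"unweighted risk estimate: {unweighted:.3f}")
print(f"importance-weighted risk estimate: {weighted:.3f}")
```

Under these assumptions the true training risk is $1$ and the true test risk is $1 + 0.5^2 = 1.25$, so the weighted average should land near the latter while the plain average stays near the former. The weights are well behaved here because the two Gaussians overlap heavily; with less overlap the variance of the weighted estimate grows quickly.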

Paper Structure

This paper contains 23 sections, 2 theorems, 55 equations, 5 figures, 1 table.

Key Result

Theorem 1

If we assume that $P(s = 1 \mid x, y) = P(s = 1 \mid x)$, then we have under the sample selection bias.
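
As a hedged sketch (not quoted from the paper), the usual importance-weighting consequence of this assumption can be derived as follows. The symbols $p(x, y)$ for the unbiased joint density, $\ell$ for a loss, $h$ for a hypothesis, and the weight $w(x) = P(s = 1) / P(s = 1 \mid x)$ are notation assumed here for illustration, and integrals over $y$ become sums when $y$ is discrete.

$$
\begin{aligned}
\mathbb{E}\left[ w(x)\, \ell(h(x), y) \mid s = 1 \right]
&= \int w(x)\, \ell(h(x), y)\, p(x, y \mid s = 1)\, dx\, dy \\
&= \int \frac{P(s = 1)}{P(s = 1 \mid x)}\, \ell(h(x), y)\, \frac{P(s = 1 \mid x, y)\, p(x, y)}{P(s = 1)}\, dx\, dy \\
&= \int \ell(h(x), y)\, p(x, y)\, dx\, dy \\
&= \mathbb{E}\left[ \ell(h(x), y) \right].
\end{aligned}
$$

The second line applies Bayes' rule to $p(x, y \mid s = 1)$, and the third uses $P(s = 1 \mid x, y) = P(s = 1 \mid x)$ to cancel the selection probabilities. The argument requires $P(s = 1 \mid x) > 0$ wherever $p(x) > 0$.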

Figures (5)

  • Figure 1: Applications of importance weighting.
  • Figure 2: Illustration of two distribution shifts. Covariate shift assumes a shift in the distribution of the input vector $\bm{x}$ and target shift assumes a shift in the distribution of the output $y$.
  • Figure 3: Illustration of different domain adaptation settings.
  • Figure 4: Left panel: reliability diagrams of two models; the left model is better calibrated than the right one. Right panel: illustrative plot of the focal loss. Increasing the focal loss parameter $\gamma$ downweights the well-classified examples.
  • Figure 5: Left panel: decision boundaries over epochs of training. Right panel: the effect of early stopping with importance weighting. These figures are from Figures 1 and 5 of byrd2019effect, and are reprinted with permission from the authors.
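
The downweighting effect mentioned in the Figure 4 caption can be checked numerically. This is a minimal sketch, assuming the common form of the focal loss, $-(1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class; the function name `focal_loss` and the sample probabilities are hypothetical.

```python
import numpy as np


def focal_loss(p_true, gamma):
    """Focal loss -(1 - p_t)^gamma * log(p_t) for true-class probability p_t.

    With gamma = 0 this reduces to the ordinary cross-entropy loss.
    """
    p_true = np.asarray(p_true, dtype=float)
    return -((1.0 - p_true) ** gamma) * np.log(p_true)


# Hard, moderately easy, and very easy examples (illustrative values).
probabilities = np.array([0.3, 0.9, 0.99])

for gamma in (0.0, 1.0, 2.0, 5.0):
    losses = focal_loss(probabilities, gamma)
    # Ratio to cross-entropy shows how strongly each example is downweighted.
    ratio = losses / focal_loss(probabilities, 0.0)
    print(f"gamma={gamma}: loss={losses}, weight relative to CE={ratio}")
```

The relative weight of each example is $(1 - p_t)^{\gamma}$, so as $\gamma$ grows the confidently correct examples contribute far less to the total loss than the hard ones.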

Theorems & Definitions (14)

  • Definition 1: Expected error
  • Definition 2: Empirical error
  • Definition 3: Covariate Shift shimodaira2000improving
  • Definition 4: Target Shift zhang2013domain
  • Definition 5: Sample Selection Bias
  • Theorem 1: zadrozny2004learning
  • Definition 6: Subpopulation Shift
  • Definition 7: Feedback Shift
  • Definition 8: Domain Adaptation
  • Definition 9: Open-Set Domain Adaptation
  • ...and 4 more