A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

TL;DR

This paper addresses data change in machine learning by unifying domain shift and concept drift under a three-phase taxonomy: problem detection, problem handling, and extended factors. It surveys state-of-the-art approaches in both fields, highlighting shared structure such as the comparison of the source and target joint distributions $P_S(X,Y)$ and $P_T(X,Y)$ and the role of temporal dynamics, and it covers methods ranging from out-of-distribution (OOD) detection to test-time training and continual learning. The authors give an industrial perspective on deployment challenges (efficiency, robustness, and usability) and illustrate real-world applications in smart manufacturing and smart healthcare, emphasizing human-in-the-loop and privacy considerations. They conclude with future directions: temporal modeling, foundation models, data-efficient and green learning, and trustworthy AI that integrates interpretability and human collaboration in dynamic data environments.

Abstract

Recent artificial intelligence (AI) technologies have evolved remarkably across academic fields and industries. In the real world, however, dynamic data pose a principal challenge for deploying AI models: an unexpected data change can severely degrade model performance. We identify two major related research fields, domain shift and concept drift, which are distinguished by the setting of the data change. Although these two popular fields aim to solve distribution shift and non-stationary data stream problems, their underlying properties are similar, which encourages similar technical approaches. In this review, we regroup domain shift and concept drift into a single research problem, namely the data change problem, and give a systematic overview of state-of-the-art methods in the two fields. We propose a three-phase problem categorization scheme that links the key ideas of the two fields. This gives researchers a new vantage point from which to explore contemporary technical strategies, learn about industrial applications, and identify future directions for addressing data change challenges.

Paper Structure

This paper contains 42 sections, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Taxonomy of data change, mapped to this paper's sections and subsections. Our three-phase problem categorization scheme is illustrated with different colors and icons, and the topics of domain shift and concept drift are categorized into these phases.
  • Figure 2: Overview of research topics related to the data change problem. The corresponding section is given in parentheses after each research problem. The source domain model $M_S$ is trained with the data $D_S$, and the target domain model and data are denoted $M_T$ and $D_T$. Data from $D_t$ to $D_{t+w}$ denote the data in an observed window for concept drift. Refer to $\S$\ref{sec:notation} for the corresponding notations and relations.
  • Figure 3: Examples of the common drift types. The figure is adapted from the review paper BAYRAM2022108632.
  • Figure 4: Common learning schemes on the feature and label spaces. (a) Feature alignment: The encoder learns to map samples from different domains onto a shared latent space. (b) Adversarial learning: An additional discriminator is employed to distinguish between real and fake labels. (c) Teacher-student learning: Constraints are imposed on feature embeddings and classifiers. (d) Self-supervised learning: Pre-trained networks are generalized using contrastive loss to facilitate downstream tasks.
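
The windowed setting in Figure 2, where data from $D_t$ to $D_{t+w}$ form an observed window, can be illustrated with a small detection sketch. The code below is not a method from the paper. It is a minimal example that assumes a single numeric feature and swaps in a standard two-sample Kolmogorov-Smirnov test to compare a reference window against a later one. The function names `ks_statistic` and `drift_detected`, the window sizes, and the significance level `alpha` are all hypothetical choices made for illustration.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples so ties are handled.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def drift_detected(reference, window, alpha=0.05):
    """Flag a change when the KS statistic exceeds the asymptotic critical value.

    Assumption: one numeric feature per sample, so this only tests for a change
    in the marginal feature distribution, not in the label relationship.
    """
    n, m = len(reference), len(window)
    c = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    threshold = c * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stands in for an early window
    shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]    # later window with a mean shift
    print("drift detected:", drift_detected(reference, shifted))
```

A test on one feature can only reveal a change in the feature distribution. Detecting a change in the relationship between features and labels would need labels or model outputs from the later window, which this sketch does not use.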