A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen
TL;DR
This paper addresses data change in machine learning by unifying domain shift and concept drift under a three-phase taxonomy: problem detection, problem handling, and extended factors. It surveys state-of-the-art approaches in both fields and highlights their shared structure, such as the comparison between the source and target joint distributions $P_S(X,Y)$ and $P_T(X,Y)$ and the role of temporal dynamics. It also details methods ranging from out-of-distribution (OOD) detection to test-time training and continual learning. The authors offer industrial perspectives on deployment challenges (efficiency, robustness, and usability) and illustrate real-world applications in smart manufacturing and smart healthcare, emphasizing human-in-the-loop and privacy considerations. They conclude with future directions: temporal modeling, foundation models, data-efficient and green learning, and trustworthy AI that integrates interpretability and human collaboration in dynamic data environments.
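The distinction between $P_S(X,Y)$ and $P_T(X,Y)$ can be made concrete with one common formalization from the domain-shift and concept-drift literature. This is a sketch using widely used terminology, not necessarily the paper's exact notation: S and T denote source (training) and target (deployment) data, and in the stream setting they are replaced by time indices t and t + Δ. Factorizing the joint distribution shows which part of it changes.

```latex
\begin{aligned}
&\text{data change: } P_S(X,Y) \neq P_T(X,Y), \qquad P(X,Y) = P(Y \mid X)\,P(X),\\
&\text{covariate shift: } P_S(X) \neq P_T(X) \ \text{ while } \ P_S(Y \mid X) = P_T(Y \mid X),\\
&\text{concept shift: } P_S(Y \mid X) \neq P_T(Y \mid X),\\
&\text{stream setting (concept drift): } P_t(X,Y) \neq P_{t+\Delta}(X,Y).
\end{aligned}
```

Under this view, domain shift compares two fixed distributions, while concept drift tracks the same kind of change along a time axis. That common structure is what lets the two fields share detection and handling techniques.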
Abstract
Recent artificial intelligence (AI) technologies have evolved remarkably across academic fields and industries. In the real world, however, dynamic data pose major challenges for deploying AI models: an unexpected change in the data can severely degrade model performance. Depending on the setting in which the data change occurs, we identify two major related research fields, domain shift and concept drift. Although these two popular fields target distribution shift and non-stationary data streams, respectively, their underlying properties are similar, which also encourages similar technical approaches. In this review, we regroup domain shift and concept drift into a single research problem, the data change problem, and provide a systematic overview of state-of-the-art methods in both fields. We propose a three-phase problem categorization scheme that links the key ideas of the two fields. We thereby offer researchers a new scope for exploring contemporary technical strategies, learning about industrial applications, and identifying future directions for addressing data change challenges.
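To illustrate what the problem-detection phase can look like for a non-stationary data stream, here is a minimal sketch that compares a reference window of one feature against a current window using a two-sample Kolmogorov-Smirnov (KS) test. This is a standard, swapped-in statistical technique, not a method attributed to the authors. The function names `ks_statistic` and `drift_detected` are hypothetical, and the sketch assumes a single numeric feature and uses the asymptotic KS critical value rather than exact p-values.

```python
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every copy of x in both samples so ties are handled correctly.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value.

    Hypothetical helper: c(alpha) = sqrt(-ln(alpha / 2) / 2), and the threshold is
    c(alpha) * sqrt((n + m) / (n * m)) for window sizes n and m.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # A window drawn from the same distribution should usually not be flagged.
    same = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # A window whose mean has moved by one standard deviation should be flagged.
    shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]
    print("same distribution flagged:", drift_detected(reference, same))
    print("shifted distribution flagged:", drift_detected(reference, shifted))
```

A univariate test like this only sees changes in $P(X)$; a change confined to $P(Y \mid X)$ generally requires labels or monitoring of model error, which is one reason surveys of this area treat detection as its own phase.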
