Self-Labeling in Multivariate Causality and Quantification for Adaptive Machine Learning
Yutian Ren, Aaron Haohua Yen, G. P. Li
TL;DR
The paper tackles post-deployment concept drift by extending interactive causality self-labeling to multivariate causal graphs, enabling autonomous data annotation via cause–effect relationships. It develops a dynamical-systems framework linking interaction time $t_{if}$, effect state $y_2$, and self-labeled cause state $x_{slb}$, and generalizes self-labeling to four basic causal structures: chain, fork, collider, and confounder. It analyzes how inaccuracies in the interaction time model (ITM) and the effect state detector (ESD) propagate through self-labeling, using both analytical and dynamical-system (DS) examples, and introduces a cost index to compare post-deployment costs across self-labeling, fully supervised, and semi-supervised approaches. Experiments in a physics-based multivariate simulation demonstrate that self-labeling maintains high performance under concept drift and non-ideal auxiliary models, often with favorable cost-performance tradeoffs. The work advances scalable, knowledge-graph–driven adaptive ML by providing theoretical foundations, robustness analyses, and practical cost considerations for self-labeling in complex causal settings.
Abstract
Adaptive machine learning (ML) aims to allow ML models to adapt to ever-changing environments with potential concept drift after model deployment. Traditionally, adaptive ML requires a new dataset to be manually labeled to tailor deployed models to altered data distributions. Recently, an interactive causality based self-labeling method was proposed to autonomously associate causally related data streams for domain adaptation, showing promising results compared to traditional feature similarity-based semi-supervised learning. Several unanswered research questions remain, including self-labeling's compatibility with multivariate causality and the quantitative analysis of the auxiliary models used in the self-labeling. The auxiliary models, the interaction time model (ITM) and the effect state detector (ESD), are vital to the success of self-labeling. This paper further develops the self-labeling framework and its theoretical foundations to address these research questions. A framework for the application of self-labeling to multivariate causal graphs is proposed using four basic causal relationships, and the impact of non-ideal ITM and ESD performance is analyzed. A simulated experiment is conducted based on a multivariate causal graph, validating the proposed theory.
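As a rough illustration of how the ITM and ESD could work together for a single cause–effect pair, the following minimal Python sketch pairs each detected effect state with an earlier sample from the cause stream. It assumes that the ESD yields timestamped effect states, that the ITM maps an effect state to an estimated interaction time, and that the cause sample nearest to the effect time minus that interaction time is the one to label. The function name `self_label`, the data layout, and the nearest-sample rule are hypothetical choices made for this example, not the authors' implementation.

```python
from bisect import bisect_left


def self_label(cause_stream, effect_events, itm):
    """Pair cause samples with effect states (illustrative sketch only).

    cause_stream:  list of (timestamp, features), sorted by timestamp.
    effect_events: list of (timestamp, effect_state), e.g. ESD output.
    itm:           callable mapping effect_state -> estimated interaction
                   time (assumed interface for the interaction time model).
    Returns a list of (features, effect_state) self-labeled pairs.
    """
    times = [t for t, _ in cause_stream]
    if not times:
        return []

    labeled = []
    for t_effect, y2 in effect_events:
        # Step back from the effect by the estimated interaction time.
        t_cause = t_effect - itm(y2)
        if t_cause < times[0]:
            continue  # the cause would predate the recorded stream

        # Locate the cause sample closest in time to t_cause.
        i = bisect_left(times, t_cause)
        if i == len(times):
            i -= 1
        elif i > 0 and t_cause - times[i - 1] <= times[i] - t_cause:
            i -= 1

        # The effect state becomes the label of the selected cause sample.
        labeled.append((cause_stream[i][1], y2))
    return labeled
```

In this sketch, an error in the ITM shifts `t_cause` and can select the wrong cause sample, while an error in the ESD attaches the wrong label, which is why the accuracy of both auxiliary models matters for the quality of the self-labeled data.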
