Self-Labeling in Multivariate Causality and Quantification for Adaptive Machine Learning

Yutian Ren, Aaron Haohua Yen, G. P. Li

TL;DR

The paper tackles post-deployment concept drift by extending interactive causality self-labeling to multivariate causal graphs, enabling autonomous data annotation via cause–effect relationships. It develops a dynamical-systems framework linking interaction time $t_{if}$, effect state $y_2$, and self-labeled cause state $x_{slb}$, and generalizes self-labeling to four basic causal structures: chain, fork, collider, and confounder graphs. It analyzes how inaccuracies in the interaction time model (ITM) and the effect state detector (ESD) propagate through self-labeling, using both analytical and dynamical-system (DS) examples, and introduces a cost index to compare post-deployment costs across self-labeling, fully supervised, and semi-supervised approaches. Experiments in a physics-based multivariate simulation demonstrate that self-labeling maintains high performance under concept drift and non-ideal auxiliary models, often with favorable cost-performance tradeoffs. The work advances scalable, knowledge-graph–driven adaptive ML by providing theoretical foundations, robustness analyses, and practical cost considerations for self-labeling in complex causal settings.

Abstract

Adaptive machine learning (ML) aims to allow ML models to adapt to ever-changing environments with potential concept drift after model deployment. Traditionally, adaptive ML requires a new dataset to be manually labeled to tailor deployed models to altered data distributions. Recently, an interactive-causality-based self-labeling method was proposed to autonomously associate causally related data streams for domain adaptation, showing promising results compared to traditional feature-similarity-based semi-supervised learning. Several research questions remain unanswered, including self-labeling's compatibility with multivariate causality and the quantitative analysis of the auxiliary models used in self-labeling. The auxiliary models, the interaction time model (ITM) and the effect state detector (ESD), are vital to the success of self-labeling. This paper further develops the self-labeling framework and its theoretical foundations to address these research questions. A framework for the application of self-labeling to multivariate causal graphs is proposed using four basic causal relationships, and the impact of non-ideal ITM and ESD performance is analyzed. A simulated experiment is conducted based on a multivariate causal graph, validating the proposed theory.
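As a rough illustration of how the two auxiliary models could cooperate, the following minimal Python sketch assumes that the ESD has already reported an effect state at some time, that the ITM maps that effect state to an interaction time $t_{if}$, and that the cause sample to label is the most recent one recorded at or before the effect time minus $t_{if}$. The function name `self_label`, its parameters, and the timestamp-lookup rule are hypothetical choices made for this sketch; they are not taken from the paper.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence, Tuple


def self_label(
    cause_times: Sequence[float],
    cause_samples: Sequence[object],
    effect_time: float,
    effect_state: object,
    itm: Callable[[object], float],
) -> Optional[Tuple[object, object]]:
    """Pair a cause sample with an effect state (hypothetical sketch).

    cause_times must be sorted in ascending order and aligned with
    cause_samples. itm stands in for the interaction time model and
    returns the inferred interaction time for a given effect state.
    """
    # Assumption: the ITM infers the interaction time from the effect state.
    t_if = itm(effect_state)

    # Assumption: the cause occurred t_if before the detected effect.
    t_cause = effect_time - t_if

    # Find the latest cause sample recorded at or before t_cause.
    idx = bisect_right(cause_times, t_cause) - 1
    if idx < 0:
        # No cause sample is old enough to be associated with this effect.
        return None

    # The effect state serves as the label of the selected cause sample.
    return cause_samples[idx], effect_state
```

In this sketch an inaccurate ITM shifts `t_cause` and can select the wrong cause sample, while an inaccurate ESD attaches the wrong label; these are the two error sources whose impact the paper analyzes.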

Paper Structure

This paper contains 16 sections, 17 equations, 10 figures.

Figures (10)

  • Figure 1: An illustration of the overall procedure of self-labeling.
  • Figure 2: Four basic causal structures represented in graphical models.
  • Figure 3: An illustration of interaction time combination with (a) multiple effects, (b) multiple transient-state causes, and (c) multiple steady-state causes. Blue signals represent causes and red signals represent effects. $t_{1}$ and $t_{2}$ represent the interaction times of the respective cause-effect pairs.
  • Figure 4: DS example with (a) ITM and (b) ESD errors. The error bounds are represented by colored regions. The $y_{2}$ axis is in log scale.
  • Figure 5: A dynamical system example of the ITM as a data sampler with 10%, 30%, and 50% error margins. The error bounds are represented by colored regions.
  • ...and 5 more figures