CaDrift: A Time-dependent Causal Generator of Drifting Data Streams

Eduardo V. L. Barboza, Jean Paul Barddal, Robert Sabourin, Rafael M. O. Cruz

TL;DR

Experimental results show that classifier accuracy tends to drop after distributional shift events and then recovers gradually, confirming that the generator simulates shifts effectively and making it a tool for evaluating methods under evolving data.

Abstract

This work presents Causal Drift Generator (CaDrift), a time-dependent synthetic data generator framework based on Structural Causal Models (SCMs). The framework produces a virtually infinite combination of data streams with controlled shift events and time-dependent data, making it a tool to evaluate methods under evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting the mapping functions of the SCM, which changes the underlying cause-and-effect relationships between features and the target. In addition, CaDrift models occasional perturbations by leveraging interventions in causal modeling. Experimental results show that, after distributional shift events, the accuracy of classifiers tends to drop and then recovers gradually, confirming the generator's effectiveness in simulating shifts. The framework has been made available on GitHub.
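
The mechanism the abstract describes (time-dependent data, a drifting mapping function, and an intervention on a feature) can be illustrated with a toy three-node SCM. This is only a minimal sketch under stated assumptions, not the CaDrift implementation or its API: the function `generate_stream`, the chain `x1 -> x2 -> y`, the AR(1) coefficient, and the linear and threshold mappings are all hypothetical choices made for illustration.

```python
import random


def generate_stream(n, drift_at, intervene=None, seed=0):
    """Toy SCM x1 -> x2 -> y. Every functional form here is an illustrative
    assumption, not the generator described in the paper.

    intervene: optional (start, end, value) tuple; while start <= t < end,
    x2 is set to `value` regardless of its parent, i.e. do(x2 = value).
    """
    rng = random.Random(seed)
    rho = 0.8    # assumed AR(1) coefficient making the root node time-dependent
    noise = 0.0
    stream = []
    for t in range(n):
        # Root cause driven by autoregressive noise.
        noise = rho * noise + rng.gauss(0.0, 1.0)
        x1 = noise
        # Child feature: computed from its parent unless an intervention is active.
        if intervene is not None and intervene[0] <= t < intervene[1]:
            x2 = intervene[2]
        else:
            x2 = 2.0 * x1 + rng.gauss(0.0, 0.1)
        # Mapping function for the target; its sign flips at the drift point,
        # which changes the relationship between x2 and y.
        weight = 1.0 if t < drift_at else -1.0
        y = int(weight * x2 > 0)
        stream.append((x1, x2, y))
    return stream
```

Calling `generate_stream(200, drift_at=100, intervene=(50, 60, 3.0))` yields a stream in which the rule linking `x2` to `y` reverses at step 100 and `x2` is pinned to 3.0 for steps 50 to 59. A classifier trained on the first half would be expected to lose accuracy after the reversal until it adapts.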

Paper Structure

This paper contains 20 sections, 6 equations, 8 figures, 12 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Representation of a node in a causal graph and how interventions are applied to the feature $x_3$.
  • Figure 2: Samples generated by CaDrift using a DAG with six nodes: five features and one target. Each color refers to a different class.
  • Figure 3: Prequential accuracies on tested datasets. Dashed vertical lines indicate shift points. Shaded areas refer to the length of incremental and gradual shifts. Letters refer to the distributional shifts applied to the datasets. D stands for distributional, S for severe, C for covariate, and L for local shifts.
  • Figure 4: EWMA evolution with different $\alpha$ values. The lines show the raw values generated with autoregressive noise, and the impact of EWMA on the value depending on the parameter $\alpha$ assigned.
  • Figure 5: The impact of the $\alpha$ and $\rho$ variables on the lagged autocorrelation function. Each row refers to a different value of $\alpha$, and each column to a different feature ($x_1$, $x_3$, and $y$).
  • ...and 3 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2