One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences

Hugo Math, Robin Schön, Rainer Lienhart

TL;DR

The paper tackles causal discovery in high-dimensional event sequences with thousands of event types and hundreds of labels, a setting where traditional methods are computationally infeasible. It proposes OSCAR, a one-shot causal autoregressive approach that uses two Transformer-based density estimators to estimate $P(X_i|\text{Pa}(X_i))$ and $P(Y_j|\text{Pa}(Y_j))$, allowing the conditional mutual information $I(Y_j,X_i|\boldsymbol{Z})$ to be computed in parallel. OSCAR recovers per-label Markov Boundaries (MBs) and provides a causal indicator $\mathcal{C}$ for interpretability, completing MB discovery in minutes on a real automotive dataset with $|\mathbb{X}|=29{,}100$ event types and $|\mathbb{Y}|=474$ labels. The work demonstrates substantial scalability improvements over constraint-based methods, acknowledges that its guarantees rest on the causal-sufficiency and oracle-model assumptions, and outlines directions for incorporating inter-label dependencies.
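The conditional mutual information mentioned above can be illustrated on a small discrete example. The following is a minimal sketch, not the paper's implementation: it assumes the full joint distribution is available as a table, whereas OSCAR obtains the required conditionals from its two Transformer density estimators. The function name `conditional_mutual_information` and the toy distributions are hypothetical and chosen only for illustration.

```python
import numpy as np


def conditional_mutual_information(joint):
    """Return I(Y; X | Z) in nats for a joint table indexed as joint[y, x, z].

    Illustrative helper (an assumption, not from the paper): it works on an
    explicit probability table instead of learned density estimators.
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()                          # normalise to a valid distribution
    p_z = p.sum(axis=(0, 1), keepdims=True)  # P(z),    shape (1, 1, Z)
    p_yz = p.sum(axis=1, keepdims=True)      # P(y, z), shape (Y, 1, Z)
    p_xz = p.sum(axis=0, keepdims=True)      # P(x, z), shape (1, X, Z)
    mask = p > 0                             # skip zero-probability cells
    # I = sum p(y,x,z) * log[ p(y,x,z) p(z) / (p(y,z) p(x,z)) ]
    ratio = (p * p_z)[mask] / (p_yz * p_xz)[mask]
    return float(np.sum(p[mask] * np.log(ratio)))


# Case 1: Y and X are independent given Z, so the CMI should vanish.
p_z = np.array([0.3, 0.7])
p_y_given_z = np.array([[0.9, 0.2], [0.1, 0.8]])  # indexed [y, z]
p_x_given_z = np.array([[0.6, 0.5], [0.4, 0.5]])  # indexed [x, z]
joint_independent = np.einsum("yz,xz,z->yxz", p_y_given_z, p_x_given_z, p_z)
cmi_independent = conditional_mutual_information(joint_independent)

# Case 2: X is a copy of Y (fair coin) and Z is an independent fair coin,
# so the CMI equals the entropy of Y given Z, i.e. log(2).
joint_dependent = np.zeros((2, 2, 2))
joint_dependent[0, 0, :] = 0.25
joint_dependent[1, 1, :] = 0.25
cmi_dependent = conditional_mutual_information(joint_dependent)

print(cmi_independent, cmi_dependent)
```

A value close to zero indicates that $X_i$ carries no additional information about $Y_j$ once $\boldsymbol{Z}$ is known, which is the kind of evidence a conditional-independence-based Markov Boundary search relies on.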

Abstract

Understanding causality in event sequences with thousands of sparse event types is critical in domains such as healthcare, cybersecurity, or vehicle diagnostics, yet current methods fail to scale. We present OSCAR, a one-shot causal autoregressive method that infers per-sequence Markov Boundaries using two pretrained Transformers as density estimators. This enables efficient, parallel causal discovery without costly global CI testing. On a real-world automotive dataset with 29,100 events and 474 labels, OSCAR recovers interpretable causal structures in minutes, while classical methods fail to scale, enabling practical scientific diagnostics at production scale.

Paper Structure

This paper contains 26 sections, 3 theorems, 21 equations, 8 figures, 3 tables.

Key Result

Theorem 1

If $S^k_l$ is a multi-labeled sequence drawn from a dataset $D = \{S^1_l, \cdots, S^n_l\} \subset \mathbb{S}$ on which two Oracle Models $\text{Tf}_x$ and $\text{Tf}_y$ were trained, then under the causal sufficiency assumption, the bounded lagged effects assumption and ...

Figures (8)

  • Figure 1: The overview of OSCAR: One-Shot multi-label Causal AutoRegressive discovery. $d$ denotes the hidden dimension, $L$ the sequence length, $\text{MB}_1, \text{MB}_2$ the Markov Boundary of $Y_1, Y_2$ respectively. All green and blue areas represent parallelised operations.
  • Figure 2: Evolution of the One-Shot Recall, Precision and F1-Score as a function of the Markov Boundary length $|\textbf{MB}(Y_j)|$ using $n=45969$ samples.
  • Figure 3: Evolution of several classification metrics (one-shot) and elapsed time per sample as a function of the number of samples $N$. Results are reported with 1-sigma error bars.
  • Figure 4: Evolution of one-shot F1 Score, Precision and Recall as a function of the coefficient $k$. Results are reported with 1-sigma error bars.
  • Figure 5: An example of a causal graph extracted from a multi-label event sequence where $\text{MB}_1$ represents the Markov Boundary of $Y_1$ and $\text{MB}_2$ the Markov Boundary of $Y_2$.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Theorem 1: Markov Boundary Identification in Event Sequences
  • Definition 1: Bayesian Network
  • Definition 2: Faithfulness
  • Definition 3: Markov Boundary
  • Definition 4: Conditional Independence
  • Lemma 1: Identifiability of $\mathbb{G}$
  • Lemma 2: Markov Boundary Equivalence
  • proof
  • proof
  • proof
  • ...and 1 more
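
For orientation, the standard textbook forms of three notions named in the list above are sketched here. These are the usual general definitions, given under the assumption that the paper follows them; its own statements may use different notation or add conditions specific to event sequences. The symbols $\mathbf{V}$ (the full variable set) and $\mathbf{M}$ (a candidate conditioning set) are introduced only for this illustration.

```latex
% Conditional independence of X and Y given Z
X \perp\!\!\!\perp Y \mid \boldsymbol{Z}
  \iff P(X, Y \mid \boldsymbol{Z}) = P(X \mid \boldsymbol{Z})\, P(Y \mid \boldsymbol{Z})

% Markov blanket of Y_j: a set M that shields Y_j from all remaining variables;
% the Markov Boundary MB(Y_j) is a minimal such set
Y_j \perp\!\!\!\perp \mathbf{V} \setminus (\mathbf{M} \cup \{Y_j\}) \mid \mathbf{M},
  \qquad \mathbf{M} \subseteq \mathbf{V} \setminus \{Y_j\}

% Conditional mutual information, which is zero exactly under conditional independence
I(Y_j, X_i \mid \boldsymbol{Z})
  = \mathbb{E}_{P(Y_j, X_i, \boldsymbol{Z})}
    \left[ \log \frac{P(Y_j, X_i \mid \boldsymbol{Z})}
                     {P(Y_j \mid \boldsymbol{Z})\, P(X_i \mid \boldsymbol{Z})} \right]
```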