Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets

Hang Chen; Xinyu Yang; Keqing Du

TL;DR

This work defines Indefinite Data as jointly combining multi-structure graphs and multi-value representations and introduces two high-quality datasets, Causalogue and Causaction, to address data gaps. It proposes a probabilistic SCM baseline that decouples causal strength, representation, and confounding via a graph-attention encoder, VAE-inspired decoding, and a confounding-estimation module, enabling learning of both causal structures and causal representations under multi-structure and multi-value conditions. Extensive experiments show the proposed approach outperforms state-of-the-art baselines on structure recovery and representation learning, and demonstrates robustness under cross-distribution shifts, with synthetic disentanglement studies validating latent confounder estimation. The work provides a concrete starting point for causal inference in real-world Indefinite Data and highlights practical implications for dialogue and video analysis where causal relations are multi-faceted and variably structured.

Abstract

Integrating deep learning with causal discovery has led us to observe that learning causal structures and representations in dialogue and video is highly challenging. We define these data forms as "Indefinite Data", characterized by multi-structure data and multi-value representations. Unlike existing adaptable data forms, Indefinite Data still faces gaps in both datasets and methods. To address the dataset gap, we release two high-quality datasets, Causalogue and Causaction, containing text dialogue samples and video action samples with causal annotations, respectively. The method gap arises from the coexistence of multi-structure data and multi-value representations, which breaks the assumptions of all current methods and renders them infeasible on Indefinite Data. To this end, we propose a probabilistic framework as a baseline, incorporating three highlights designed for this gap: 1) establishing a Causation Condition for representations using the independence of noise terms under non-fixed causal structures, 2) treating causal strength as a latent variable and measuring the reconstruction loss in the correlation space, and 3) estimating the effects of latent confounders. These highlights enable the probabilistic model to overcome the challenges brought by the coexistence of multi-structure data and multi-value representations, and they pave the way for extension to latent confounders. Comprehensive experiments evaluate the baseline on causal structures, causal representations, and confounding disentanglement.

Paper Structure (33 sections, 14 equations, 3 figures, 6 tables)

This paper contains 33 sections, 14 equations, 3 figures, 6 tables.

Introduction
Preliminaries
Research Gaps of Indefinite Data
Related Work
Multi-value Representation & Single-structure Data
Multi-structure Data & Single-value Representation
Method Gap
Dataset Gap
Baseline Model
Fundamental Framework
Estimation of Confounding Effect
Explanation
How to Extend to Multi-value Representation?
How to Extend to Multi-structure Data?
Implementation Example
...and 18 more sections

Figures (3)

Figure 1: An implementation example of our framework. The encoder $q_{\varphi}(z|\mathcal{X})$ predicts the causal strength from the input $X$, with the predicted latent variable $z=(I-A)$. A causal representation decoder $p_{\theta}(x|(I-A)^{-1}E)$ then learns to predict $\widehat{X}$ given the disentangled $E$ and the inverse of the predicted $z$.
Figure 2: 10 structures in Causalogue Dataset.
Figure 3: MSE across all ingredient settings for estimating $C$ via GIN, LFCM, pcss, and ours.
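
The relation between $z=(I-A)$, $E$, and $X$ named in the Figure 1 caption can be illustrated with a small deterministic sketch. It is not the paper's learned probabilistic model: it assumes a linear structural equation $x = Ax + e$ with a fixed, acyclic $A$, where `A[i][j]` is taken to be the strength of the edge from variable j to variable i. The matrix values, the three-variable chain, and all function names are hypothetical and chosen only for illustration.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_of_i_minus_a(A):
    """Compute (I - A)^{-1} for an acyclic A.

    An acyclic adjacency matrix is nilpotent (A^n = 0), so the Neumann
    series I + A + A^2 + ... + A^(n-1) is finite and exact.
    """
    n = len(A)
    total = identity(n)
    power = identity(n)
    for _ in range(n - 1):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


# Hypothetical chain x0 -> x1 -> x2 (assumed edge strengths 0.5 and 2.0).
A = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
E = [[1.0], [1.0], [1.0]]  # one noise value per variable, as a column vector

# Decoder direction: X = (I - A)^{-1} E
X = matmul(inverse_of_i_minus_a(A), E)

# Encoder direction: with z = I - A, the noise is recovered as E = z X
I3 = identity(3)
z = [[I3[i][j] - A[i][j] for j in range(3)] for i in range(3)]
E_recovered = matmul(z, X)

print(X)            # [[1.0], [1.5], [4.0]]
print(E_recovered)  # [[1.0], [1.0], [1.0]]
```

In the framework described by the caption, $A$ is predicted per sample rather than fixed, and the decoder is a learned distribution; the sketch only shows why knowing $z$ and $E$ suffices to reconstruct $X$ in the linear case.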

Theorems & Definitions (4)

Definition 1: Causal representation
Definition 2: Causal structure
Definition 3: Indefinite Data
Example 1: Indefinite Data
