Causal Learning for Heterogeneous Subgroups Based on Nonlinear Causal Kernel Clustering
Lu Liu, Yang Tang, Kexuan Zhang, Qiyu Sun
TL;DR
This work tackles heterogeneity in causal relations across subgroups within multi-source observational data. It introduces a nonlinear Causal Kernel Clustering (CKC) framework whose $u$-centered sample mapping function embeds samples in a high-dimensional space isomorphic to the causal graph space, enabling unbiased estimation and subgroup discovery. A nonlinear causal kernel built on the mapped representations clusters samples by their causal structure, and the authors establish space isomorphism and causal identifiability to justify the approach. Empirical results on synthetic data and on real-world IOD and Boston Housing data demonstrate CKC’s ability to identify heterogeneous subgroups and enhance downstream causal learning, including early causal-warning signals. The framework serves as a flexible plug-in module for improving causal inference in the presence of distribution shifts and diverse environments.
Abstract
In multi-source, heterogeneous data collected from diverse environments, causal relationships among features can vary with time span, region, or strategy. Because of this diversity, a single causal model cannot accurately represent the causal relationships in all of the observational data, which is a crucial consideration in causal learning. To address this challenge, the nonlinear Causal Kernel Clustering method is introduced for heterogeneous subgroup causal learning, highlighting variations in causal relationships across subgroups. The main component for clustering heterogeneous subgroups is the $u$-centered sample mapping function, which has the property of unbiased estimation, assesses the differences in potential nonlinear causal relationships among samples, and is supported by causal identifiability theory. Experimental results indicate that the method performs well in identifying heterogeneous subgroups and enhancing causal learning, leading to a reduction in prediction error.
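
The abstract does not spell out the $u$-centered mapping, so the following is only a minimal sketch of one plausible ingredient. It assumes that "$u$-centered" refers to the U-centering of pairwise distance matrices used in unbiased distance-covariance statistics; this is an assumption, not something the text above confirms. The function names `pairwise_abs_dist`, `u_center`, and `u_inner_product` are hypothetical and are not taken from the paper.

```python
def pairwise_abs_dist(values):
    """Pairwise absolute distances between scalar samples (illustrative helper)."""
    return [[abs(a - b) for b in values] for a in values]


def u_center(dist):
    """U-center a symmetric distance matrix with zero diagonal.

    Off-diagonal entries become
        d_ij - row_i/(n-2) - row_j/(n-2) + total/((n-1)(n-2)),
    and the diagonal is set to zero. Assumes `dist` is symmetric, so
    column sums equal row sums.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("U-centering needs at least 3 samples")
    row_sums = [sum(row) for row in dist]
    total = sum(row_sums)
    centered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                centered[i][j] = (
                    dist[i][j]
                    - row_sums[i] / (n - 2)
                    - row_sums[j] / (n - 2)
                    + total / ((n - 1) * (n - 2))
                )
    return centered


def u_inner_product(a, b):
    """Inner product of two U-centered matrices, scaled by 1 / (n (n - 3)).

    Applied to U-centered distance matrices of two variables, this is the
    usual unbiased estimator of squared distance covariance (assumed here
    to be the kind of unbiasedness the abstract alludes to).
    """
    n = len(a)
    if n < 4:
        raise ValueError("the unbiased inner product needs at least 4 samples")
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


# Toy data: y depends nonlinearly on x (made-up values for illustration only).
x = [0.0, 1.0, 2.0, 4.0, 7.0]
y = [v * v for v in x]
ux = u_center(pairwise_abs_dist(x))
uy = u_center(pairwise_abs_dist(y))
dependence = u_inner_product(ux, uy)
print(dependence)
```

Each row of a U-centered matrix sums to zero, which is what makes the scaled inner product unbiased. How CKC turns such quantities into per-sample representations and a causal kernel is not described in the abstract, so that step is deliberately left out of the sketch.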
