Skewness-Robust Causal Discovery in Location-Scale Noise Models
Daniel Klippert, Alexander Marx
TL;DR
This work tackles identifiability in bivariate causal discovery under iid data by focusing on location-scale noise models (LSNMs) and showing that skewness in the noise can invalidate Gaussian-based inference. It introduces SkewD, a likelihood- and independence-testing framework that models noise with a skew-normal distribution, enabling reliable direction-finding via $X \rightarrow Y$ versus $Y \rightarrow X$ even when skewness is present. Parameter estimation combines a heuristic CMA-ES search with an Expectation Conditional Maximization (ECM) refinement on penalized spline representations of the mean and scale functions, with Bayesian optimization used to select regularization strengths. Empirical results on novel skew-noise datasets and established benchmarks demonstrate that SkewD is robust to high skewness, outperforming or matching state-of-the-art baselines and highlighting the practical value of explicitly modeling skewness in LSNMs for causal discovery.
Abstract
To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two graphs $X \to Y$ and $Y \to X$. Location-scale noise models (LSNMs), in which the effect $Y$ is modeled based on the cause $X$ as $Y = f(X) + g(X)N$, form a flexible class of models that is general and identifiable in most cases. Estimating these models for arbitrary noise terms $N$, however, is challenging. Therefore, practical estimators are typically restricted to symmetric distributions, such as the normal distribution. As we showcase in this paper, when $N$ is a skewed random variable, which is likely in real-world domains, the reliability of these approaches decreases. To address this limitation, we propose SkewD, a likelihood-based algorithm for bivariate causal discovery under LSNMs with skewed noise distributions. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under both symmetric and skewed noise. For parameter estimation, we employ a combination of a heuristic search and an expectation conditional maximization algorithm. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets. Throughout our experiments, SkewD exhibits strong performance and, in comparison to prior work, remains robust under high skewness.
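To make the model class concrete, the following is a minimal sketch of an LSNM $Y = f(X) + g(X)N$ with standard skew-normal noise, together with the conditional log-likelihood of $Y$ given $X$. It is not the SkewD estimator: the functions `f` and `g`, the shape value `alpha = 5`, and all helper names are hypothetical choices made for illustration, and the likelihood is evaluated at the true `f` and `g` rather than at fitted splines. The sketch uses the standard skew-normal density $2\,\phi(z)\,\Phi(\alpha z)$, which reduces to the normal density at $\alpha = 0$.

```python
import math
import random


def sample_skew_normal(alpha, rng):
    """Draw one standard skew-normal variate with shape parameter alpha.

    Uses the stochastic representation delta*|U0| + sqrt(1 - delta^2)*U1
    with U0, U1 independent standard normals.
    """
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def skew_normal_logpdf(z, alpha):
    """Log of the standard skew-normal density 2 * phi(z) * Phi(alpha * z)."""
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    # Guard against log(0) far in the thin tail.
    return math.log(2.0) + log_phi + math.log(max(cdf, 1e-300))


def f(x):
    """Hypothetical location function (assumption, not from the paper)."""
    return math.sin(x)


def g(x):
    """Hypothetical scale function; strictly positive by construction."""
    return 0.5 + 0.25 * x * x


def simulate_lsnm(n, alpha, seed=0):
    """Simulate n pairs (x, y) from Y = f(X) + g(X) * N, N skew-normal."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [f(x) + g(x) * sample_skew_normal(alpha, rng) for x in xs]
    return xs, ys


def lsnm_loglik(xs, ys, alpha):
    """Conditional log-likelihood of y given x under the LSNM above."""
    total = 0.0
    for x, y in zip(xs, ys):
        scale = g(x)
        z = (y - f(x)) / scale
        # Change of variables: p(y | x) = p_N(z) / g(x).
        total += skew_normal_logpdf(z, alpha) - math.log(scale)
    return total


xs, ys = simulate_lsnm(n=2000, alpha=5.0, seed=0)
ll_skew = lsnm_loglik(xs, ys, alpha=5.0)   # correctly specified noise
ll_gauss = lsnm_loglik(xs, ys, alpha=0.0)  # Gaussian noise assumption
print(ll_skew, ll_gauss)
```

With `f` and `g` held fixed at their true values, the Gaussian case (`alpha=0.0`) assigns a lower likelihood to the skewed data than the correctly specified skew-normal case. A likelihood-based direction score would additionally fit such a model in both directions and compare the results, which this sketch does not attempt.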
