Skewness-Robust Causal Discovery in Location-Scale Noise Models

Daniel Klippert, Alexander Marx

TL;DR

This work tackles identifiability in bivariate causal discovery under iid data by focusing on location-scale noise models (LSNMs) and showing that skewness in the noise can invalidate Gaussian-based inference. It introduces SkewD, a likelihood- and independence-testing framework that models noise with a skew-normal distribution, enabling reliable direction-finding via $X \rightarrow Y$ versus $Y \rightarrow X$ even when skewness is present. Parameter estimation combines a heuristic CMA-ES search with an Expectation Conditional Maximization (ECM) refinement on penalized spline representations of the mean and scale functions, with Bayesian optimization used to select regularization strengths. Empirical results on novel skew-noise datasets and established benchmarks demonstrate that SkewD is robust to high skewness, outperforming or matching state-of-the-art baselines and highlighting the practical value of explicitly modeling skewness in LSNMs for causal discovery.
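As a rough illustration of what modeling the noise with a skew-normal distribution means for the likelihood, the sketch below evaluates the log-likelihood of an LSNM $Y = f(X) + g(X)N$ when $N$ is assumed to be standard skew-normal with shape parameter `alpha`. The function names, the parametrization, and the callables `f` and `g` are assumptions made for illustration, not SkewD's implementation. No spline fitting, CMA-ES search, or ECM step is shown.

```python
import math


def std_normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal CDF, written with erfc for better accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lsnm_skew_normal_loglik(xs, ys, f, g, alpha):
    """Log-likelihood of y given x under Y = f(X) + g(X) * N.

    Assumption: N is standard skew-normal with density
    2 * phi(z) * Phi(alpha * z); g(x) must be strictly positive.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        scale = g(x)
        z = (y - f(x)) / scale
        # Floor the CDF so the logarithm stays finite for extreme residuals.
        cdf = max(std_normal_cdf(alpha * z), 1e-300)
        total += (
            math.log(2.0)
            + std_normal_logpdf(z)
            + math.log(cdf)
            - math.log(scale)  # change of variables from N to Y
        )
    return total
```

With `alpha = 0` the expression reduces to the usual Gaussian LSNM log-likelihood, which is one way to see the skew-normal model as an extension of the normal one.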

Abstract

To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two graphs $X \to Y$ and $Y \to X$. Location-scale noise models (LSNMs), in which the effect $Y$ is modeled based on the cause $X$ as $Y = f(X) + g(X)N$, form a flexible class of models that is general and identifiable in most cases. Estimating these models for arbitrary noise terms $N$, however, is challenging. Therefore, practical estimators are typically restricted to symmetric distributions, such as the normal distribution. As we showcase in this paper, when $N$ is a skewed random variable, which is likely in real-world domains, the reliability of these approaches decreases. To address this limitation, we propose SkewD, a likelihood-based algorithm for bivariate causal discovery under LSNMs with skewed noise distributions. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under both symmetric and skewed noise. For parameter estimation, we employ a combination of a heuristic search and an expectation conditional maximization algorithm. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets. Throughout our experiments, SkewD exhibits strong performance and, in comparison to prior work, remains robust under high skewness.
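To make the model $Y = f(X) + g(X)N$ concrete, the following minimal sketch draws data from an LSNM with skew-normal noise. The particular choices of `f`, `g`, the uniform cause distribution, and all function names are hypothetical and are not the generating process of the paper's datasets. The skew-normal draw uses the standard representation $\delta |U_0| + \sqrt{1-\delta^2}\, U_1$ with $\delta = \alpha / \sqrt{1+\alpha^2}$ and independent standard normal $U_0, U_1$.

```python
import math
import random


def sample_skew_normal(alpha, rng):
    """Draw one standard skew-normal variate with shape parameter alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def sample_lsnm(n, alpha, seed=0):
    """Draw n pairs (x, y) from Y = f(X) + g(X) * N with skew-normal N."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)       # hypothetical cause distribution
        f_x = math.sin(x)                # hypothetical location function f
        g_x = 0.5 + 0.25 * x * x         # hypothetical scale function g > 0
        noise = sample_skew_normal(alpha, rng)
        xs.append(x)
        ys.append(f_x + g_x * noise)
    return xs, ys


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

Setting `alpha = 0` recovers symmetric Gaussian noise, while larger `|alpha|` yields increasingly skewed noise. This is the regime in which, according to the abstract, estimators based on the normal distribution become less reliable.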

Paper Structure

This paper contains 41 sections, 2 theorems, 37 equations, 9 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

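Let $Z_1,\ldots,Z_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)$. Further define the standardized random variables $(Z_i - \bar{Z}_n)/S_n$, where $\bar{Z}_n$ is the sample mean and $S_n$ the sample standard deviation. Then, for $i=1,\ldots,n$, $(Z_i - \bar{Z}_n)/S_n \xrightarrow{d} \mathcal{N}(0,1)$ as $n \to \infty$.

Proof. It is a well-known result in asymptotic theory that $\bar{Z}_n \xrightarrow{a.s.} \mu$ and $S_n \xrightarrow{a.s.} \sigma$. Thus, $(\bar{Z}_n, S_n) \xrightarrow{a.s.} (\mu, \sigma)$, and applying the Continuous Mapping Theorem yields that $(Z_i - \bar{Z}_n)/S_n \xrightarrow{a.s.} (Z_i - \mu)/\sigma$. From the property of normal distributions, it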
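The standardization that Proposition 1 refers to can be checked numerically. The sketch below assumes the usual form, subtracting the sample mean and dividing by the sample standard deviation (here with the $n-1$ denominator, which is an assumption). It then measures how often the standardized values fall within $\pm 1.96$, which should be close to 0.95 if they behave like standard normal draws. All names are illustrative.

```python
import random
import statistics


def standardize(sample):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # assumption: n - 1 denominator
    return [(z - mean) / sd for z in sample]


def fraction_within(values, bound=1.96):
    """Share of values whose absolute value does not exceed bound."""
    return sum(abs(v) <= bound for v in values) / len(values)


rng = random.Random(0)
# Gaussian sample with arbitrary (hypothetical) mean 3 and standard deviation 2.
sample = [rng.gauss(3.0, 2.0) for _ in range(20000)]
coverage = fraction_within(standardize(sample))
```

For large samples, `coverage` should be close to the standard normal value of about 0.95, regardless of the mean and variance used to generate `sample`.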

Figures (9)

  • Figure 1: Comparison of LSNM mean fits and confidence intervals based on the normal distribution via LOCI (left) and the skew-normal distribution via SkewD (right) for pair 6 from the novel $\text{LSs}(1.750)$ dataset. The estimated residuals in the normal model depend on the cause $X$, as indicated by the pointy outer contour level of the kernel density estimate for small $X$, leading to a wrong inference. SkewD overcomes this limitation and infers the correct direction.
  • Figure 2: Accuracy in % for SkewD and the baselines (approaches not developed for LSNMs in gray) on the proposed ANs and LSs datasets with skewness levels $-0.455$, $0.985$ and $1.750$. SkewD consistently performs well, with the likelihood-based variant showing the strongest performance. IGCI-G, while strongly misspecified, seems to be biased toward the correct direction.
  • Figure 3: Comparison of LSNM mean fits, confidence intervals and residuals based on the normal distribution via LOCI (left in both (a) and (b)) and skew-normal distribution via SkewD (right in both (a) and (b)) for pair 6 from the novel $\text{LSs}(1.750)$ dataset.
  • Figure 4: Exemplary pairs from the ANs(-0.455) dataset. Left two pairs generated with skew-normal noise, right two pairs generated with GNO noise.
  • Figure 5: Exemplary pairs from the ANs(0.985) dataset. Left two pairs generated with skew-normal noise, right two pairs generated with GNO noise.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Theorem 1: LOCI