Causal vs. Anticausal merging of predictors

Sergio Hernan Garrido Mejia, Patrick Blöbaum, Bernhard Schölkopf, Dominik Janzing

TL;DR

The paper addresses how causal versus anticausal assumptions affect the merging of predictors under the Causal Maximum Entropy (CMAXENT) framework, focusing on a simple setup with a binary target $Y$ and two continuous predictors $X$. It shows that when all first and second moments are observed, the causal direction yields a logistic regression predictor for $p(Y|X)$, while the anticausal direction yields Linear Discriminant Analysis, linking CMAXENT to these classical classifiers. It further investigates partial knowledge of the moments and the resulting Out-Of-Variable generalisation, deriving how the decision boundaries shift under incomplete information and establishing when their slopes remain equal. The work highlights intrinsic asymmetries between the causal directions in predictor merging, with implications for transfer learning, domain adaptation, and federated or mixture-of-experts settings, where causal structure informs how to combine heterogeneous predictors for $p(Y|X)$ under CMAXENT.

Abstract

We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular, we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous variables as predictors. We use Causal Maximum Entropy (CMAXENT) as the inductive bias to merge the predictors; however, we expect similar differences to hold when using other merging methods that take into account asymmetries between cause and effect. We show that if we observe all bivariate distributions, the CMAXENT solution reduces to logistic regression in the causal direction and Linear Discriminant Analysis (LDA) in the anticausal direction. Furthermore, we study how the decision boundaries of these two solutions differ whenever we observe only some of the bivariate distributions, which has implications for Out-Of-Variable (OOV) generalisation.
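
To make the contrast between the two classifiers concrete, the following is a minimal sketch rather than the paper's CMAXENT procedure. It assumes labels in {0, 1}, Gaussian class conditionals with a shared covariance, and hypothetical helper names (`lda_weights`, `fit_logistic`, and so on). Under those assumptions, Bayes' rule gives a posterior of the form sigmoid(w·x + b), which is the LDA predictor, while logistic regression fits the same functional form directly from labelled samples.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gaussian_pdf(x, mu, cov):
    """Density of a multivariate Gaussian N(mu, cov) at a single point x."""
    d = x - mu
    k = mu.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm


def bayes_posterior(x, mu0, mu1, cov, prior1=0.5):
    """p(Y=1 | x) from Bayes' rule with shared-covariance Gaussian classes."""
    p1 = prior1 * gaussian_pdf(x, mu1, cov)
    p0 = (1.0 - prior1) * gaussian_pdf(x, mu0, cov)
    return p1 / (p0 + p1)


def lda_weights(mu0, mu1, cov, prior1=0.5):
    """Closed-form (w, b) with p(Y=1 | x) = sigmoid(w @ x + b)."""
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -0.5 * (mu1 @ np.linalg.solve(cov, mu1) - mu0 @ np.linalg.solve(cov, mu0))
    b += np.log(prior1 / (1.0 - prior1))
    return w, b


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression fitted by plain gradient descent on the mean log-loss."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / X.shape[0]
        theta -= lr * grad
    return theta[:-1], theta[-1]


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    w, b = lda_weights(mu0, mu1, cov)
    x = np.array([0.5, -0.2])
    # The generative posterior and the sigmoid of the linear score agree.
    print(bayes_posterior(x, mu0, mu1, cov), sigmoid(w @ x + b))
```

Both routes produce a linear decision boundary. They differ in how the parameters are obtained: from class means and a shared covariance in the generative case, and from direct fitting of the conditional in the discriminative case.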

Paper Structure (24 sections, 13 theorems, 44 equations, 3 figures)

Key Result

Proposition 0

Using the Lagrange multiplier formalism for the optimisation problems in eq:causalMarginalOptimisastion and eq:causalConditionalOptimisastion, we obtain: (i) a multivariate Gaussian distribution for $P(\mathbf{X})$, and (ii) the density of $Y$ conditioned on $\mathbf{X}$ given by where $\alpha(\mathbf{x})$ is a normalising constant. The density can be written as
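
As a hedged illustration of how a maximum-entropy conditional of this kind reduces to logistic regression, the sketch below uses assumptions that are not stated in the text above: the label encoding $Y \in \{-1, +1\}$, constraints on $\mathbb{E}[Y]$ and $\mathbb{E}[X_i Y]$, and the multiplier names $\lambda_0, \lambda_i$. It shows the standard argument and is not necessarily the paper's exact statement.

```latex
% Assumed exponential form of the conditional, with log-normaliser \alpha(\mathbf{x}):
\[
  p(y \mid \mathbf{x}) = \exp\bigl( y\,\eta(\mathbf{x}) + \alpha(\mathbf{x}) \bigr),
  \qquad
  \eta(\mathbf{x}) = \lambda_0 + \sum_i \lambda_i x_i .
\]
% Normalising over y \in \{-1, +1\} fixes \alpha(\mathbf{x}):
\[
  e^{\alpha(\mathbf{x})} = \frac{1}{e^{\eta(\mathbf{x})} + e^{-\eta(\mathbf{x})}}
  \quad\Longrightarrow\quad
  \alpha(\mathbf{x}) = -\log\bigl( 2\cosh \eta(\mathbf{x}) \bigr).
\]
% Substituting back gives a logistic (sigmoid) function of a linear score:
\[
  p(y \mid \mathbf{x})
  = \frac{e^{y\,\eta(\mathbf{x})}}{e^{\eta(\mathbf{x})} + e^{-\eta(\mathbf{x})}}
  = \frac{1}{1 + e^{-2 y\,\eta(\mathbf{x})}} .
\]
```

Under these assumptions the decision boundary $p(y = 1 \mid \mathbf{x}) = 1/2$ is the hyperplane $\eta(\mathbf{x}) = 0$, whose normal vector is $(\lambda_1, \lambda_2, \dots)$.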

Figures (3)

  • Figure 1: Causal graphs analysed throughout the article
  • Figure 2: Decision boundaries of the solution of CMAXENT in the causal (left) and anticausal (right) direction when we do not have the covariance between the predictor variables $\bar{s}_{1,2}$.
  • Figure 3: Graph in the causal and anticausal direction

Theorems & Definitions (21)

  • Proposition 0: Resulting predictor in the causal direction
  • Remark 1
  • Proposition 1: Resulting predictor in the anticausal direction
  • Remark 2
  • Theorem 3: Predictor of $Y$ using Bayes' rule
  • Corollary 4: Quadratic Discriminant Analysis (QDA)
  • Corollary 5: Exponential family discriminant analysis
  • Remark 6
  • Proposition 6: Normal vector to the decision boundaries in causal and anticausal direction
  • Theorem 7: Slope of the decision boundary is the same in causal and anticausal direction
  • ...and 11 more