The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations
Dingling Yao, Shimeng Huang, Riccardo Cadei, Kun Zhang, Francesco Locatello
TL;DR
This paper reframes causal representation learning (CRL) as a measurement-model problem: learned representations are treated as proxy measurements of latent causal variables, which allows a principled evaluation of their usefulness for downstream causal tasks. It introduces the Test-based Measurement EXclusivity (T-MEX) score, a nonparametric, test-based metric that quantifies how well a learned representation aligns with an assumed measurement model by comparing conditional-independence structures. In numerical simulations and on the real-world ISTAnt ecological benchmark, the authors show that T-MEX tracks whether representations yield valid causal inferences (e.g., accurate average treatment effect (ATE) estimates) and reveals causal identifiability better than traditional metrics such as R^2 and MCC. This measurement-model lens unifies CRL theory with task-specific assumptions and offers a practical, scalable way to evaluate causal representations in complex, high-dimensional data.
Abstract
Causal reasoning and discovery, two fundamental tasks of causal analysis, often face challenges in applications due to the complexity, noisiness, and high dimensionality of real-world data. Despite recent progress in identifying latent causal structures using causal representation learning (CRL), what makes learned representations useful for causal downstream tasks and how to evaluate them are still not well understood. In this paper, we reinterpret CRL using a measurement model framework, where the learned representations are viewed as proxy measurements of the latent causal variables. Our approach clarifies the conditions under which learned representations support downstream causal reasoning and provides a principled basis for quantitatively assessing the quality of representations using a new Test-based Measurement EXclusivity (T-MEX) score. We validate T-MEX across diverse causal inference scenarios, including numerical simulations and real-world ecological video analysis, demonstrating that the proposed framework and corresponding score effectively assess the identification of learned representations and their usefulness for causal downstream tasks.
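The summary above does not spell out how T-MEX is computed, so the following is only an illustrative sketch of the general idea of a test-based score: run conditional-independence tests between representation dimensions and latent variables, then count disagreements with an assumed measurement pattern. Everything specific here is an assumption made for illustration and is not taken from the paper: the names `ci_pvalue` and `mismatch_score`, the Fisher z partial-correlation test (a linear-Gaussian stand-in, not a nonparametric test), the two-latent toy data, and the threshold `alpha`.

```python
import math
import random
from statistics import NormalDist, fmean


def residualize(y, x):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def ci_pvalue(h, z, cond):
    """p-value for H0: h is independent of z given cond.

    Assumption: a Fisher z test on the partial correlation, which is only
    valid for (approximately) linear-Gaussian data.
    """
    r = pearson(residualize(h, cond), residualize(z, cond))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(len(h) - 1 - 3) * math.atanh(r)  # one conditioning variable
    return 2 * (1 - NormalDist().cdf(abs(stat)))


def mismatch_score(latents, reps, assumed, alpha=0.001):
    """Count (latent, representation) pairs whose tested dependence differs
    from the assumed pattern. assumed[i][j] == 1 means representation j is
    assumed to measure latent i. Hypothetical helper for the two-latent case.
    """
    score = 0
    for i, z in enumerate(latents):
        cond = latents[1 - i]  # condition on the other latent
        for j, h in enumerate(reps):
            dependent = ci_pvalue(h, z, cond) < alpha
            score += int(dependent != bool(assumed[i][j]))
    return score


# Toy data: two correlated latent variables.
rng = random.Random(0)
n = 2000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [0.8 * a + rng.gauss(0, 1) for a in z1]

# A representation where each dimension measures exactly one latent.
good = [[2 * a + 0.1 * rng.gauss(0, 1) for a in z1],
        [-b + 0.1 * rng.gauss(0, 1) for b in z2]]
# A representation whose first dimension mixes both latents.
entangled = [[a + b + 0.1 * rng.gauss(0, 1) for a, b in zip(z1, z2)],
             [b + 0.1 * rng.gauss(0, 1) for b in z2]]

assumed = [[1, 0], [0, 1]]  # dimension j is assumed to measure latent j only
good_score = mismatch_score([z1, z2], good, assumed)
entangled_score = mismatch_score([z1, z2], entangled, assumed)
print(good_score, entangled_score)
```

In this toy setting, a lower count means the tested dependence pattern is closer to the assumed measurement model. The entangled representation is expected to score worse because its first dimension still depends on the second latent after conditioning on the first. Marginal correlation alone would not separate the two cases here, since the latents are themselves correlated.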
