Causal Feature Learning in the Social Sciences
Jingzhou Huang, Jiuyao Lu, Alexander Williams Tolbert
TL;DR
This work tackles variable selection in causal models for the social sciences by extending Causal Feature Learning (CFL) to continuous microstates through a principled binning strategy and by extending the Causal Coarsening Theorem (CCT) to continuous settings. The Extended CCT shows that the causal partition is almost surely a coarsening of the observational partition, which enables reliable macrostate construction from observational data; approximate partitions are guaranteed under mild regularity assumptions. Empirically, CFL is applied to the NSW and Voting datasets, where CFL-derived macrostates reduce dimensionality, reveal heterogeneous treatment effects (e.g., stronger gains for older individuals in NSW), and yield downstream causal insights comparable to those from microstate analyses. Together, the theoretical and applied contributions indicate that CFL can robustly capture causal structure in social-science data, offering a scalable path toward fair and interpretable causal inference in observational settings.
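To make the coarsening claim concrete, the two partitions can be written as equivalence relations on microstates. This is a sketch that assumes the standard CFL definitions from the earlier literature; the notation (microstate $X$, outcome $Y$, relations $\sim_o$ and $\sim_c$) is chosen here for illustration and may differ from the paper's.

```latex
% Observational and causal equivalence of two microstates x, x'
x \sim_o x' \iff P(Y \mid X = x) = P(Y \mid X = x'), \qquad
x \sim_c x' \iff P(Y \mid \mathrm{do}(X = x)) = P(Y \mid \mathrm{do}(X = x')).

% Coarsening: every observational class lies inside a single causal class
x \sim_o x' \implies x \sim_c x'
```

Under this reading, "almost surely" means the implication can fail only for a measure-zero set of joint distributions, so merging observational classes is enough to recover the causal macrostates.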
Abstract
Variable selection poses a significant challenge in causal modeling, particularly within the social sciences, where constructs often rely on interrelated factors such as age, socioeconomic status, gender, and race. Indeed, it has been argued that such attributes must be modeled as macro-level abstractions of lower-level manipulable features to preserve the modularity assumption essential to causal inference. This paper accordingly extends the theoretical framework of Causal Feature Learning (CFL). Empirically, we apply the CFL algorithm to diverse social science datasets, evaluating how CFL-derived macrostates compare with traditional microstates in downstream modeling tasks.
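The idea of building macrostates from a continuous microstate can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes quantile binning of the microstate, uses the within-bin mean outcome as a crude stand-in for the conditional distribution of the outcome, and merges bins with a simple greedy rule. The function names (`quantile_edges`, `bin_means`, `group_bins`), the tolerance, and the toy data are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_edges(values, n_bins):
    """Interior cut points splitting `values` into n_bins roughly equal-count bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def bin_means(xs, ys, edges):
    """Mean outcome within each bin of X (an assumed proxy for P(Y | X))."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        b = bisect_right(edges, x)  # index of the bin that x falls into
        totals[b] += y
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in counts}


def group_bins(means, tol):
    """Greedy grouping: a bin joins the first macrostate whose
    representative mean is within `tol`, otherwise it starts a new one."""
    representatives = []  # one representative mean per macrostate
    labels = {}
    for b in sorted(means):
        for label, rep in enumerate(representatives):
            if abs(means[b] - rep) <= tol:
                labels[b] = label
                break
        else:
            labels[b] = len(representatives)
            representatives.append(means[b])
    return labels


# Toy data, fabricated purely for illustration: a continuous microstate
# (e.g. age) and an outcome that is low below 40 and high from 40 upward.
xs = [20, 22, 30, 32, 40, 42, 50, 52]
ys = [1.0, 1.2, 1.1, 0.9, 3.0, 3.2, 3.1, 2.9]

edges = quantile_edges(xs, n_bins=4)      # [30, 40, 50]
means = bin_means(xs, ys, edges)
labels = group_bins(means, tol=0.5)
print(labels)  # {0: 0, 1: 0, 2: 1, 3: 1}
```

Here four microstate bins collapse into two macrostates, which is the kind of dimensionality reduction the abstract refers to. A realistic pipeline would estimate the full conditional distribution and use a proper clustering method instead of this greedy rule.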
