Causal Feature Learning in the Social Sciences

Jingzhou Huang, Jiuyao Lu, Alexander Williams Tolbert

TL;DR

This work tackles variable selection in causal models in the social sciences by extending Causal Feature Learning (CFL) to continuous microstates through a principled binning strategy and by extending the Causal Coarsening Theorem (CCT) to continuous settings. The Extended CCT shows that the causal partition is almost surely a coarsening of the observational partition, which allows macrostates to be constructed reliably from observational data; approximate partitions are guaranteed under mild regularity assumptions. Empirically, CFL is applied to the NSW and Voting datasets, where CFL-derived macrostates reduce dimensionality, reveal heterogeneous treatment effects (e.g., stronger gains for older individuals in NSW), and yield downstream causal insights comparable to microstate analyses. Together, the theoretical and empirical results suggest that CFL can capture causal structure in social-science data, offering a scalable path toward fair and interpretable causal inference in observational settings.

Abstract

Variable selection poses a significant challenge in causal modeling, particularly within the social sciences, where constructs often rely on inter-related factors such as age, socioeconomic status, gender, and race. Indeed, it has been argued that such attributes must be modeled as macro-level abstractions of lower-level manipulable features, in order to preserve the modularity assumption essential to causal inference. This paper accordingly extends the theoretical framework of Causal Feature Learning (CFL). Empirically, we apply the CFL algorithm to diverse social science datasets, evaluating how CFL-derived macrostates compare with traditional microstates in downstream modeling tasks.
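
To make the idea of building macrostates from a continuous microstate more concrete, here is a minimal toy sketch. It is not the authors' implementation: the equal-frequency binning, the use of a per-bin sample mean in place of a full conditional distribution estimate, the hand-written 1-D k-means, and every name in the code (`quantile_bins`, `conditional_means`, `kmeans_1d`, the synthetic data) are assumptions made for illustration only.

```python
import random
from statistics import mean


def quantile_bins(xs, n_bins):
    """Assign each sample to one of n_bins equal-frequency bins of x."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    bins = [0] * len(xs)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(xs), n_bins - 1)
    return bins


def conditional_means(bins, ys, n_bins):
    """Estimate E[Y | bin] by the sample mean of Y within each bin."""
    return [mean(y for b, y in zip(bins, ys) if b == k) for k in range(n_bins)]


def kmeans_1d(values, k, n_iter=50):
    """Plain 1-D k-means (k >= 2); centers start evenly spaced from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels


# Synthetic data (hypothetical): Y depends on X only through the sign of X,
# so there are two "true" macrostates of the continuous microstate X.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
ys = [(1.0 if x > 0 else 0.0) + rng.gauss(0.0, 0.1) for x in xs]

n_bins = 10
bins = quantile_bins(xs, n_bins)                      # step 1: bin the microstate
bin_means = conditional_means(bins, ys, n_bins)       # step 2: estimate E[Y | bin]
macro_labels = kmeans_1d(bin_means, k=2)              # step 3: merge similar bins

print("E[Y | bin]:", [round(m, 2) for m in bin_means])
print("macrostate label per bin:", macro_labels)
```

In this toy setting the bins covering clearly negative values of `x` should share one macrostate label and the bins covering clearly positive values the other. Real applications would need a richer estimate of the conditional distribution than a per-bin mean, and a principled choice of the number of bins and clusters.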

Paper Structure

This paper contains 17 sections, 9 theorems, 37 equations, 7 figures, and 3 tables.

Key Result

Theorem 1

A partition of continuous variables learned purely from observational data can be coarsened into the causal partition defined by (eq:causal-equivalence); the set of distributions for which this fails has Lebesgue measure zero.
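
For orientation, the two partitions that Theorem 1 relates can be written down in the form commonly used in the CFL literature. The notation below (microstate $X$, outcome $Y$, relations $\sim_o$ and $\sim_c$) is an assumption for illustration; the paper's own equation (eq:causal-equivalence) and its binned, continuous-variable version are not reproduced in this summary and may differ in detail.

```latex
% Observational equivalence: same conditional distribution of Y
x_1 \sim_o x_2 \iff P(Y \mid X = x_1) = P(Y \mid X = x_2)

% Causal equivalence: same interventional distribution of Y
x_1 \sim_c x_2 \iff P\bigl(Y \mid \mathrm{do}(X = x_1)\bigr) = P\bigl(Y \mid \mathrm{do}(X = x_2)\bigr)

% Coarsening claim (holding for all but a measure-zero set of distributions):
x_1 \sim_o x_2 \implies x_1 \sim_c x_2
```

Read this way, every observational class sits inside a single causal class, so the causal partition can in principle be recovered by merging observational classes rather than by splitting them.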

Figures (7)

  • Figure 1: Clustering of NSW Participants Based on Education, Age, and Treatment Assignment
  • Figure 2: Distribution of Treated and Untreated Units Across Clusters with Kernel Density Estimates
  • Figure A1: Clustering of Voting Dataset Participants Based on Demographics, Baseline Political Preference, and Historical Turnout Record
  • Figure A2: Clustering of Redlining Dataset
  • Figure A3: Balance Check
  • ...and 2 more figures

Theorems & Definitions (24)

  • Theorem 1: Extended CCT, informal
  • Theorem 2: Regularity, informal
  • Definition 3.1: Microstates
  • Definition 3.2: Partitions
  • Definition 3.3: Macrostate Manipulation
  • Theorem 3: Causal Coarsening Theorem
  • Remark 3.1
  • Definition 3.4: Partitions with Binning
  • Theorem 4: Extended Causal Coarsening Theorem
  • Proof: Proof Sketch
  • ...and 14 more