Compositional Models for Estimating Causal Effects
Purva Pruthi, David Jensen
TL;DR
This work addresses the challenge of estimating individual-level causal effects in heterogeneous, modular systems by introducing a compositional framework that represents each unit as an instance-specific composition of multiple components. Using modular neural network architectures, the approach learns component-wise potential outcomes and aggregates them through a structured interaction graph to obtain unit-level effects, enabling compositional generalization to unseen configurations. The paper formalizes the compositional data-generating process, defines unit- and component-level causal estimands, and establishes identifiability under ignorability, overlap, and consistency, including a tractable additive-parallel special case. Using an experimental infrastructure that combines real-world benchmarks (query execution, manufacturing, matrix operations) with synthetic data, the authors demonstrate improved conditional average treatment effect (CATE) estimation, sample efficiency, and robustness to observational bias, and analyze how component-level data access and composition structure influence performance. Overall, compositional causal modeling offers scalable, instance-specific reasoning for structured systems; the paper highlights both its potential and the conditions under which it may not yield gains, guiding future work on component-level interventions and broader applications.
Abstract
Many real-world systems can be usefully represented as sets of interacting components. Examples include computational systems, such as query processors and compilers; natural systems, such as cells and ecosystems; and social systems, such as families and organizations. However, current approaches to estimating potential outcomes and causal effects typically treat such systems as single units, represent them with a fixed set of variables, and assume a homogeneous data-generating process. In this work, we study a compositional approach for estimating individual-level potential outcomes and causal effects in structured systems, where each unit is represented by an instance-specific composition of multiple heterogeneous components. The compositional approach decomposes unit-level causal queries into more fine-grained queries, explicitly modeling how unit-level interventions affect component-level outcomes to generate a unit's outcome. We demonstrate this approach using modular neural network architectures and show that it provides benefits for causal effect estimation from observational data, such as accurate causal effect estimation for structured units, increased sample efficiency, improved overlap between treatment and control groups, and compositional generalization to units with unseen combinations of components. Remarkably, our results show that compositional modeling can improve the accuracy of causal estimation even when component-level outcomes are unobserved. We also create and use a set of real-world evaluation environments for the empirical evaluation of compositional approaches for causal effect estimation, and we demonstrate how composition structure, the amount of component-level data access, and component heterogeneity affect the performance of compositional models compared with non-compositional approaches.
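To make the decomposition concrete, the following is a minimal sketch of the additive-parallel case, in which a unit's potential outcome is taken to be the sum of its components' potential outcomes. It is an illustration under stated assumptions, not the authors' implementation: the hand-written functions `scan_model` and `join_model` stand in for learned per-component modules, the component types "scan" and "join" are hypothetical names loosely inspired by the query-execution setting, and all feature values are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A component model maps (component features, unit-level treatment) to a
# component-level potential outcome. In practice these would be learned
# modules; here they are toy functions (assumption for illustration).
ComponentModel = Callable[[Sequence[float], int], float]
# A unit is an instance-specific list of (component type, features) pairs.
Unit = List[Tuple[str, Sequence[float]]]


def scan_model(x: Sequence[float], t: int) -> float:
    # Hypothetical: treatment adds a cost proportional to the input size.
    return 2.0 * x[0] + (x[0] if t == 1 else 0.0)


def join_model(x: Sequence[float], t: int) -> float:
    # Hypothetical: treatment removes a fixed cost of 3.0.
    return 0.5 * x[0] * x[1] - (3.0 if t == 1 else 0.0)


MODELS: Dict[str, ComponentModel] = {"scan": scan_model, "join": join_model}


def unit_potential_outcome(unit: Unit, models: Dict[str, ComponentModel], t: int) -> float:
    """Additive aggregation: sum component-level potential outcomes under treatment t."""
    return sum(models[kind](features, t) for kind, features in unit)


def unit_effect(unit: Unit, models: Dict[str, ComponentModel]) -> float:
    """Unit-level effect as the difference of aggregated potential outcomes."""
    return unit_potential_outcome(unit, models, 1) - unit_potential_outcome(unit, models, 0)


# Two units with different numbers and types of components reuse the same modules.
unit_a: Unit = [("scan", [4.0]), ("join", [2.0, 6.0])]
unit_b: Unit = [("scan", [1.0]), ("scan", [3.0])]

print(unit_effect(unit_a, MODELS))
print(unit_effect(unit_b, MODELS))
```

Because the per-type models are shared across units, a unit with a previously unseen combination of components can still be scored, which is the intuition behind compositional generalization. Non-additive or sequential compositions would need a different aggregation step than the plain sum used here.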
