Compositional difference-in-differences
Onil Boussim
TL;DR
Compositional difference-in-differences (CoDiD) addresses policy evaluations in which outcomes are compositional vectors, jointly identifying treatment effects on totals and on category shares. It rests on a parallel growth assumption for log-quantities. Under a random utility model, this assumption corresponds to parallel trends in expected utilities and implies that counterfactual shares follow parallel trajectories in the probability simplex. This enables point identification of counterfactual distributions and the computation of GTT, ATT, and CTT. The framework extends to multiple pre-treatment periods, with robust bounds for cases where parallel growth may not hold. An empirical application to early voting shows turnout gains and shifts in party support across the full composition, illustrating the method's practical relevance for policy analysis of multidimensional outcomes.
Abstract
Many policy evaluations involve vectors of category-specific quantities, either categorical outcomes (e.g., employment type, major choice) or compositional measures (e.g., GDP by sector, votes by party, electricity generation by source). In these settings, both intensive margins (shares) and extensive margins (totals) can matter. However, existing Difference-in-Differences (DiD) strategies typically focus only on shares and do not jointly identify treatment effects on totals. These approaches also usually lack a clear economic interpretation. I develop Compositional Difference-in-Differences (CoDiD), a new framework that identifies treatment effects on both shares and totals in a coherent way. The key assumption is parallel growth: in the absence of treatment, the log-quantities of each category would have evolved in parallel for the treated and control groups. I show that, under a random-utility discrete-choice model, this condition is equivalent to parallel trends in expected utilities, meaning that the change in average latent attractiveness of each alternative is identical across groups. Geometrically, the counterfactual distributions (shares) follow parallel trajectories in the probability simplex. In settings with multiple time periods, I introduce a relaxation that delivers bounds when parallel growth may not hold. I illustrate the empirical relevance of the method by examining how early voting reforms affected the 2008 U.S. election.
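To make the parallel growth assumption concrete, the following is a minimal Python sketch of how it could be used to impute counterfactual quantities in a two-group, two-period setting. It assumes that parallel growth means each category's change in log-quantity for the treated group, absent treatment, equals the change observed for the control group. The function names, the three categories, and all numbers are hypothetical and chosen only for illustration; the effect definitions here are simple differences and are not claimed to match the paper's formal estimands.

```python
import numpy as np


def parallel_growth_counterfactual(treated_pre, control_pre, control_post):
    """Impute untreated post-period quantities for the treated group.

    Assumes parallel growth category by category:
    log Q0_treated_post - log Q_treated_pre
        = log Q_control_post - log Q_control_pre
    """
    treated_pre = np.asarray(treated_pre, dtype=float)
    control_pre = np.asarray(control_pre, dtype=float)
    control_post = np.asarray(control_post, dtype=float)
    # Add the control group's log growth to the treated group's pre-period log level.
    log_cf = np.log(treated_pre) + (np.log(control_post) - np.log(control_pre))
    return np.exp(log_cf)


def shares(quantities):
    """Normalize a vector of quantities to shares on the probability simplex."""
    q = np.asarray(quantities, dtype=float)
    return q / q.sum()


# Hypothetical quantities for three categories (made-up numbers).
treated_pre = [100.0, 200.0, 700.0]
treated_post = [150.0, 260.0, 640.0]
control_pre = [120.0, 180.0, 700.0]
control_post = [132.0, 189.0, 665.0]

counterfactual = parallel_growth_counterfactual(treated_pre, control_pre, control_post)

# Illustrative effects: observed minus imputed counterfactual.
effect_on_total = np.sum(treated_post) - counterfactual.sum()
effect_on_shares = shares(treated_post) - shares(counterfactual)

print("counterfactual quantities:", counterfactual)
print("effect on total:", effect_on_total)
print("effect on shares:", effect_on_shares)
```

Because the counterfactual is built at the level of quantities, the total and the shares both come from the same imputed vector, so the share effects sum to zero by construction while the total effect is free to differ from zero.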
