Bayesian Causal Forests for Longitudinal Data: Assessing the Impact of Part-Time Work on Growth in High School Mathematics Achievement

Nathan McJames, Ann O'Shea, Andrew Parnell

TL;DR

This paper develops a longitudinal extension of Bayesian Causal Forests (LBCF) to jointly estimate individual growth trajectories in mathematics achievement and the heterogeneous causal impact of part-time work, using two waves of HSLS data. The model decomposes growth into a baseline trajectory and a period-specific treatment effect, and incorporates time-varying covariates, a propensity-score term, and missing-data handling within a Bayesian nonparametric framework based on BART ensembles. Simulation studies demonstrate strong predictive performance and reliable uncertainty quantification for both growth and heterogeneous effects, with LBCF outperforming standard BART, BCF, and GRF in key settings. The HSLS application reveals a negative average effect of intensive part-time work on growth ($ATE \approx -0.08$), with substantial heterogeneity and a potential positive effect for students with low school belonging. These results suggest nuanced policy implications: while intensive work generally dampens growth, targeted supports or alternative activities might mitigate harms and even benefit certain subgroups. The method offers a flexible tool for analyzing growth and heterogeneity in longitudinal causal settings, in education and beyond.

Abstract

Modelling growth in student achievement is a significant challenge in the field of education. Understanding how interventions or experiences such as part-time work can influence this growth is also important. Traditional methods like difference-in-differences are effective for estimating causal effects from longitudinal data. Meanwhile, Bayesian non-parametric methods have recently become popular for estimating causal effects from observational studies with a single time point. However, there remains a scarcity of methods capable of combining the strengths of these two approaches to flexibly estimate heterogeneous causal effects from longitudinal data. Motivated by two waves of data from the High School Longitudinal Study, the NCES's most recent longitudinal study, which tracks a representative sample of over 20,000 students in the US, our study introduces a longitudinal extension of Bayesian Causal Forests. This model allows for the flexible identification of both individual growth in mathematical ability and the effects of participation in part-time work. Simulation studies demonstrate the predictive performance and reliable uncertainty quantification of the proposed model. Results reveal the negative impact of part-time work for most students, but hint at potential benefits for those students with an initially low sense of school belonging. Clear signs of a widening achievement gap between students with high and low academic achievement are also identified. Potential policy implications are discussed, along with promising areas for future research.
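
As a schematic reading of the growth decomposition summarised above (the notation here is ours, for illustration, and is not necessarily the paper's exact specification), the change in achievement for student $i$ between the two waves can be sketched as a baseline growth term plus a treatment term that only applies to students who work part-time:

$$ y_{i,2} - y_{i,1} = \delta_i + \tau_i z_i + \varepsilon_i $$

Here $y_{i,t}$ is the achievement of student $i$ at wave $t$, $\delta_i$ is the individual growth (the symbol used in the figure captions), $z_i \in \{0, 1\}$ is an assumed indicator of part-time work, $\tau_i$ is an assumed symbol for the individual treatment effect, and $\varepsilon_i$ is a noise term. Under this sketch, the average treatment effect is the mean of the $\tau_i$ over students.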

Paper Structure

This paper contains 11 sections, 8 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Visualisation of RMSE and PEHE metrics evaluated over 1000 replications of DGP1 for the BART, BCF, GRF, and LBCF models. In the left panel, which displays the RMSE of the $\delta_{i}$ predictions, the LBCF approach is clearly the strongest performer, with considerably lower RMSE values. In the right panel, which visualises the PEHE metrics, LBCF is again the strongest performer, but by a narrower margin.
  • Figure 2: Visualisation of bias in ATE estimates over 1000 replications of DGP2 for the gesttools, LBCF, and LTMLE models. The gesttools package, which assumes a constant treatment effect at all time points, shows minimal bias. This strong performance is closely followed by the proposed LBCF model, which provides separate estimates for the treatment applied between Waves 1 and 2 and between Waves 2 and 3. The LTMLE estimates appear to be much more biased.
  • Figure 3: The left plot shows the posterior distribution of the average growth, while the right plot displays a histogram of the individual $\delta_{i}$ estimates. The solid line in the left plot shows the posterior mean, while the dashed lines indicate a 95% credible interval. Substantial variability is present in the $\delta_{i}$ values, indicating that some students are predicted to increase their achievement by much more than others, some of whom may even experience a decrease in achievement.
  • Figure 4: Scatterplot of the relationship between Wave 1 achievement and predicted $\delta_{i}$ values. Students with initially high levels of academic achievement are predicted to increase their achievement by larger amounts than their peers.
  • Figure 5: The posterior distribution for the Average Treatment Effect (ATE) is shown on the left, and a histogram of the individual conditional average treatment effects is provided on the right. The solid line shows the posterior mean, while the dashed lines indicate a 95% credible interval. An interesting subgroup of students in the right tail of the histogram is predicted to benefit from part-time work.
  • ...and 2 more figures
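
The captions for Figures 1 and 2 refer to RMSE, PEHE, and ATE bias without defining them. The following is a minimal sketch of how these metrics are commonly computed in a simulation study where the true values are known. It assumes the usual root-mean-square form of PEHE (precision in estimating heterogeneous effects), and the function names and toy numbers are hypothetical, not taken from the paper.

```python
import math


def rmse(estimates, truth):
    """Root mean squared error between two equal-length sequences."""
    if len(estimates) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, truth)]
    return math.sqrt(sum(squared_errors) / len(truth))


def pehe(tau_hat, tau_true):
    """PEHE, assumed here to be the RMSE of the individual treatment effect estimates."""
    return rmse(tau_hat, tau_true)


def ate_bias(tau_hat, tau_true):
    """Bias of the estimated ATE: mean estimated effect minus mean true effect."""
    if len(tau_hat) != len(tau_true) or len(tau_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(tau_hat) / len(tau_hat) - sum(tau_true) / len(tau_true)


# Toy values for four simulated students (illustrative only, not paper results).
delta_true = [0.5, 0.2, -0.1, 0.8]   # true individual growth
delta_hat = [0.4, 0.3, -0.1, 0.6]    # estimated individual growth
tau_true = [-0.1, -0.2, 0.1, 0.0]    # true individual treatment effects
tau_hat = [-0.2, -0.2, 0.2, 0.0]     # estimated individual treatment effects

growth_rmse = rmse(delta_hat, delta_true)
effect_pehe = pehe(tau_hat, tau_true)
bias = ate_bias(tau_hat, tau_true)
print(f"RMSE of growth: {growth_rmse:.4f}")
print(f"PEHE: {effect_pehe:.4f}")
print(f"ATE bias: {bias:.4f}")
```

In a study like the one summarised here, each metric would be computed once per simulated dataset and then visualised across replications. Note that a model can have near-zero ATE bias while still having a large PEHE, since individual errors can cancel in the average, as they do in the toy values above.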