Nature versus nurture in galaxy formation: the effect of environment on star formation with causal machine learning

Sunil Mucesh, William G. Hartley, Ciarán M. Gilligan-Lee, Ofer Lahav

TL;DR

The paper tackles the nature-versus-nurture question in galaxy formation by embedding causal inference within a physics-informed model of galaxy evolution. It applies inverse-probability weighting of marginal structural models to the IllustrisTNG simulations to quantify the causal effect of environment on star-formation rate (SFR) across cosmic time, using the 3D 10th-nearest-neighbour density as the environmental measure. At $z=0$ the effect is negative and substantial: environment suppresses the SFR by up to a factor of $\sim100$. At $z\gtrsim1$ the effect is positive: environment boosts the SFR by a factor of $\sim10$ at $z\sim1$ and by more at higher redshift, highlighting a time-dependent role of environment tied to galaxy downsizing. The study also shows that ignoring halo mass (nature) underestimates the causal effect in intermediate-density environments by a factor of $\sim2$, and that controlling for stellar mass at a single snapshot is not only insufficient to disentangle nature and nurture but has an adverse effect. Within the causal framework, however, stellar mass is an adequate proxy for nature. The approach may serve as a blueprint for causal inference in other dynamical systems with closed feedback loops, in astrophysics and beyond.
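To make the environmental measure concrete, here is a minimal sketch of a 3D 10th-nearest-neighbour density. The estimator $\rho = k / (\tfrac{4}{3}\pi r_k^3)$, the brute-force neighbour search, the function name `knn_density`, and the toy positions are all assumptions for illustration. The summary does not give the paper's exact definition, and the sketch ignores the periodic boundaries of a simulation box.

```python
import math
import random


def knn_density(points, k=10):
    """Estimate a local number density at each 3D point.

    Assumed estimator: rho = k / (4/3 * pi * r_k**3), with r_k the distance
    to the k-th nearest neighbour (the point itself excluded).
    """
    if k >= len(points):
        raise ValueError("need more than k points")
    densities = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]
        densities.append(k / ((4.0 / 3.0) * math.pi * r_k ** 3))
    return densities


# Hypothetical toy positions (arbitrary units): a compact clump and a sparse field.
rng = random.Random(0)
clump = [tuple(rng.gauss(0.0, 0.5) for _ in range(3)) for _ in range(30)]
field = [tuple(rng.uniform(-20.0, 20.0) for _ in range(3)) for _ in range(30)]
rho = knn_density(clump + field, k=10)

mean_clump = sum(rho[:30]) / 30
mean_field = sum(rho[30:]) / 30
print(mean_clump > mean_field)
```

The brute-force search scales quadratically with the number of points, so a spatial tree would be the natural replacement for a catalogue of realistic size.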

Abstract

Understanding how galaxies form and evolve is at the heart of modern astronomy. With the advent of large-scale surveys and simulations, remarkable progress has been made in the last few decades. Despite this, the physical processes behind the phenomena, and particularly their importance, remain far from known, as correlations have primarily been established rather than the underlying causality. We address this challenge by applying the causal inference framework. Specifically, we tackle the fundamental open question of whether galaxy formation and evolution depends more on nature (i.e., internal processes) or nurture (i.e., external processes), by estimating the causal effect of environment on star-formation rate in the IllustrisTNG simulations. To do so, we develop a comprehensive causal model and employ cutting-edge techniques from epidemiology to overcome the long-standing problem of disentangling nature and nurture. We find that the causal effect is negative and substantial, with environment suppressing the SFR by a maximal factor of $\sim100$. While the overall effect at $z=0$ is negative, in the early universe, environment is discovered to have a positive impact, boosting star formation by a factor of $\sim10$ at $z\sim1$ and by even greater amounts at higher redshifts. Furthermore, we show that: (i) nature also plays an important role, as ignoring it underestimates the causal effect in intermediate-density environments by a factor of $\sim2$, (ii) controlling for the stellar mass at a snapshot in time, as is common in the literature, is not only insufficient to disentangle nature and nurture but actually has an adverse effect, though (iii) stellar mass is an adequate proxy of the effects of nature. Finally, this work may prove a useful blueprint for extracting causal insights in other fields that deal with dynamical systems with closed feedback loops, such as the Earth's climate.
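The epidemiological technique named in the TL;DR is inverse-probability weighting. The sketch below shows the idea in its simplest form: one time point, a binary confounder `h` (say, low or high halo mass) and a binary treatment `e` (low- or high-density environment). This is a deliberate simplification and not the paper's method, which handles a continuous, time-varying environment. The counts, the outcome formula, and the function names are hypothetical.

```python
from collections import Counter


def ipw_effect(records):
    """Weighted difference in mean outcome between e=1 and e=0.

    records: list of (h, e, y) tuples with binary confounder h,
    binary treatment e, and outcome y.
    Uses stabilised weights w = P(E=e) / P(E=e | H=h).
    """
    n = len(records)
    n_e = Counter(e for _, e, _ in records)
    n_h = Counter(h for h, _, _ in records)
    n_he = Counter((h, e) for h, e, _ in records)

    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for h, e, y in records:
        p_e = n_e[e] / n                      # marginal treatment probability
        p_e_given_h = n_he[(h, e)] / n_h[h]   # treatment probability given confounder
        w = p_e / p_e_given_h
        num[e] += w * y
        den[e] += w
    return num[1] / den[1] - num[0] / den[0]


def naive_effect(records):
    """Unweighted difference in mean outcome (ignores the confounder)."""
    treated = [y for _, e, y in records if e == 1]
    control = [y for _, e, y in records if e == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy population: high-h objects are more often treated,
# and h raises the outcome while the treatment lowers it by 1.0.
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 10, (1, 1): 40}
records = [
    (h, e, 1.0 + 2.0 * h - 1.0 * e)
    for (h, e), count in counts.items()
    for _ in range(count)
]

print("naive:", naive_effect(records))
print("ipw:  ", ipw_effect(records))
```

In this toy table the naive difference is about +0.2, while the weighted estimate recovers the built-in effect of -1.0. The weights rebalance the sample so that the treatment is independent of the confounder. The size and direction of the bias here are properties of the invented numbers, not of the paper's results.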

Paper Structure

This paper contains 46 sections, 20 equations, 15 figures.

Figures (15)

  • Figure 1: Causal model of galaxy formation and evolution (technically, a causal graph of the causal structure). The nodes are variables or processes (size indicates the number of connections), and the edges communicate the causes. The confounder (light purple node) is halo mass, the treatment (green node) is environment, and the outcome (blue node) is star-formation rate (SFR). The blue and red arrows show example causal and confounding paths, respectively. The bicoloured arrows indicate the influence of both confounder and treatment. The naming convention is as follows: variables associated with the halo or the galaxy are prefixed with "halo" or "galaxy", respectively. Furthermore, halo refers to the dark matter halo that hosts a galaxy, and host halo refers to the parent dark matter halo that hosts other haloes. As such, halo refers to both distinct haloes and subhaloes. Note that only the connections between variables from the construction of the model (Appendix \ref{sec:galaxy_formation_and_evolution}) are shown.
  • Figure 2: Causal directed acyclic graph (DAG) for determining the causal effect of environment on star-formation rate. (a) It is constructed by carefully tracing the causal chains between halo mass ($H$; confounder), environment ($E$; treatment), and star-formation rate ($SFR$; outcome) and unravelling the feedback loops over time, in the causal model of galaxy formation and evolution (Fig. 1). The subscripts indicate time, increasing from left to right (with zero marking the present). The causal chains of variables between the fundamental quantities are condensed for visual clarity, but more importantly, the adjustment set of only halo mass is both sufficient and necessary to estimate the causal effect of environment on SFR, according to d-separation. This DAG is the causal model. (b) Basic DAG for the relationship between environment and SFR whereby the raw correlation is the unbiased causal effect. (c) DAG of the model implicitly assumed in the literature when controlling for the stellar mass ($M_{\star}$) at a snapshot in time to disentangle nature and nurture. These naïve and traditional models are compared to our physics-informed causal model in Section \ref{sec:model_comparison}.
  • Figure 3: Causal dose-response curves (CDRCs) of the causal effect of environment on the star-formation rate (i.e., causal SFR–density relations) at $z=0$ and at different redshifts going back to $z \sim 3$, assuming the causal model (Fig. 2a). Specifically, they represent the average SFR of galaxies if they inhabited, on average, the specific density environment (10th nearest neighbour density) over time. The bottom panel for $z=0$ shows the average causal effects $\tau$ of different density environments (compared to the lowest-density environment). The shaded regions represent the $68\%$ confidence interval, estimated with bootstrapping.
  • Figure 4: Causal dose-response curves (CDRCs) of the causal effect of environment on the star-formation rate (i.e., causal SFR–density relations) at $z=0$ of the (a) naïve (orange) and (b) traditional (red) models, compared to our physics-informed causal model (blue). The naïve model (Fig. 2b) assumes no confounding and that the raw correlation between environment and SFR is the unbiased causal effect. The traditional model (Fig. 2c) is the model implicitly assumed in the literature when controlling for the stellar mass at a snapshot in time to disentangle nature and nurture. (c) Causal model (stellar mass; green) is the causal model (Fig. 2a) but with stellar mass as the time-varying confounder instead of halo mass. The inset of the centre panel compares the naïve and traditional models. The CDRCs represent the average SFR of galaxies at $z=0$ if they inhabited, on average, the specific density environment (10th nearest neighbour density) over time. Comparing the CDRC at a given density ('treatment') to a baseline, chosen to be the lowest-density environment ('no treatment'), therefore reveals the causal effect of that environment on SFR. The bottom panels show the difference in the average SFRs between the models. The shaded regions represent the $68\%$ confidence interval, estimated with bootstrapping.
  • Extended Data Fig. 1: Distributions of fundamental halo and galaxy properties, such as host halo mass, halo mass, stellar mass, and star-formation rate (SFR), as well as the average environmental density (10th nearest neighbour density), of the galaxy sample at $z=0$. The upper triangle shows the sample split into central and satellite galaxies.
  • ...and 10 more figures
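
The captions of Figures 3 and 4 quote $68\%$ confidence intervals estimated with bootstrapping. A minimal sketch of a percentile bootstrap for a mean is given below. The choice of the percentile method, the number of resamples, the function name `bootstrap_ci`, and the toy SFR values are assumptions. The captions do not say which bootstrap variant or which statistic the authors resample.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_boot=1000, level=0.68, seed=0):
    """Percentile-bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boot_stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lo_idx = round(alpha * (n_boot - 1))
    hi_idx = round((1.0 - alpha) * (n_boot - 1))
    return boot_stats[lo_idx], boot_stats[hi_idx]


# Hypothetical toy sample of log10(SFR) values in one density bin.
data_rng = random.Random(1)
toy_sfr = [data_rng.gauss(-0.5, 1.0) for _ in range(200)]

lo, hi = bootstrap_ci(toy_sfr)
print(f"mean = {statistics.fmean(toy_sfr):.3f}, 68% CI = [{lo:.3f}, {hi:.3f}]")
```

For a full dose-response curve, the same resampling would wrap the entire estimation pipeline, so that each resample yields one curve and the interval is taken pointwise across curves.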