Finding the boundary: Using galaxy membership to inform galaxy cluster extent through machine learning

Christine Hao, Stephanie O'Neil, Mark Vogelsberger, Vinh Tran, Lamiya Mowla, Joshua S. Speagle

TL;DR

This study uses TNG300-1, the largest-volume run of the IllustrisTNG hydrodynamic simulation suite, to quantify where galaxies transition from cluster members to field galaxies as a function of distance to the nearest cluster. A supervised deep neural network is trained on intrinsic galaxy properties (six primary features, extended to fifteen) to classify galaxies as cluster-like or field-like. The classifier yields a probabilistic transition region, characterised by a zero-point radius $r_0$ and a stacked probability profile $P(r)_{\mathrm{stack}}$, which is analysed across cluster mass bins. The transition is broad and intrinsically scattered, extending to $\sim 1{-}1.2\,R_{200,\mathrm{mean}}$, and $r_0$ increases with cluster mass as $r_0 \propto M_{200,\mathrm{mean}}^{0.10}$. Models trained on dynamical properties place the transition deeper in the cluster core, while the transitions traced by gas and stellar properties vary with cluster mass. These findings imply that conventional hard boundaries such as $R_{200}$ or the splashback radius do not fully capture environmental preprocessing, and they highlight the value of a probabilistic, property-based boundary for studying cluster environments in both simulations and observations. The work also shows that categorising galaxy properties by their underlying physics reveals distinct transition behaviours, offering a data-driven perspective on how ram-pressure stripping and related processes shape galaxy evolution in cluster outskirts.
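
As a quick reading of the scaling quoted above (a worked illustration, not a result taken from the paper): if $r_0 \propto M_{200,\mathrm{mean}}^{0.10}$ holds across the sampled mass range, the ratio of zero-point radii for two clusters depends only on their mass ratio. The masses $M_1$ and $M_2$ below are placeholder symbols introduced for this example.

$$
\frac{r_0(M_2)}{r_0(M_1)} = \left(\frac{M_2}{M_1}\right)^{0.10},
\qquad
\frac{M_2}{M_1} = 10 \;\Rightarrow\; \frac{r_0(M_2)}{r_0(M_1)} = 10^{0.10} \approx 1.26 .
$$

Under that assumption, a tenfold increase in cluster mass moves $r_0$ outward by only about 26 per cent, in whatever units $r_0$ is expressed, which is why the mass dependence reads as weak.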

Abstract

The spatial extent of the environment's impact on galaxies marks a transitional region between cluster and field galaxies. We present a data-driven method to identify this region in galaxy clusters with masses $M_{200\rm ,mean}>10^{13} M_{\odot}$ at $z = 0$. Using resolved galaxy samples from the largest simulation volume of IllustrisTNG (TNG300-1), we examine how galaxy properties vary as a function of distance to the closest cluster. We train neural networks to classify galaxies into cluster and field galaxies based on their intrinsic properties. Using this classifier, we present the first quantitative and probabilistic map of the transition region. It is represented as a broad and intrinsically scattered region near cluster outskirts, rather than a sharp physical boundary. This is the physical detection of a mixed population. In order to determine transition regions of different physical processes by training property-specific models, we categorise galaxy properties based on their underlying physics, i.e. gas, stellar, and dynamical. Changes to the dynamical properties dominate the innermost regions of the clusters of all masses. Stellar properties and gas properties, on the other hand, exhibit transitions at similar locations for low mass clusters, yet gas properties have transitions in the outermost regions for high mass clusters. These results have implications for cluster environmental studies in both simulations and observations, particularly in refining the definition of cluster boundaries while considering environmental preprocessing and how galaxies evolve under the effect of the cluster environment.
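
The classification setup can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the distance-based labels written as $0.5/5.0$ in the figure captions mean "cluster" inside $0.5\,R_{200,\rm mean}$, "field" beyond $5.0\,R_{200,\rm mean}$, and "unlabelled" in between. The inclusive boundaries, the function names `distance_labels` and `sigmoid`, and the sample distances are all hypothetical.

```python
import numpy as np

def distance_labels(r_norm, lower=0.5, upper=5.0):
    """Label galaxies from their normalised distance r / R_200,mean.

    Returns 1 (cluster) for r_norm <= lower, 0 (field) for r_norm >= upper,
    and -1 (unlabelled, left out of training) in between.
    The inclusive boundaries are an assumption of this sketch.
    """
    r_norm = np.asarray(r_norm, dtype=float)
    labels = np.full(r_norm.shape, -1, dtype=int)
    labels[r_norm <= lower] = 1
    labels[r_norm >= upper] = 0
    return labels

def sigmoid(logits):
    """Map raw classifier logits to probabilities in (0, 1)."""
    logits = np.asarray(logits, dtype=float)
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical normalised distances for six galaxies.
r_norm = np.array([0.2, 0.5, 1.1, 3.0, 5.0, 7.5])
labels = distance_labels(r_norm)

# Only labelled galaxies would enter training; the rest are scored afterwards.
train_mask = labels >= 0
```

A logit of zero maps to a probability of 0.5 under the sigmoid, so the sign of the raw logit already carries the cluster-like versus field-like decision.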

Paper Structure

This paper contains 24 sections, 19 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Spatial distributions of all fifteen intrinsic galaxy properties. The colour scale indicates number density. Each panel shows one property, labelled on the y-axis with its units where applicable. The x-axis denotes the distance to the closest cluster, normalised by the $R_{200,\rm mean}$ of the host. The red line denotes the median, with shading spanning the 16th-84th percentiles. The plot shows that galaxy properties correlate with galaxy-to-closest-cluster distance.
  • Figure 2: Bird’s-eye view of the network architecture described in Section \ref{sec:construction}. BN refers to BatchNorm and p is the dropout percentage. All activations are ReLU except the output sigmoid, which is applied only at inference for metric computation.
  • Figure 3: Per-mass-bin median probability profile of the fiducial model using distance-based labels ($0.5/5.0$). The shaded region represents the 16th-84th percentile scatter around the median. The y-axis denotes the raw logits predicted by the model for each galaxy, and the x-axis denotes distance normalised by $R_{200,\rm mean}$. Logits are first grouped by host cluster to build a per-cluster profile; these profiles are then stacked within mass bins to give the final profile. The profiles demonstrate the probabilistic relationship between the normalised galaxy-to-cluster distance and galaxy membership (cluster or field).
  • Figure 4: Convergence test for the fiducial model architecture, showing the predicted transition region as a function of the upper label ratio (x-axis). Each point represents the zero point and its 16th-84th percentile error in the mass bin $10^{14.0} \le M_{200,\mathrm{mean}}/M_\odot < 10^{14.5}$ predicted by each model. The upper panels show models trained on the six primary properties, while the lower panels show models trained on all properties. The first column consists of models trained using $R_{sp}$ labels, the second $R_{200,\rm mean}$, and the third $R_{200,\rm crit}$. Even when the set of unlabelled samples between the lower and upper bounds is varied, the model's prediction stays within a reasonable range and does not increase with the gap. This is consistent across the different radius definitions. However, the intrinsic scatter is best represented by the $R_{sp}$-labelled models, as $R_{sp}$ is the dynamically motivated definition; the $R_{200,\rm mean}$ and $R_{200,\rm crit}$ panels exhibit more fluctuation.
  • Figure 5: The leftmost and middle panels both show the dependence of the transition region on host mass, derived using different feature combinations. The leftmost panel is generated by models using SubLink-based labels, while the middle panel represents models using the fiducial distance-based label $0.5/5.0$. The x-axis denotes the mass of the cluster, and the y-axis the normalised distance. The median zero point of each mass bin, with its 16th-84th percentile scatter, is plotted at the median mass of that bin. Comparison is made with $R_{sp}$ reference curves from the literature, computed using cluster $M_{200\rm m}$. The mass dependence of $r_0$ is opposite to that of both estimated splashback definitions. The error bars indicate intrinsic scatter. The rightmost panel shows the same mass dependence of $r_0$ from all distance-based models. The Roman numerals referenced across the leftmost and middle panels are as follows: I. Six properties, II. All properties, III. Dynamical properties, IV. Gas properties (without sSFR), V. Gas properties (with sSFR), VI. Stellar properties (without sSFR), VII. Stellar properties (with sSFR).
  • ...and 4 more figures
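
The stacking procedure described in the Figure 3 caption (per-cluster logit profiles, then stacking within a mass bin) can be sketched as follows. This is an illustrative sketch only. It assumes positive logits mean cluster-like, that the shaded band is the 16th-84th percentile spread across per-cluster profiles, and that the zero point $r_0$ is where the stacked median logit crosses zero (probability 0.5). The captions suggest these readings but do not state them outright. The helper names and the three toy clusters are hypothetical.

```python
import numpy as np

def cluster_profile(r_norm, logits, edges):
    """Median logit per radial bin for one cluster (NaN where a bin is empty)."""
    idx = np.digitize(r_norm, edges) - 1
    prof = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        sel = idx == b
        if sel.any():
            prof[b] = np.median(logits[sel])
    return prof

def stack_profiles(profiles):
    """Stack per-cluster profiles: 16th, 50th and 84th percentiles per bin."""
    stacked = np.vstack(profiles)
    lo, med, hi = np.nanpercentile(stacked, [16, 50, 84], axis=0)
    return lo, med, hi

def zero_point(centres, median_logit):
    """First radius where the median logit drops from positive to <= 0,
    found by linear interpolation between adjacent bin centres."""
    for i in range(len(centres) - 1):
        y0, y1 = median_logit[i], median_logit[i + 1]
        if np.isfinite(y0) and np.isfinite(y1) and y0 > 0 >= y1:
            return centres[i] + y0 * (centres[i + 1] - centres[i]) / (y0 - y1)
    return np.nan

# Toy data: three clusters in one mass bin, galaxies placed at the bin centres,
# with logits falling linearly through zero at 1.0, 1.1 and 1.2 (normalised).
edges = np.linspace(0.0, 3.0, 7)
centres = 0.5 * (edges[:-1] + edges[1:])
profiles = [
    cluster_profile(centres, 2.0 * (r_cross - centres), edges)
    for r_cross in (1.0, 1.1, 1.2)
]

lo, med, hi = stack_profiles(profiles)
r0 = zero_point(centres, med)
```

Taking the per-cluster median first keeps richly populated clusters from dominating the stack, which is one plausible reason to profile each cluster before combining them into a mass bin.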