Learning the Cosmic Web: Graph-based Classification of Simulated Galaxies by their Dark Matter Environments
Dakshesh Kololgi, Krishna Naidoo, Amelie Saintonge, Ofer Lahav
TL;DR
The study tackles the challenge of robustly classifying galaxies by their dark matter cosmic web environments. It introduces a three-stage framework that combines Hessian-based T-Web labeling of the density field, a Delaunay graph representation of galaxy positions with ten node features, and a Graph Attention Network (GAT+) to predict the four environments (void, wall, filament, cluster). On IllustrisTNG-300 galaxies with $M_* > 10^9\,M_{\odot}$, the GAT+ model achieves $85\%$ test accuracy, outperforming MLP and GCN baselines, with mutual information highlighting the clustering coefficient as particularly informative. The learned embeddings reveal clearer environment separation than the raw graph metrics, and the results underscore the potential of graph-based approaches to bridge simulations and large observational surveys like DESI through domain adaptation.
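As a rough illustration of the Hessian-based labeling step, the sketch below counts, at each grid cell, how many eigenvalues of the tidal tensor (the Hessian of the gravitational potential sourced by the density contrast) exceed a threshold. It follows the standard T-Web convention of 0, 1, 2, or 3 eigenvalues above threshold mapping to void, wall, filament, or cluster. The function names, the periodic cubic grid, the unit normalisation of the Poisson equation, and the threshold `lambda_th` are assumptions made for this example; the smoothing scale and threshold actually used in the study are not given here.

```python
import numpy as np

# Index = number of eigenvalues above threshold (standard T-Web convention).
ENVIRONMENTS = ("void", "wall", "filament", "cluster")


def tidal_eigenvalues(delta, box_size=1.0):
    """Eigenvalues of the Hessian of phi, with laplacian(phi) = delta, on a periodic cubic grid."""
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kvec = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kvec[0] ** 2 + kvec[1] ** 2 + kvec[2] ** 2
    k2[0, 0, 0] = 1.0  # placeholder to avoid 0/0; the k = 0 mode is removed below

    delta_k = np.fft.fftn(delta)
    delta_k[0, 0, 0] = 0.0  # drop the mean, which sources no tidal field

    tensor = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            # In Fourier space, d_i d_j phi becomes k_i k_j delta_k / k^2.
            t_ij = np.fft.ifftn(kvec[i] * kvec[j] * delta_k / k2).real
            tensor[..., i, j] = t_ij
            tensor[..., j, i] = t_ij
    return np.linalg.eigvalsh(tensor)  # ascending order, shape (n, n, n, 3)


def classify_tweb(delta, lambda_th, box_size=1.0):
    """Integer label per cell: count of tidal eigenvalues above lambda_th (assumed threshold)."""
    eigenvalues = tidal_eigenvalues(delta, box_size)
    return np.count_nonzero(eigenvalues > lambda_th, axis=-1)
```

Given a density contrast grid `delta`, `classify_tweb(delta, lambda_th)` returns an integer array that indexes into `ENVIRONMENTS`; a galaxy would then inherit the label of the cell containing its position.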
Abstract
We present a novel graph-based machine learning classifier for identifying the dark matter cosmic web environments of galaxies. Large galaxy surveys offer comprehensive statistical views of how galaxy properties are shaped by large-scale structure, but exploiting them requires robust classifications of galaxies' cosmic web environments. Using stellar mass-selected IllustrisTNG-300 galaxies, we apply a three-stage, simulation-based framework to link galaxies to the total (mainly dark) underlying matter distribution. First, we assign each simulated galaxy to a void, wall, filament, or cluster environment according to the T-Web classification of the underlying matter distribution at its position. Second, we construct a Delaunay triangulation of the galaxy distribution to summarise the local geometric structure with ten graph metrics for each galaxy. Third, we train a graph attention network (GAT) on each galaxy's graph metrics to predict its cosmic web environment. For galaxies with stellar mass $M_* > 10^9\,M_{\odot}$, our GAT+ model achieves an accuracy of $85\,\%$, outperforming graph-agnostic multilayer perceptrons and graph convolutional networks. Our results demonstrate that graph-based representations of galaxy positions provide a powerful and physically meaningful way to infer dark matter environments. We plan to apply this simulation-based graph modelling to investigate how the properties of observed galaxies from the Dark Energy Spectroscopic Instrument (DESI) survey are influenced by their dark matter environments.
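The second stage can be sketched as follows: build the Delaunay triangulation of the galaxy positions, treat its edges as a graph, and compute per-galaxy metrics from each node's neighbourhood. The three metrics below (degree, mean edge length, and local clustering coefficient) are illustrative choices and are not claimed to match the ten metrics used in the study; the function names are hypothetical, and periodic boundaries of the simulation box are ignored for simplicity.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_adjacency(points):
    """Map each point index to the set of its Delaunay neighbours (no periodic wrapping)."""
    points = np.asarray(points, dtype=float)
    tri = Delaunay(points)
    neighbours = {i: set() for i in range(len(points))}
    for simplex in tri.simplices:
        # Every pair of vertices in a simplex shares a Delaunay edge.
        for a in simplex:
            for b in simplex:
                if a != b:
                    neighbours[int(a)].add(int(b))
    return neighbours


def node_metrics(points, neighbours):
    """Illustrative per-node features: degree, mean edge length, clustering coefficient."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    degree = np.zeros(n, dtype=int)
    mean_edge_length = np.zeros(n)
    clustering = np.zeros(n)
    for i in range(n):
        nbrs = neighbours[i]
        k = len(nbrs)
        degree[i] = k
        if k == 0:
            continue
        idx = sorted(nbrs)
        mean_edge_length[i] = np.linalg.norm(points[idx] - points[i], axis=1).mean()
        if k > 1:
            # Edges among the neighbours of i; each is counted twice in the sum.
            links = sum(len(neighbours[u] & nbrs) for u in nbrs) / 2
            clustering[i] = links / (k * (k - 1) / 2)
    return {
        "degree": degree,
        "mean_edge_length": mean_edge_length,
        "clustering": clustering,
    }
```

Stacking these arrays column-wise gives a node feature matrix, and the adjacency sets give the edge list, which together form the input a graph neural network such as a GAT would consume.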
