Geometric deep learning for galaxy-halo connection: a case study for galaxy intrinsic alignments

Yesukhei Jagvaral; Francois Lanusse; Rachel Mandelbaum

TL;DR

This work addresses intrinsic alignments as a major systematic in weak lensing by building a geometry-aware emulator for the galaxy-halo connection. It combines a conditional score-based diffusion model on mixed Euclidean and SO(3) data with an $E(3)$-equivariant graph neural network to jointly model galaxy scalar properties and 3D orientations within halos, conditioned on the tidal field. Trained and validated against IllustrisTNG-100, the generated samples reproduce the joint scalar distributions and IA statistics (ellipticity–direction correlations and projected $w_{g+}$) with good statistical fidelity across scales and subpopulations. This diffusion-geometric framework offers a scalable, physically informed approach to producing realistic mock catalogs for next-generation surveys like Rubin/LSST, aiding IA mitigation and pipeline validation.

Abstract

Forthcoming cosmological imaging surveys, such as the Rubin Observatory LSST, require large-scale simulations encompassing realistic galaxy populations for a variety of scientific applications. Of particular concern is the phenomenon of intrinsic alignments (IA), whereby galaxies orient themselves towards overdensities, potentially introducing significant systematic biases in weak gravitational lensing analyses if they are not properly modeled. Due to computational constraints, simulating the intricate details of galaxy formation and evolution relevant to IA across vast volumes is impractical. As an alternative, we propose a Deep Generative Model trained on the IllustrisTNG-100 simulation to sample 3D galaxy shapes and orientations that accurately reproduce intrinsic alignments along with correlated scalar features. We model the cosmic web as a set of graphs, each graph representing a halo with nodes representing the subhalos/galaxies. The architecture consists of an SO(3) $\times$ $\mathbb{R}^n$ diffusion generative model for galaxy orientations and $n$ scalars, implemented with E(3)-equivariant Graph Neural Networks that explicitly respect the Euclidean symmetries of our Universe. The model is able to learn and predict features such as galaxy orientations that are statistically consistent with the reference simulation. Notably, our model demonstrates the ability to jointly model Euclidean-valued scalars (galaxy sizes, shapes, and colors) along with non-Euclidean-valued SO(3) quantities (galaxy orientations) that are governed by highly complex galactic physics at non-linear scales.
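To make the "one graph per halo" idea and the role of Euclidean symmetry concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the radius-based linking rule, the function names `build_halo_graph` and `random_rotation`, and the toy positions are all assumptions chosen for illustration. It shows that edge lengths between galaxies are unchanged when the whole halo is rotated and translated, which is the kind of invariance an E(3)-equivariant network builds on.

```python
import numpy as np

def build_halo_graph(positions, radius):
    """Link every pair of galaxies in one halo that lie closer than `radius`.

    Hypothetical linking rule for illustration; the paper's own graph
    construction may differ.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude self-loops (distance exactly zero on the diagonal).
    senders, receivers = np.nonzero((dist < radius) & (dist > 0.0))
    return senders, receivers, dist[senders, receivers]

def random_rotation(rng):
    """Draw a proper rotation matrix (det = +1) from a Gaussian matrix via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:      # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(6, 3))   # toy galaxy positions in one halo
senders, receivers, lengths = build_halo_graph(positions, radius=0.8)

# Apply an arbitrary rigid motion (rotation + translation) to the halo.
rot = random_rotation(rng)
shift = np.array([10.0, -3.0, 2.5])
moved = positions @ rot.T + shift
senders_m, receivers_m, lengths_m = build_halo_graph(moved, radius=0.8)

# Connectivity and edge lengths are invariant under the rigid motion.
print(np.array_equal(senders, senders_m), np.allclose(lengths, lengths_m))
```

Scalar outputs (sizes, colors) should behave like the edge lengths here and stay fixed under such a transformation, whereas orientations should rotate along with the input; the sketch only demonstrates the invariant part.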

Paper Structure (23 sections, 40 equations, 5 figures, 2 tables)

Introduction
Gravitational weak lensing and Intrinsic Alignments
Problem Statement
Methodology
Score-based Diffusion for SO(3) and Euclidean data
The (Stein) score function
E(3) GNN: 3D Euclidean group equivariant graph neural networks
Graph construction
Astrophysical data products and observables
The Cosmological Simulation
Tidal field
Shapes and Orientation of Halos and Galaxies
Two-point estimators
Density-Orientation Correlation Functions in 3D
Density-Shape Correlation Functions in 2D
...and 8 more sections

Figures (5)

Figure 1: Corner plot of correlations and 1D histograms between scalar quantities in the true TNG training sample versus the generated testing sample. The two joint distributions visually match well, and quantitative metrics are provided in Table \ref{ks-table}.
Figure 2: Ellipticity-Direction (ED) correlation function of galaxies (top row) and DM subhalos (bottom) for the whole sample. The ratio of the Generated ED function to the TNG ED function is shown in Figure \ref{F:ED_ratio_reduced} along with the same ratio for subsamples. The errorbars are shown for only 1 curve for visual clarity. We see an agreement between the true and generated values across all panels and on all scales.
Figure 3: Ratio of the Ellipticity-Direction (ED) correlation function between the entire generated and TNG samples (black) and representative subsamples (colors indicated in legend). The top (bottom) panel shows results for the longest (shortest) axis; we do not show results for the intermediate axis due to its very low signal. The errorbars on the subsample curves were horizontally shifted by 3% for visual clarity. These ratio curves exhibit consistency with 1 given the statistical error of the measurements.
Figure 4: Projected two-point correlation functions $w_{g+}$ of all galaxy positions and projected 2D ellipticities of (sub)samples. For visual clarity we only show the errorbars on a single measurement. The left panel contains measurements for all galaxies and for central and satellite subsamples. The center (right) panel contains measurements for samples split by mass (morphology). The measurements with generated quantities show consistency across all scales with the true TNG measurements.
Figure 5: Projected two-point correlation functions $w_{g+}$ of galaxy positions and the projected 2D ellipticities of train-test split samples. The generated curves follow the TNG measurements closely, showing no signs of overfitting.
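
The Ellipticity-Direction (ED) statistic shown in Figures 2 and 3 can be illustrated with a short brute-force estimator. This sketch assumes the commonly used definition $\omega(r) = \langle |\hat{e}(\mathbf{x}) \cdot \hat{r}(\mathbf{x}+\mathbf{r})|^2 \rangle - 1/3$, where $\hat{e}$ is a unit shape axis and $\hat{r}$ is the unit separation vector to a neighbor; the paper's estimator section is not reproduced on this page, so its exact conventions (binning, weighting, periodic boundaries) may differ. The function name `ed_correlation` and the two-galaxy example are hypothetical.

```python
import numpy as np

def ed_correlation(positions, axes, bin_edges):
    """Brute-force ED correlation: mean squared cosine between each galaxy's
    unit axis and the direction to every other galaxy, minus 1/3, per bin.

    Assumed definition for illustration; O(N^2) and without periodic boundaries.
    """
    sep = positions[None, :, :] - positions[:, None, :]   # sep[i, j] = x_j - x_i
    dist = np.linalg.norm(sep, axis=-1)
    i, j = np.nonzero(dist > 0.0)                         # all ordered pairs i != j
    unit_sep = sep[i, j] / dist[i, j][:, None]
    cos2 = np.sum(axes[i] * unit_sep, axis=-1) ** 2
    which_bin = np.digitize(dist[i, j], bin_edges) - 1
    n_bins = len(bin_edges) - 1
    omega = np.full(n_bins, np.nan)                       # NaN marks empty bins
    for b in range(n_bins):
        in_bin = which_bin == b
        if in_bin.any():
            omega[b] = cos2[in_bin].mean() - 1.0 / 3.0
    return omega

# Two galaxies separated along x, one radial bin covering their separation.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
bin_edges = np.array([0.5, 1.5])

# Axes pointing along the separation: maximal alignment, omega = 1 - 1/3.
omega_aligned = ed_correlation(positions, np.array([[1.0, 0, 0], [1.0, 0, 0]]), bin_edges)
# Axes perpendicular to the separation: omega = 0 - 1/3.
omega_perp = ed_correlation(positions, np.array([[0, 1.0, 0], [0, 1.0, 0]]), bin_edges)
print(omega_aligned, omega_perp)
```

Under this definition, randomly oriented axes give $\omega(r) = 0$ on average, so positive values indicate axes that tend to point toward neighbors and negative values indicate axes that tend to be perpendicular to the separation.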
