A Non-parametric Method for the Inference of Halo Occupation Distributions

Jacob Kennedy, Eric Gawiser, Kartheik G. Iyer, L. Y. Aaron Yung

Abstract

The galaxy-halo connection traces processes by which galaxies form and evolve. The halo occupation distribution (HOD) describes the relationship between galaxies and their host dark matter haloes. Measurements of the galaxy two-point correlation function (2PCF) allow us to extract information about the HODs of observed galaxy samples. Several parametric HOD models have been proposed in the literature, but the choice of parameterization restricts the space of possible HODs. To resolve this issue, we introduce a non-parametric HOD fitting method in which we train an emulator to learn the mappings among the galaxy 2PCF, physical properties used to select galaxy samples, and the HOD, all obtained from simulated past lightcones constructed with the Santa Cruz semi-analytic models. Implementing this emulator within a likelihood analysis framework, we derive constraints on the HOD of a galaxy sample when provided with a measurement of its 2PCF. Using the emulator to accelerate likelihood evaluations, we test the non-parametric HOD approach on a set of 2PCFs for mock galaxy samples drawn from the TNG100-1 simulation and selected above threshold values of stellar mass and star formation rate. Our framework is able to recover TNG100-1 HODs within 0.2 dex. We use the TNG100-1 mocks to tune the reported uncertainties to estimate those expected in the analysis of observations. Comparing to parametric HOD modeling routines applied to the same mock galaxy samples, our approach consistently infers the HOD with comparable or greater precision and accuracy.
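
To illustrate the quantity being inferred, the sketch below measures the mean central and satellite occupations per halo directly from a mock catalogue, for a sample selected above thresholds in stellar mass and SFR. It is a minimal sketch and not the authors' pipeline: the function name `measure_hod`, the array layout, and the toy catalogue are assumptions made for this example.

```python
import numpy as np

def measure_hod(halo_mass, gal_halo_idx, gal_is_central,
                gal_log_mstar, gal_log_sfr,
                log_mstar_min, log_sfr_min, mass_bin_edges):
    """Mean number of selected centrals and satellites per halo, in halo-mass bins.

    All argument names are hypothetical; gal_halo_idx holds, for each galaxy,
    the integer index of its host halo in halo_mass.
    """
    # Keep galaxies at or above both lower-bound thresholds.
    selected = (gal_log_mstar >= log_mstar_min) & (gal_log_sfr >= log_sfr_min)

    # Count selected centrals and satellites hosted by each halo.
    n_haloes = halo_mass.size
    n_cen = np.bincount(gal_halo_idx[selected & gal_is_central], minlength=n_haloes)
    n_sat = np.bincount(gal_halo_idx[selected & ~gal_is_central], minlength=n_haloes)

    # Sum the per-halo counts in each mass bin and divide by the number of haloes.
    haloes_per_bin, _ = np.histogram(halo_mass, bins=mass_bin_edges)
    cen_per_bin, _ = np.histogram(halo_mass, bins=mass_bin_edges, weights=n_cen)
    sat_per_bin, _ = np.histogram(halo_mass, bins=mass_bin_edges, weights=n_sat)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_cen = cen_per_bin / haloes_per_bin  # NaN where a bin has no haloes
        mean_sat = sat_per_bin / haloes_per_bin
    return mean_cen, mean_sat

# Toy catalogue (made-up values): four haloes hosting seven galaxies.
halo_mass = np.array([1e11, 2e11, 1e13, 2e13])
gal_halo_idx = np.array([0, 1, 2, 2, 3, 3, 3])
gal_is_central = np.array([True, True, True, False, True, False, False])
gal_log_mstar = np.array([9.0, 8.0, 10.5, 9.0, 10.8, 9.2, 8.0])
gal_log_sfr = np.full(7, -1.0)

mean_cen, mean_sat = measure_hod(
    halo_mass, gal_halo_idx, gal_is_central, gal_log_mstar, gal_log_sfr,
    log_mstar_min=8.45, log_sfr_min=-2.45,
    mass_bin_edges=np.array([1e10, 1e12, 1e14]),
)
print(mean_cen, mean_sat)
```

In this toy case the low-mass bin has a mean central occupation of 0.5 because one of its two haloes hosts a central below the stellar mass threshold, which is the kind of threshold dependence the emulator is trained to capture.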

Paper Structure (22 sections, 36 equations, 18 figures, 2 tables)

Figures (18)

  • Figure 1: Upper left panel: Halo mass function for haloes found in the lightcones at $0.7407<z<0.7793$. Lower left panel: The Tinker et al. (2010) linear halo bias at $z \sim 0.76$. Upper right panel: The central, satellite, and total HODs for a galaxy sample selected with ($M_{\star}$, SFR) $\geq (10^{8.45} \, M_{\odot}, \, 10^{-2.45} \, M_{\odot} \, \text{yr}^{-1})$ from the five two-deg$^2$ lightcones. Lower right panel: The 2PCF for the same galaxy sample. The one-halo term (dashed) is computed numerically using the first term in Equation \ref{eq:xi_efficient}, while the two-halo term (dashed-dotted) is estimated using the approximation in the second term of Equation \ref{eq:xi_efficient}. The two-halo term is a rescaling of the linear dark matter two-point correlation function (dotted). The linear dark matter correlation function is truncated on angular scales $\lesssim 10^{-3}$ deg after imposing the scale-dependent additive correction detailed in Section \ref{sec:caveat_discussion}. We emphasize that at small scales the dark matter correlation function represents a highly subdominant contribution (i.e., $\lesssim$ a few percent) to the total galaxy 2PCF.
  • Figure 2: SFR-$M_{\star}$ number density distributions of central galaxies (red), satellite galaxies (blue), and their sum (black) in the SC SAM (upper row) and TNG100-1 snapshot (bottom row). Contours containing 68% (inner) and 95% (outer) of the respective distributions are overplotted in each panel. The SC SAM does not resolve galaxies below $\log_{10}(M_{\star}/M_{\odot})=7$, whilst TNG100-1 has an effective minimum resolved SFR of $\log_{10}(\mathrm{SFR}/M_{\odot}\,\mathrm{yr}^{-1})\approx -3.5$.
  • Figure 3: Upper left panel: Training set (grey, 700 samples) and validation set (blue, 300 samples) tuples of physical galaxy property lower bound thresholds. The imposed limit on the minimum galaxy sample size manifests as a restriction on the joint maximum lower bound ($\log_{10}\left(M_{\star}\right), \log_{10}\left(\mathrm{SFR}\right)$) thresholds seen in the upper right hand corner of this panel. Upper right panel: The normalized galaxy pair counts within haloes as a function of separation distance for 40 randomly selected physical galaxy property threshold tuples in the training set. Lower left panel: The number density of central galaxies as a function of halo mass for the same 40 randomly selected physical galaxy property threshold tuples. Lower right panel: The number density of satellite galaxies as a function of halo mass for the same 40 randomly selected physical galaxy property threshold tuples.
  • Figure 4: Median fractional residuals (solid curves) and 68% confidence intervals (shaded regions) for $\log_{10}(dd^{1\text{h}})$ (upper left), $w(\theta)$ (bottom left), $\log_{10}(n_{\text{cen}})$ (upper right), and $\log_{10}(n_{\text{sat}})$ (bottom right) computed using the GP-based emulator over the 300 validation set samples. The fractional error reaches a maximum of $\sim 5\%$ across all angular separation and mass bins for the three one-dimensional histograms, indicating the GP has reliably learned the mapping, even with the limited size of the training set. The fractional error in $w(\theta)$ increases from $<3\%$ on intermediate-to-large angular scales to $\approx 5$-$10\%$ on angular scales $\lesssim 10^{-3}$ deg. Note that we have plotted the fractional residuals in $w(\theta)$ and not $\log_{10}(w(\theta))$ as they are more interpretable.
  • Figure 5: Corner plot from a single MCMC run for a SC SAM mock observation with thresholds $(\log_{10}(M_{\star}/M_{\odot}), \log_{10}(\mathrm{SFR}/M_{\odot} \, \mathrm{yr}^{-1})) \geq (8.45, -2.45)$ indicated with red solid lines. The black lines in the one-dimensional posteriors indicate the 16th (left dashed), 50th (solid), and 84th (right dashed) percentiles, with recovered values $\log_{10}(M_{\star} / M_{\odot}) = 8.45^{+0.04}_{-0.04}$ and $\log_{10}(\mathrm{SFR} / M_{\odot} \, \mathrm{yr}^{-1}) = -2.44^{+0.10}_{-0.12}$. The inner and outer contours in the two-dimensional posterior contain 39.3% and 86% of the samples, respectively (i.e., one- and two-$\sigma$ contours for a two-dimensional Gaussian distribution). In this example, we are able to recover the "true" thresholds well within $1 \sigma$ in both $\log_{10}(M_{\star})$ and $\log_{10}(\mathrm{SFR})$ and with high precision.
  • ...and 13 more figures
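
The Figure 5 caption quotes two kinds of posterior summary: the 16th, 50th, and 84th percentiles of each one-dimensional posterior, and two-dimensional contours enclosing 39.3% and 86% of the samples. The sketch below shows where those numbers come from. For a two-dimensional Gaussian, the probability enclosed within $n\sigma$ is $1 - e^{-n^2/2}$, which differs from the 68% and 95% familiar from the one-dimensional case. The function names are hypothetical and the code is an illustration, not taken from the paper's analysis.

```python
import math
import numpy as np

def enclosed_fraction_2d(n_sigma):
    """Probability mass of a 2D Gaussian inside its n-sigma contour."""
    return 1.0 - math.exp(-0.5 * n_sigma**2)

def summarize_posterior(samples):
    """Median and (upper, lower) 1-sigma-equivalent errors from 1D samples."""
    p16, p50, p84 = np.percentile(samples, [16, 50, 84])
    return float(p50), float(p84 - p50), float(p50 - p16)

print(round(enclosed_fraction_2d(1.0), 4))  # 0.3935
print(round(enclosed_fraction_2d(2.0), 4))  # 0.8647

# Example with synthetic samples (assumed values, not results from the paper).
rng = np.random.default_rng(0)
samples = rng.normal(loc=8.45, scale=0.04, size=20000)
median, err_up, err_down = summarize_posterior(samples)
print(f"{median:.2f} +{err_up:.2f} -{err_down:.2f}")
```

The two-$\sigma$ value of about 86.5% corresponds to the 86% quoted in the caption.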