BASILISK III. Stress-testing the Conditional Luminosity Function model

Kaustav Mitra, Frank C. van den Bosch

TL;DR

The study probes whether the standard Conditional Luminosity Function (CLF) parametrization adequately captures small-scale galaxy–halo connections, using Basilisk to compare the standard CLF against six flexible variants on mock SDSS-like data and real SDSS DR7 data. Mock tests show unbiased recovery of the underlying galaxy–halo relation across models, with Bayes factors favoring the simplest model. Applying to SDSS DR7 reveals strong data-driven need for extensions, particularly mass-dependence in the satellite faint-end slope $\alpha_s$ and in the satellite cutoff $\Delta_s$, elevating certain models (notably E) as preferred, while excessive flexibility (F) is disfavored. The analysis also shows that halo occupation statistics (HODs) are remarkably robust across CLF variants, though central/BHG impurities can bias interpretations; this highlights the importance of accounting for priors and impurities in empirical galaxy–halo studies and suggests potential gains from non-parametric CLF approaches with future data (e.g., DESI).

Abstract

The Conditional Luminosity Function (CLF) is an effective and flexible way of characterizing the galaxy-halo connection. However, it is subject to a particular choice for its parametrization, which acts as a prior assumption. Most studies have been restricted to what has become a standard CLF parametrization with little to no variation. The goal of this paper is to investigate whether this model is sufficient to fully characterize the small-scale data extracted from spectroscopic surveys and to gauge how adding or removing degrees of freedom impacts the inference regarding the galaxy-halo connection. After extensive validation with realistic mock data, we use Basilisk, a highly constraining Bayesian hierarchical tool to model the kinematics and abundance of satellite galaxies, to test the standard CLF model against a slew of more flexible variants. In particular, we test whether the SDSS data favour any of these variants in terms of a goodness-of-fit improvement, and identify the models that are sufficiently flexible, beyond which additional model freedom is not demanded by the data. We show that some of these additional degrees of freedom, which have hitherto not been considered, result in a drastic improvement of the fit and cause significant changes in the inferred galaxy-halo connection. This highlights that an empirical model comes with an implicit prior about the parametrization form, which needs to be addressed to ensure that it is sufficiently flexible to capture the complexity of the data and to safeguard against a biased inference.

Paper Structure

This paper contains 24 sections, 43 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The panels in the lower-left triangle comprise the complete depiction of the satellite kinematics data, along with their mutual dependencies. The histograms along the diagonal indicate, from top-left to bottom-right, the distributions of the redshift of the primaries ($z_{\rm pri}$), the primary luminosities ($L_{\rm pri}$), the line-of-sight velocity differences of primary-secondary pairs ($\Delta V$), and their projected separations ($R_{\rm p}$). The $(z_{\rm pri},L_{\rm pri})$ co-dependence is shown in the form of a 2D binned histogram where the bins are color-coded according to the mean number of secondaries in each bin. The other co-dependencies are shown in the form of scatter plots, discussed further in the text. The most illuminating is the $(L_{\rm pri}, \Delta V)$ scatter plot, clearly showing that the velocity dispersion is higher for more luminous primaries, indicating that they reside in more massive halos. The panel in the top-right corner shows the complementary information that is not captured in the satellite kinematics data. Here, each line represents the fraction of all primaries in a redshift bin that has zero detected secondaries, as a function of the primary galaxy luminosity. This information is contained in the satellite abundance data vector, ${\bf D}_{\rm Ns}$, which is also modelled in Basilisk's framework.
  • Figure 1: Illustration of the method used to compute the evidence using a linear transformation of the posterior distribution of parameters. Top-left: The posterior distribution of two arbitrarily chosen parameters (specifically $\log M_1$ and $\log L_0$) from the MCMC chain obtained by fitting the mock data as described in Section \ref{sec:mock_validation}. Top-right: The same posterior distribution, but this time shown along two arbitrarily chosen eigen-axes in the linearly transformed space, showing a random projection of an $N$-dimensional hyper-sphere. Bottom: The points with error-bars show the mean and standard deviations of the log likelihoods in radial bins (or hyper-shells) in the linearly transformed space. The black shaded region (almost a line) is the 95% credible interval of a quadratic fit to the points. The Bayesian evidence is calculated by integrating this fitting function of $\ln {\cal L}(r)$ over the hyper-sphere in the transformed space.
  • Figure 2: Posterior distributions of the various parameters for the galaxy-halo connection obtained by fitting the mock survey data. Different columns correspond to different CLF models used, as indicated at the top. Note that each parameter is indicated with respect to its true value that was used to create the mock data (indicated by the vertical, gray-dashed line), and scaled by the $1 \sigma$ width of the posterior of model (F). The latter facilitates a meaningful comparison of the widths of the posterior distributions among different CLF models. Whenever a parameter is kept fixed at its true value, this is indicated by a circle; note how going from model (A) to (F), fewer and fewer model parameters are kept fixed, which causes most posterior distributions to widen. Finally, the parameter, $\beta$, shown at the bottom, represents the mean orbital velocity anisotropy of satellite galaxies. Here the gray line splits into a range that marks the 16-84 percentile range of velocity anisotropies of subhalos in individual host halos in the simulation box that was used to create the mock data. Note how, in this validation test, all parameters show unbiased recovery, irrespective of the different CLF models assumed.
  • Figure 3: The conditional luminosity function of galaxies in the mock survey data, their relative likelihood distributions, and the Bayes factor corresponding to different CLF model assumptions. The first 3 columns show the CLFs for different halo masses, as indicated at the top. Different rows show the results for the six different CLF models used by Basilisk for the inference, as indicated in the left-most panels. The symbols (circles/triangles) show the true input CLF (of centrals/satellites) used to create the mock, while the shaded regions indicate the $1\sigma$ confidence intervals based on Basilisk's inference. The histograms in the rightmost column show the corresponding total log likelihood distributions of the posteriors, relative to the maximum likelihood estimate among all models tested. The values of ${\cal Z} / {\cal Z}_{\rm max}$ shown in the rightmost panels give the Bayes factor for the inference of each CLF model with respect to the model with the highest evidence, which in this case is model (A).
  • Figure 4: Same as Fig. \ref{fig:CLF_mock}, but for an analysis of the SDSS DR7 data. Unlike for the mock data, here the inferred CLFs depend strongly on the CLF model used. In particular, the posteriors of the log likelihood distributions relative to the maximum likelihood estimate among all models, shown in the rightmost panels, show two dramatic jumps going from model (B) to (C), and from model (C) to (D), indicating that a mass-dependence for $\alpha_{\rm s}$ and freedom in $\Delta_{\rm s}$ are strongly favoured by the data. The Bayes factor with respect to the model with the highest evidence, indicated in the right-most panel, suggests that model (E) is the optimal model to characterize the halo occupation statistics of SDSS galaxies.
  • ...and 4 more figures
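
The evidence calculation described in the caption that begins "Illustration of the method" (whiten the chain with a linear transformation, average $\ln {\cal L}$ in hyper-shells, fit a quadratic in radius, integrate over the hyper-sphere) can be illustrated with a minimal sketch. This is not the authors' code. The function name `log_evidence_from_chain`, the Cholesky-based whitening, the unweighted fit, and the binning choices are all assumptions made for illustration. The sketch also assumes a prior that is flat over the region occupied by the posterior, so that it returns $\ln \int {\cal L}\,{\rm d}\theta$ and the log of the prior volume would still have to be subtracted to obtain $\ln {\cal Z}$. It is checked here on a toy Gaussian likelihood, for which the integral is known analytically.

```python
import math
import numpy as np


def log_evidence_from_chain(samples, log_like, n_bins=20):
    """Estimate ln of the integral of L(theta) over theta from posterior samples.

    Hypothetical helper: assumes ln L is well described by a quadratic in the
    whitened radius r, and that the prior is flat where the posterior lives.
    """
    samples = np.asarray(samples, dtype=float)
    log_like = np.asarray(log_like, dtype=float)
    n_dim = samples.shape[1]

    # Linear transformation y = A^{-1} (theta - mean), with sample cov = A A^T,
    # so that the transformed chain has unit covariance.
    mean = samples.mean(axis=0)
    chol = np.linalg.cholesky(np.cov(samples, rowvar=False))
    y = np.linalg.solve(chol, (samples - mean).T).T
    r = np.linalg.norm(y, axis=1)

    # Mean radius and mean ln L in radial bins (hyper-shells); skip empty bins.
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    keep = counts > 0
    r_mean = np.bincount(idx, weights=r, minlength=n_bins)[keep] / counts[keep]
    lnl_mean = np.bincount(idx, weights=log_like, minlength=n_bins)[keep] / counts[keep]

    # Unweighted quadratic fit: ln L(r) ~ c2 r^2 + c1 r + c0.
    c2, c1, c0 = np.polyfit(r_mean, lnl_mean, 2)
    if c2 >= 0.0:
        raise ValueError("fitted ln L(r) does not decay; radial integral diverges")

    # Radial integral of r^(N-1) exp[ln L(r)] with a hand-written trapezoid rule,
    # factoring out the peak of the integrand for numerical stability.
    grid = np.linspace(1e-6, 5.0 * r.max(), 20001)
    ln_f = (n_dim - 1) * np.log(grid) + c2 * grid**2 + c1 * grid + c0
    peak = ln_f.max()
    f = np.exp(ln_f - peak)
    radial = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid))

    # Surface area of the unit (N-1)-sphere and Jacobian |det A| of the transformation.
    ln_surface = math.log(2.0) + 0.5 * n_dim * math.log(math.pi) - math.lgamma(0.5 * n_dim)
    ln_jacobian = float(np.sum(np.log(np.diag(chol))))
    return ln_surface + ln_jacobian + peak + math.log(radial)


# Toy check: Gaussian likelihood with peak value 1, sampled directly.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 2.0]])
mu = np.array([1.0, -2.0, 0.5])
theta = rng.multivariate_normal(mu, cov, size=20000)
d = theta - mu
lnl = -0.5 * np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)

estimate = log_evidence_from_chain(theta, lnl)
exact = 0.5 * 3 * math.log(2.0 * math.pi) + 0.5 * math.log(np.linalg.det(cov))
print(f"estimated ln integral = {estimate:.3f}, analytic = {exact:.3f}")
```

Under these assumptions, the log of the Bayes factor between two models is the difference of their $\ln {\cal Z}$ values, which is how ratios such as ${\cal Z}/{\cal Z}_{\rm max}$ in the figure captions would follow from per-model estimates. For a real chain the quadratic form is only an approximation, so the quality of the fit in the outer shells should be inspected before trusting the result.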