Identification of Candidate Halos Hosting Massive Black Hole Seeds in the Renaissance Simulations with Support Vector Machines

Brandon Pries, John H. Wise

TL;DR

This study addresses the uncertain origins of supermassive black holes by using the Renaissance simulations to identify halos likely to host direct collapse black holes (DCBHs) with support vector machines (SVMs). The authors assemble a physically motivated feature set (including halo mass, metallicity, Lyman-Werner flux, and central gas inflow) and apply a two-stage optimization, hyperparameter tuning followed by feature selection, to derive probabilistic DCBH seeding prescriptions for cosmological simulations. Performance is limited by class imbalance and labeling ambiguity; the strongest results come from star-related features and 2D feature subspaces, with $F_{1}$ scores of up to ~0.37 on selected subsets. The resulting SVM-based prescriptions offer a practical way to incorporate DCBH seeding into large-scale, lower-resolution simulations, with implications for understanding SMBH formation and the diversity of seeding pathways.

Abstract

The nature of the origins of supermassive black holes remains uncertain. Multiple possible seeding pathways have been proposed across a variety of mass scales, each with their own strengths and weaknesses. One such channel is a direct collapse black hole (DCBH), thought to form from the deaths of supermassive stars in pristine atomic cooling halos in the early universe. In this work, we investigate the ability to identify halos likely to form a DCBH based on their properties using a support vector machine (SVM). We implement multiple methods to improve the accuracy of the model, including selecting subsets of critical features and optimizing SVM hyperparameters. We find that our best model requires quantities relevant to star formation, such as the metallicity, incident flux of Lyman-Werner radiation, and halo stellar mass. The SVMs produced from this work can serve as probabilistic and holistic seeding prescriptions for DCBHs in cosmological simulations.
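Because DCBH candidates are a small minority of halos, plain accuracy can look good for a model that never flags a candidate, which is presumably why the results are reported as $F_{1}$ scores. The following is a minimal sketch of that metric, not the authors' code; the labels and counts are made up for illustration, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced sample (assumed numbers): 5 candidates among 100 halos.
y_true = [1] * 5 + [0] * 95

# A trivial model that labels every halo a non-candidate.
y_all_negative = [0] * 100

# A model that recovers 3 of 5 candidates at the cost of 7 false positives.
y_model = [1, 1, 1, 0, 0] + [1] * 7 + [0] * 88

for name, pred in [("all-negative", y_all_negative), ("model", y_model)]:
    print(name, accuracy(y_true, pred), f1_score(y_true, pred))
```

In this toy case the trivial model has the higher accuracy but an $F_{1}$ of zero, while the model that actually finds candidates scores lower on accuracy and higher on $F_{1}$.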

Paper Structure

This paper contains 19 sections, 10 equations, 8 figures.

Figures (8)

  • Figure 1: $F_{1}$ scores for each combination of the regularization parameter $C$ (x-axes) and the class weight $w$ (y-axes) for each kernel tested (panels). The top row shows the linear, RBF, and sigmoid kernels, respectively, and the bottom row shows polynomial kernels of different polynomial orders (2, 3, 4, and 5, respectively). Most kernels show a preference for large values of $C$ and intermediate to high values of $w$.
  • Figure 2: Permutation importance rankings for each feature. The black dashed line represents no decrease in accuracy, and all features at or below the red line show an increase in performance accuracy when permuted. Increases in performance accuracy are likely due to correlated variables (Mone2025) or to effects from hyperparameter choice and the distributions of candidates vs. non-candidates in phase space (see Section \ref{sec:model_performance}).
  • Figure 3: Distribution of Mahalanobis distances in the full feature space relative to the mean of the candidate set for both non-candidates (filled blue) and candidates (black). Halos marked with solid lines and markers were non-candidates misclassified by at least one model and were chosen for further inspection, described in Section \ref{sec:noncandidate_properties}. The markers are vertically offset from each other to prevent overlap.
  • Figure 4: Decision boundary for the best dm_main model, with a 3rd-order polynomial kernel, $C = 10^{4}$, and $w = 10^{2.5}$. The yellow circles and black points correspond to candidates and non-candidates, respectively, while the orange and green markers identify the same halos described in Figure \ref{fig:Mahalanobis_distribution}. The yellow and purple regions show the regions of phase space that the SVM assigns to candidates and non-candidates, respectively, while the solid blue line shows the decision boundary and the dashed blue lines show the margins on either side of the boundary. This model uses the largest possible value for $C$, corresponding to the smallest margins and few support vectors. The SVM correctly finds the region of phase space occupied by all but one of the candidates.
  • Figure 5: Decision boundaries for the six subspaces of feature pairs. Points and lines are the same as described in Figure \ref{fig:decision_boundaries_dm_main}.
  • ...and 3 more figures
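
The Mahalanobis distance used in Figure 3 measures how far a halo lies from the candidate mean in units of the candidate set's spread along each (possibly correlated) feature direction. Below is a minimal standard-library sketch under stated assumptions: the covariance is estimated from the candidate set itself (the caption does not say which covariance is used), the two-feature points are invented for illustration, and all function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(point, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), without forming the inverse."""
    diff = [p - m for p, m in zip(point, mu)]
    z = solve(cov, diff)  # z = cov^{-1} (x - mu)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# Invented two-feature "candidate" points, for illustration only.
candidates = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu = mean_vector(candidates)
cov = covariance_matrix(candidates)

print(mahalanobis([1.0, 1.0], mu, cov))  # a halo at the candidate mean
print(mahalanobis([3.0, 1.0], mu, cov))  # a halo offset along one feature
```

Solving the linear system rather than inverting the covariance matrix keeps the sketch short and fails loudly if the features are degenerate; in practice the features would likely be standardized or log-scaled first, which is also an assumption here.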