Predicting stellar collision outcomes of main sequence stars

Pau Amaro Seoane

TL;DR

The paper develops a fast, physics-informed machine learning framework to predict stellar collision outcomes across a wide parameter space by training on ~16,000 SPH simulations. It combines a two-stage model—remnant-count classification via a random forest and mass regression via gradient-boosted trees or random forests—with physics-based feature transforms and a postprocessing step enforcing mass conservation. The approach achieves near-perfect classification (AUC ≈ 1.0) and high-precision mass predictions (R^2 ≈ 0.994, RMSE ≈ 0.02 for mass loss), operating in fractions of a second. This enables large-scale dynamical studies of galactic nuclei and SMBH fueling, with potential extensions to coupling with stellar evolution codes and transient predictions for upcoming surveys.
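The mass-conservation postprocessing step can be illustrated with a minimal sketch. This summary does not say how the constraint is enforced, so the proportional rescaling below is an assumption, and the names `enforce_mass_conservation`, `remnants`, and `unbound` are hypothetical.

```python
def enforce_mass_conservation(m1, m2, remnants, unbound):
    """Rescale raw predictions so remnant masses plus unbound gas equal m1 + m2.

    Assumption: proportional rescaling; the paper may use a different rule.
    """
    total = m1 + m2
    # Negative masses are unphysical, so clip them to zero first.
    remnants = [max(0.0, m) for m in remnants]
    unbound = max(0.0, unbound)
    predicted = sum(remnants) + unbound
    if predicted == 0.0:
        # Degenerate case: nothing predicted, so assign all mass to unbound gas.
        return [0.0 for _ in remnants], total
    scale = total / predicted
    return [m * scale for m in remnants], unbound * scale


# Hypothetical raw predictions for a 1.0 + 0.5 solar-mass collision
# that overshoot the available mass (1.30 + 0.25 = 1.55 > 1.5).
remnants, unbound = enforce_mass_conservation(1.0, 0.5, [1.30], 0.25)
```

After the call, the remnant mass and the unbound mass sum to the initial total of 1.5 while keeping their predicted ratio.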

Abstract

Stellar collisions in dense galactic nuclei might play an important role in fueling supermassive black holes (SMBHs) and shaping their environments. The gas released during these collisions can contribute to SMBH accretion, influencing phenomena such as active galactic nuclei and tidal disruption events of the remnants. We address the challenge of rapidly and accurately predicting the outcomes of stellar collisions, including remnant masses and unbound gas, across a broad parameter space of initial conditions. Existing smoothed-particle hydrodynamics (SPH) simulation techniques, while detailed, are too resource-intensive for exploratory studies or real-time applications. We develop a machine learning framework trained on a dataset of $\sim 16{,}000$ SPH simulations of main-sequence star collisions. By extracting physically meaningful parameters (e.g., masses, radii, impact parameters, and virial ratios) and employing gradient-boosted regression trees with Huber loss, we create a model that balances accuracy and computational efficiency. The method includes logarithmic transforms to handle wide dynamic ranges and regularization to ensure physical plausibility. The model predicts collision outcomes (remnant masses and unbound mass) with very low mean absolute errors relative to the typical mass scale. It operates in fractions of a second, enabling large-scale parameter studies and real-time applications. Parameter importance analysis reveals that the impact parameter and the relative velocity dominate outcomes, in line with theoretical expectations. Our approach provides a scalable tool for studying stellar collisions in galactic nuclei. The rapid predictions facilitate investigations into the gas supply for SMBH accretion and the cumulative effects of collisions over cosmic time, which is particularly relevant to the growth of SMBHs.
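The two regression ingredients named in the abstract, the Huber loss and a logarithmic transform, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the threshold `delta=1.0`, the choice of base-10 logarithms, and the helper names `huber_loss`, `to_log`, and `from_log` are not taken from the paper.

```python
import math


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it.

    delta=1.0 is an assumed value, not one quoted in the paper.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def to_log(mass):
    """Compress a wide dynamic range before fitting (base 10 assumed)."""
    return math.log10(mass)


def from_log(log_mass):
    """Invert the transform to recover a mass in the original units."""
    return 10.0 ** log_mass


small = huber_loss(0.5)   # inside the quadratic regime
large = huber_loss(3.0)   # linear regime: grows more slowly than 0.5 * 3.0**2
roundtrip = from_log(to_log(0.02))
```

Because the loss grows only linearly for large residuals, a few poorly predicted collisions pull the fit less than they would under a squared-error loss. In scikit-learn, `GradientBoostingRegressor` accepts `loss="huber"`; whether the authors used that library is not stated in this summary.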

Paper Structure

This paper contains 15 sections, 18 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Confusion matrix evaluating classification performance for stellar collision remnants, where rows represent smoothed-particle hydrodynamics (SPH) simulation truths and columns denote model predictions.
  • Figure 2: Receiver Operating Characteristic (ROC) curve showing the true positive rate (sensitivity) against the false positive rate (1 - specificity). This shows the classifier's ability to distinguish between single and binary remnant outcomes in stellar collisions. The blue solid curve, with an area under the curve (AUC) of 1.0, represents near-perfect discrimination, approaching the ideal 90-degree elbow shape that would indicate flawless classification. Three decision thresholds are marked with colored circles: the default 0.5 threshold (red), the optimal threshold minimizing the Euclidean distance to perfect classification (green), and Youden's threshold, which maximizes sensitivity + specificity - 1 (orange). The three coincide because the classifier separates the two classes almost perfectly. The black dashed diagonal line represents random guessing (AUC = 0.5) and serves as a baseline for comparison. The near-vertical ascent of the blue curve, followed by a sharp right-angle turn at the upper left corner, reflects the model's ability to achieve high true positive rates while maintaining minimal false positives, a characteristic of highly discriminative classifiers where merger and fly-by outcomes are clearly separated. This performance stems from the physics-informed feature engineering, which effectively captures the fundamental differences between these collision regimes.
  • Figure 3: The parameter importance analysis reveals the relative contribution of each physical parameter to the model's classification performance, with the impact parameter $b$ exhibiting the strongest influence, followed by relative velocity $v_\infty$, secondary mass $m_2$, and primary mass $m_1$. These importance scores are determined through the Random Forest algorithm's internal metric, which quantifies how much each parameter decreases the Gini impurity across all decision trees in the ensemble. Specifically, the importance is calculated by: (1) summing the total impurity reduction achieved by splits involving each parameter across all trees, (2) normalizing these values such that their sum equals unity, and (3) averaging over all trees. This process effectively measures how frequently and decisively each parameter is used to partition the parameter space. The y-axis reflects the normalized importance metric, meaning that higher values indicate greater discriminatory power in the classification.
  • Figure 4: Comparison of predicted against true mass loss fractions ($\Delta M/M_{\text{total}}$) from stellar collision simulations, with the dashed red line indicating perfect agreement ($y = x$). Each point represents an individual collision event, colored by relative velocity (darker for higher velocities). The model achieves good accuracy across most of the parameter space, as demonstrated by the tight clustering of points along the diagonal. Deviations emerge primarily in extreme cases (upper right quadrant), where near-total disruption events exhibit underprediction of mass loss. The offset for $\Delta M/M_{\text{total}} \approx 1.0$ reflects the inherent challenge in modeling complete stellar disintegration, where nonlinear hydrodynamic effects dominate. The color gradient illustrates the velocity dependence of prediction errors. These results validate the model's physical fidelity for all but the most catastrophic encounters.
  • Figure 5: Analogous to Fig. \ref{fig.feature_importance}, applied to the set of engineered physics features.
  • ...and 4 more figures
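
The two data-driven thresholds described in the Figure 2 caption, Youden's threshold and the point closest to the ideal corner of the ROC plane, can be computed as in the sketch below. The function name `pick_thresholds` and the ROC points are hypothetical and are not taken from the paper.

```python
import math


def pick_thresholds(roc_points):
    """Select two thresholds from (threshold, fpr, tpr) tuples.

    Returns (youden_threshold, closest_threshold).
    """
    # Youden's J statistic: sensitivity + specificity - 1, i.e. TPR - FPR.
    youden = max(roc_points, key=lambda p: p[2] - p[1])
    # Euclidean distance to the ideal corner (FPR = 0, TPR = 1).
    closest = min(roc_points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))
    return youden[0], closest[0]


# Hypothetical ROC points for a well-separated classifier.
roc = [
    (0.9, 0.00, 0.80),
    (0.5, 0.00, 1.00),
    (0.1, 0.30, 1.00),
]
youden_thr, closest_thr = pick_thresholds(roc)
```

When one operating point reaches the corner (FPR = 0, TPR = 1), both criteria select it, which is consistent with the caption's remark that the marked thresholds coincide.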