Optimized HDBSCAN clustering for reconstructing the merger history of the Milky Way: applications and limitations

Andrea Sante, Andreea S. Font, Dharmesh Mistry, Sandra Ortega-Martorell, Ivan Olier

Abstract

Clustering algorithms can help reconstruct the assembly history of the Milky Way by identifying groups of stars that share similar properties in a kinematic or chemical abundance space. Although these tools are promising, their efficiency has not yet been fully tested in a realistic cosmological framework. We investigate the effectiveness of the HDBSCAN clustering algorithm in recovering the progenitors of Milky Way-type galaxies, using several systems from the Auriga suite of simulations. We develop a methodology aimed at improving the efficiency of the algorithm and avoiding fragmentation. First, we use a 12-dimensional feature space that includes a range of chemodynamical properties and stellar ages. Second, we optimize the algorithm using information from the internal structure of the clusters of accreted stars. We show that our approach yields good results in terms of both purity and completeness of clusters for galaxies with different types of accretion histories. We also evaluate the decrease in efficiency due to contamination by in situ stars. For accreted-only haloes, the recovered clusters match the individual progenitors well, and the algorithm recovers accretion events up to an accretion redshift of $z_{\rm acc}\sim3$; for accreted + in situ haloes, it identifies only the more recent accretion events ($z_{\rm acc} < 1$). However, the purity of the identified clusters remains remarkably high even in this case. Our results suggest that HDBSCAN can efficiently identify accreted debris in Milky Way-type galaxies under realistic conditions, but it requires careful optimization to provide valid results.

Paper Structure

This paper contains 20 sections, 4 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Optimal HDBSCAN parameter configurations found by the Optuna search. Panels (a) and (b) show the results of optimizations using the V-measure and DBCV metrics, respectively. In both panels, each symbol represents a different galaxy. The top sub-panels show the configurations coloured by the fraction of particles classified as noise, while the bottom sub-panels show the same configurations coloured by the number of clusters in the resulting partition.
  • Figure 2: Density distribution of accreted star particles in the $E-L_z$ plane. The top row shows the ground-truth distribution, where particles are coloured by their progenitor group (PeakMassIndex). The middle and bottom rows show the HDBSCAN clustering results when optimized with the V-measure and DBCV metrics, respectively. In these lower panels, identified clusters are coloured according to their majority progenitor group, with noise particles shown in grey. The overplotted scatter points highlight the highest-density region of each cluster. The fraction of noise points is also included in each panel. At the bottom of each column, a legend shows the colours associated with the merger events reported in Table \ref{tab:galaxy_sample}.
  • Figure 3: Precision scores for the clusters identified in accreted-only haloes by the HDBSCAN models optimized with the V-measure and DBCV index metrics. For each cluster, the precision is calculated with respect to the dominant progenitor, i.e. the progenitor with the highest number of cluster members, and plotted against the lookback infall time (top) and the stellar mass of the progenitor at infall (bottom). Clusters are shown as triangles if the dominant progenitor is a stellar stream at $z=0$ and as circles if it is a phase-mixed structure. The size of the symbols is proportional to the number of members in the cluster, and the colour scheme follows the one in Fig. \ref{fig:IoM_accreted} and indicates the dominant progenitor of each cluster.
  • Figure 4: Recall scores for the clusters identified in accreted-only haloes by the HDBSCAN models optimized with the V-measure and DBCV index metrics. The symbols, colours and sizes are the same as described in Fig. \ref{fig:P_accreted}, but now for the recall scores.
  • Figure 5: Total variance values for the progenitor galaxies appearing as dominant progenitors in the HDBSCAN clusters.
  • ...and 8 more figures
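
The per-cluster precision and recall described in the captions of Figures 3 and 4 can be illustrated with a minimal sketch. It assumes the standard definitions: precision is the fraction of a cluster's members that come from its dominant progenitor, and recall is the fraction of that progenitor's stars that end up in the cluster. The recall definition is not spelled out in the captions, so it is an assumption here. The function name `cluster_precision_recall`, its arguments, and the toy labels are hypothetical and do not come from the paper; the noise label `-1` follows the common HDBSCAN convention.

```python
from collections import Counter


def cluster_precision_recall(cluster_labels, progenitor_ids, noise_label=-1):
    """Score each cluster against its dominant progenitor.

    cluster_labels : one cluster label per star (noise_label marks noise).
    progenitor_ids : one ground-truth progenitor identifier per star.
    Returns {cluster_label: {"dominant", "precision", "recall"}}.
    """
    # Total number of stars per progenitor, including stars labelled as noise,
    # so that stars lost to noise lower the recall.
    progenitor_sizes = Counter(progenitor_ids)

    # Count progenitor membership inside each non-noise cluster.
    members = {}
    for label, progenitor in zip(cluster_labels, progenitor_ids):
        if label == noise_label:
            continue
        members.setdefault(label, Counter())[progenitor] += 1

    scores = {}
    for label, counts in members.items():
        # Dominant progenitor: the one contributing the most cluster members.
        dominant, n_dominant = counts.most_common(1)[0]
        cluster_size = sum(counts.values())
        scores[label] = {
            "dominant": dominant,
            "precision": n_dominant / cluster_size,
            "recall": n_dominant / progenitor_sizes[dominant],
        }
    return scores


# Toy example (hypothetical labels): two clusters plus two noise stars.
toy_clusters = [0, 0, 0, 1, 1, -1, -1, 0]
toy_progenitors = ["A", "A", "B", "B", "B", "A", "B", "A"]
toy_scores = cluster_precision_recall(toy_clusters, toy_progenitors)
```

In this toy case, cluster 0 holds three stars from progenitor A and one from B, while cluster 1 is pure but captures only half of progenitor B. A cluster can therefore have high precision and low recall at the same time, which is why the two scores are reported separately.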