Identifying and Characterizing Very Low Mass Spectral Blend Binaries with Machine Learning Methods
Juan Diego Draxl Giannoni, Malina Desai, Adam J. Burgasser, A. Camille Dunning, Christian Aganze, Luke McDermott, Christopher A. Theissen, Daniella C. Bardalez Gagliuffi
TL;DR
This work targets unresolved very low mass spectral blend binaries (late-M, L, and T dwarfs) by developing a hierarchical Random Forest framework to identify blended spectra and classify their components. Trained on synthetic single and binary templates derived from empirical SPLAT spectra across M6–T9 and three J-band S/N groups, the RF models outperform traditional spectral index methods in fidelity, range, and speed, achieving a binary identification recall and precision of at least 0.85 and median component classification errors of at most 0.1 subtypes, with systematic uncertainties around 1 subtype. The approach leverages feature importance to link spectral regions (notably CH$_4$, K I, CO, H$_2$O bands) to the presence of companions and demonstrates strong performance on synthetic tests and reasonable agreement with known binaries, though real samples with similar component types pose challenges. The study highlights practical pathways to apply these models to large-scale spectral surveys (SPHEREx, Euclid, JWST, Roman), offering scalable tools for probing VLM binary demographics and informing formation and evolution theories.
Abstract
We present an approach to identifying and characterizing unresolved, very low-mass spectral blend binaries composed of late-M, L, and T dwarfs using machine learning methodologies. We generated and evaluated a series of hierarchical random forest models to distinguish spectral blends from single very low-mass dwarfs, and to classify their primary and secondary components. Models were trained on a sample of single and synthesized binary templates generated from empirical spectra. We explored various aspects of the design of our models, and found that models trained on a full range of single and binary combinations have the best performance for identification and component classification. These models achieve binary identification recall and precision of $\gtrsim$85%, median component classification errors of $\lesssim$0.1 subtypes, and systematic classification uncertainties of $\lesssim$1 subtype, outperforming index-based methods in terms of fidelity, range, and speed. Optimal performance is achieved for binaries composed of L and T dwarf primaries and late-L and T dwarf secondaries. When applied to the spectra of previously confirmed very low-mass binaries, model performance is degraded due to the prevalence of systems with similar component types, but remains high in the optimal performance range. We propose potential improvements to these models, which can be used to explore binary populations among the thousands to millions of very low-mass stars and brown dwarfs anticipated from large-scale spectral surveys such as SPHEREx and Euclid.
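
As a rough illustration of the two-stage idea described above, the following is a minimal sketch of a hierarchical random forest using scikit-learn. It is not the authors' pipeline: the data are random numbers standing in for spectra, and the array sizes, the three coarse "primary type" classes, the injected signal, and the choice to train the second stage only on binaries are all assumptions made for illustration. Stage 1 separates singles from blends and reports precision and recall; stage 2 classifies the primary only for sources flagged as blends.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: 600 "spectra" with 20 flux bins each (not real spectra).
n_samples, n_bins = 600, 20
X = rng.normal(size=(n_samples, n_bins))
is_binary = rng.integers(0, 2, size=n_samples)     # 0 = single, 1 = blend
primary_type = rng.integers(0, 3, size=n_samples)  # hypothetical coarse classes

# Inject an artificial signal so the labels are learnable (assumption).
X[:, 0] += 3.0 * is_binary
X[:, 1] += 3.0 * primary_type

X_tr, X_te, b_tr, b_te, p_tr, p_te = train_test_split(
    X, is_binary, primary_type, test_size=0.25, random_state=0
)

# Stage 1: identify blends versus singles.
stage1 = RandomForestClassifier(n_estimators=100, random_state=0)
stage1.fit(X_tr, b_tr)

# Stage 2: classify the primary, trained on binaries only (assumption).
mask_tr = b_tr == 1
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_tr[mask_tr], p_tr[mask_tr])

# Evaluate stage 1 with the same metrics quoted in the abstract.
b_pred = stage1.predict(X_te)
precision = precision_score(b_te, b_pred)
recall = recall_score(b_te, b_pred)

# Apply stage 2 only to sources flagged as blends; -1 marks "not classified".
flagged = b_pred == 1
p_pred = np.full(len(X_te), -1)
if flagged.any():
    p_pred[flagged] = stage2.predict(X_te[flagged])

print(f"binary identification precision={precision:.2f}, recall={recall:.2f}")
# Feature importances indicate which bins drive the blend decision.
print("most important bin for stage 1:", int(np.argmax(stage1.feature_importances_)))
```

In this toy setup the stage-1 feature importances simply recover the bin where the signal was injected. In a real application the analogous quantity would point to the wavelength regions that carry information about a companion.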
