Identifying and Characterizing Very Low Mass Spectral Blend Binaries with Machine Learning Methods

Juan Diego Draxl Giannoni, Malina Desai, Adam J. Burgasser, A. Camille Dunning, Christian Aganze, Luke McDermott, Christopher A. Theissen, Daniella C. Bardalez Gagliuffi

TL;DR

This work targets unresolved very low mass spectral blend binaries (late-M, L, and T dwarfs) by developing a hierarchical Random Forest framework to identify blended spectra and classify their components. Trained on synthetic single and binary templates derived from empirical SPLAT spectra across M6–T9 and three J-band S/N groups, the RF models outperform traditional spectral index methods in fidelity, range, and speed, achieving a binary identification recall and precision of at least 0.85 and median component classification errors of at most 0.1 subtypes, with systematic uncertainties around 1 subtype. The approach leverages feature importance to link spectral regions (notably CH$_4$, K I, CO, H$_2$O bands) to the presence of companions and demonstrates strong performance on synthetic tests and reasonable agreement with known binaries, though real samples with similar component types pose challenges. The study highlights practical pathways to apply these models to large-scale spectral surveys (SPHEREx, Euclid, JWST, Roman), offering scalable tools for probing VLM binary demographics and informing formation and evolution theories.

Abstract

We present an approach to identifying and characterizing unresolved, very low mass spectral blend binaries composed of late-M, L, and T dwarfs using machine learning methodologies. We generated and evaluated a series of hierarchical random forest models to distinguish spectral blends from single very low-mass dwarfs, and to classify their primary and secondary components. Models were trained on a sample of single and synthesized binary templates generated from empirical spectra. We explored various aspects of the design of our models, and find that models trained on a full range of single and binary combinations have the best performance for identification and component classification. These models achieve binary identification recall and precision of $\gtrsim$85%, median component classification errors of $\lesssim$0.1 subtypes, and systematic classification uncertainties of $\lesssim$1 subtype, outperforming index-based methods in terms of fidelity, range, and speed. Optimal performance is achieved for binaries composed of L and T dwarf primaries and late-L and T dwarf secondaries. When applied to the spectra of previously confirmed very low-mass binaries, model performance is degraded due to the prevalence of systems with similar component types, but remains high in the optimal performance range. We propose potential improvements to these models, which can be used to explore binary populations among the thousands to millions of very low-mass stars and brown dwarfs anticipated with large-scale spectral surveys such as SPHEREx and Euclid.
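
For readers less familiar with the two metrics quoted above, the following minimal sketch shows how precision and recall for the "binary" class follow from confusion counts. The function name and the counts are hypothetical illustrations, not values or code from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the 'binary' class.

    tp: true binaries identified as binaries
    fp: singles misidentified as binaries
    fn: true binaries missed (identified as singles)
    """
    # Precision: fraction of predicted binaries that are real binaries.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real binaries that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts chosen only for illustration (not from the paper).
p, r = precision_recall(tp=90, fp=10, fn=15)
```

With these illustrative counts, precision is 90/100 and recall is 90/105, so both sit at or just above the 85% level quoted in the abstract.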

Paper Structure

This paper contains 16 sections, 4 equations, 12 figures.

Figures (12)

  • Figure 1: $J$-band signal-to-noise (S/N) vs spectral type of our empirical spectral sample. Each individual spectrum is plotted and color-coded by its assignment to the low (S/N $<$ 50, light blue), mid (50 $\leq$ S/N $<$ 100, medium blue), or high (S/N $\geq$ 100, dark blue) S/N group. We show the marginalized distributions of spectral type along the top axis and of S/N along the right axis.
  • Figure 2: Example of a synthetic binary template with S/N = 116 (black line), constructed from a S/N = 124 L5 dwarf spectrum (magenta line) and a S/N = 99 T5 dwarf spectrum (blue line). The fluxes of the single components are scaled relative to each other using the Dupuy & Liu (2012; 2012ApJS..201...19D) $M_J$/spectral type relation, and to the combined spectrum, which is normalized between 1.20--1.35 $\mu$m. The uncertainty for the combined-light spectrum is shown as a dash-dot line, and the vertical gray bands indicate regions of strong telluric absorption.
  • Figure 3: Recall for the index-based methods of B10 (left) and B14 (right) as a function of component classifications. Only those combinations whose combined-light spectra match the applicable spectral range of these methods (L8--T4 for B10, M8--L8 for B14; black boxes) are shown. Each box represents an average of recall across all S/N groups.
  • Figure 4: Receiver operating characteristic (ROC) curves for the nine binary identification (BId) models examined in this study, comparing the true positive rate (true binaries) to the false positive rate (singles identified as binaries) as a function of detection threshold. The threshold for binary identification, computed as the fraction of decision trees identifying a spectrum as binary, was allowed to vary from 0 to 1. Random selection would follow the dashed line, while the computed curves indicate high fidelity in identifying true binaries. The legend indicates the color, point style, and line style corresponding to the given BId model, and lists the area under the curve (AUC) metric for each model. Larger symbols mark a detection threshold of 0.5, which corresponds to a majority vote.
  • Figure 5: Confusion matrices for each of the nine binary identification (BId) models examined in this study, comparing actual binary status (rows) with predicted binary status (columns). Numbers in each square panel indicate the fraction and number of templates corresponding to each condition, while the shading indicates the fraction along a row from 0% (white) to 100% (black).
  • ...and 7 more figures
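
The S/N grouping of Figure 1 and the vote-fraction threshold of Figure 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the vote fractions and labels are invented, and treating a vote fraction equal to the threshold as a "binary" call is an assumption.

```python
def sn_group(sn):
    """Assign a J-band S/N value to the low/mid/high group of Figure 1."""
    if sn < 50:
        return "low"
    if sn < 100:
        return "mid"
    return "high"


def roc_point(vote_fractions, is_binary, threshold):
    """Return (false positive rate, true positive rate) at one threshold.

    vote_fractions: fraction of trees voting 'binary' for each spectrum
    is_binary: true labels (True for binaries, False for singles)
    """
    # Assumption: a spectrum is called binary when its vote fraction
    # is greater than or equal to the threshold.
    predicted = [v >= threshold for v in vote_fractions]
    tp = sum(p and t for p, t in zip(predicted, is_binary))
    fp = sum(p and not t for p, t in zip(predicted, is_binary))
    n_pos = sum(is_binary)
    n_neg = len(is_binary) - n_pos
    tpr = tp / n_pos if n_pos else 0.0
    fpr = fp / n_neg if n_neg else 0.0
    return fpr, tpr


# S/N values quoted in the Figure 2 caption.
groups = [sn_group(sn) for sn in (124, 99, 116)]

# Invented vote fractions and labels, for illustration only.
votes = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]
labels = [True, True, True, False, False, False]
majority = roc_point(votes, labels, threshold=0.5)
```

Sweeping `threshold` from 0 to 1 and collecting the resulting points traces out a curve of the kind shown in Figure 4; the point at 0.5 corresponds to the majority-vote symbols.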