Automated Mixture Analysis via Structural Evaluation
Zachary T. P. Fried, Brett A. McGuire
TL;DR
The paper tackles the challenge of identifying the components of complex chemical mixtures, where the density of spectral features in spectroscopic databases makes unambiguous assignment difficult. It introduces AMASE, a technique-agnostic framework that combines machine-learning (ML) molecular embeddings with a graph-based relevance ranking to infer which molecules are present, propagating evidence from already-detected species (and/or chemical priors) through relationships in embedding space. Applied to rotational spectroscopy, the approach identifies mixture components with >97% accuracy across multiple mixtures and greatly reduces manual effort, while remaining robust and generalizable. The work's significance lies in its potential to extend rapid, automated mixture analysis to other spectroscopic methods and to real-time applications in astrochemistry, environmental monitoring, and related fields.
Abstract
The determination of chemical mixture components is vital to a multitude of scientific fields, and spectroscopic methods are often used to decipher the composition of these mixtures. However, the sheer density of spectral features in spectroscopic databases can make unambiguous assignment to individual species challenging. Yet the components of a mixture are commonly chemically related, owing to environmental processes or shared precursor molecules. Assessing the chemical relevance of a candidate molecule is therefore important when determining which species are present in a mixture. In this paper, we combine machine-learning molecular embedding methods with a graph-based ranking system to determine the likelihood of a molecule being present in a mixture, based on the other known species and/or chemical priors. By incorporating this metric into a rotational spectroscopy mixture analysis algorithm, we demonstrate that the mixture components can be identified efficiently and with extremely high accuracy (>97%).
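To make the idea of "embeddings plus graph-based ranking" concrete, here is a minimal sketch under stated assumptions. It is not AMASE's actual algorithm, which the abstract does not specify. As a swapped-in stand-in, it connects molecules whose embedding cosine similarity exceeds a hypothetical `threshold`, then runs personalized PageRank (with a hypothetical `damping` factor) that restarts at the seed species, i.e. those already detected or given as priors. The function names `build_graph` and `relevance_scores` are illustrative, and the three-dimensional vectors are invented toy inputs, not real molecular embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(embeddings, threshold=0.5):
    """Undirected weighted graph: link molecules whose embedding cosine
    similarity exceeds `threshold` (a hypothetical cutoff)."""
    names = list(embeddings)
    graph = {n: {} for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine(embeddings[a], embeddings[b])
            if s > threshold:
                graph[a][b] = s
                graph[b][a] = s
    return graph

def relevance_scores(graph, seeds, damping=0.85, iters=100):
    """Personalized PageRank: a random walk on the similarity graph that
    restarts at the seed species, so relevance flows outward from them."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            if total == 0:
                # Node with no edges: return its mass to the seeds.
                for m in nodes:
                    new[m] += damping * score[n] * restart[m]
                continue
            for m, w in graph[n].items():
                # Spread mass to neighbors in proportion to similarity.
                new[m] += damping * score[n] * w / total
        score = new
    return score

# Toy 3-D vectors invented for illustration; real embeddings would come
# from a trained molecular-representation model.
emb = {
    "CH3OH":   [0.9, 0.1, 0.0],
    "CH3CHO":  [0.8, 0.3, 0.1],
    "HCOOCH3": [0.7, 0.4, 0.1],
    "SiO":     [0.0, 0.1, 0.9],
}
scores = relevance_scores(build_graph(emb), seeds={"CH3OH"})
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {s:.3f}")
```

With CH3OH as the only seed, the two candidates whose toy vectors lie close to it receive nonzero relevance, while SiO, which has no edge above the cutoff, scores zero. A real system would combine such a relevance score with spectral-match evidence rather than use it alone.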
