Generative structural elucidation from mass spectra as an iterative optimization problem
Mrunali Manjrekar, Runzhong Wang, Samuel Goldman, Jenna C. Fromer, Connor W. Coley
TL;DR
FOAM reframes structure elucidation from LC-MS/MS as an iterative, formula-constrained optimization problem. It uses a graph genetic algorithm and a spectrum predictor to search chemically feasible annotations beyond fixed libraries. FOAM maximizes spectral similarity while penalizing structural complexity through NSGA-II-based Pareto ranking, generating and refining candidate structures across generations to recover true molecules or closely related decoys. Evaluations on NIST'20 and MassSpecGym show that FOAM can encounter the true structure in a substantial fraction of runs and can significantly boost top-10 candidate quality when combined with existing elucidation methods. Success is strongly tied to the relevance of the seed structures and the accuracy of the spectral oracle. The framework is modular and extensible: additional context signals (e.g., retention time, biosynthetic feasibility) and uncertainty-aware selection can be integrated to further improve de novo structure elucidation workflows.
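As a rough illustration of the Pareto ranking idea, the sketch below performs plain non-dominated sorting over two objectives, spectral similarity (maximized) and structural complexity (minimized). It is not FOAM's implementation: the function names, the tuple layout, and the example scores are hypothetical, and the crowding-distance step of full NSGA-II is omitted.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (spectral similarity, structural complexity); hypothetical layout


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is at least as good as b on both objectives and better on one.

    Similarity is maximized; complexity is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_ranks(scores: List[Score]) -> List[int]:
    """Assign each candidate the index of its non-dominated front (0 is best)."""
    ranks = [0] * len(scores)
    remaining = set(range(len(scores)))
    rank = 0
    while remaining:
        # A candidate is on the current front if nothing still remaining dominates it.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Made-up scores for four candidate structures.
example_scores = [(0.9, 5.0), (0.8, 3.0), (0.7, 4.0), (0.9, 6.0)]
example_ranks = pareto_ranks(example_scores)
```

In a genetic algorithm, candidates on lower-numbered fronts would typically be preferred when selecting parents for the next generation.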
Abstract
Liquid chromatography tandem mass spectrometry (LC-MS/MS) is a critical analytical technique for molecular identification across metabolomics, environmental chemistry, and chemical forensics. A variety of computational methods have emerged for structural annotation of spectral features of interest, but many of these features cannot be confidently annotated with reference structures or spectra. Here, we introduce FOAM (Formula-constrained Optimization for Annotating Metabolites), a computational workflow that poses structure elucidation from LC-MS/MS as an iterative optimization problem. FOAM couples a formula-constrained graph genetic algorithm with spectral simulation to explore candidate annotations given an experimental spectrum. We demonstrate FOAM's performance on the NIST'20 and MassSpecGym datasets as both a standalone elucidation pipeline and as a complement to existing inverse models. This work establishes iterative optimization as an effective and extensible paradigm for structural elucidation.
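To make the scoring step concrete, the following sketch compares a simulated spectrum with an experimental one using cosine similarity over binned peaks. This is an assumption for illustration only: the text above does not specify which similarity metric FOAM uses, and the function names, bin width, and peak lists here are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.1) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 0.1) -> float:
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    va = bin_spectrum(a, bin_width)
    vb = bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up peak lists: an "experimental" spectrum and two candidate simulations.
experimental = [(50.0, 10.0), (77.0, 100.0), (105.0, 40.0)]
simulated_match = [(50.0, 10.0), (77.0, 100.0), (105.0, 40.0)]
simulated_miss = [(60.0, 30.0), (91.0, 100.0)]

score_match = cosine_similarity(experimental, simulated_match)
score_miss = cosine_similarity(experimental, simulated_miss)
```

In an optimization loop of the kind described above, a score like this would be computed for each candidate's predicted spectrum and used as the similarity objective.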
