
Small molecule retrieval from tandem mass spectrometry: what are we optimizing for?

Gaetan De Waele, Marek Wydmuch, Krzysztof Dembczyński, Wojciech Kotłowski, Willem Waegeman

TL;DR

This study investigates commonly used loss functions, deriving novel regret bounds that characterize when Bayes-optimal decisions for these objectives must diverge, and reveals a fundamental trade-off between the two objectives of fingerprint similarity and molecular retrieval.

Abstract

One of the central challenges in the computational analysis of liquid chromatography-tandem mass spectrometry (LC-MS/MS) data is to identify the compounds underlying the output spectra. In recent years, this problem is increasingly tackled using deep learning methods. A common strategy involves predicting a molecular fingerprint vector from an input mass spectrum, which is then used to search for matches in a chemical compound database. While various loss functions are employed in training these predictive models, their impact on model performance remains poorly understood. In this study, we investigate commonly used loss functions, deriving novel regret bounds that characterize when Bayes-optimal decisions for these objectives must diverge. Our results reveal a fundamental trade-off between the two objectives of (1) fingerprint similarity and (2) molecular retrieval. Optimizing for more accurate fingerprint predictions typically worsens retrieval results, and vice versa. Our theoretical analysis shows this trade-off depends on the similarity structure of candidate sets, providing guidance for loss function and fingerprint selection.
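The retrieval step described in the abstract (predict a fingerprint, then search a candidate database) can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the ranking score and hit rate at k (HR@k, the retrieval metric named in Figure 3) as the evaluation metric. The function names are hypothetical, and fingerprints are plain 0/1 lists rather than fingerprints computed by a cheminformatics toolkit.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_candidates(y_hat: Sequence[float], candidates: Sequence[Sequence[int]]) -> List[int]:
    """Candidate indices sorted from most to least similar to the prediction."""
    scores = [cosine(y_hat, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)


def hit_at_k(y_hat, candidates, true_index: int, k: int) -> int:
    """1 if the true molecule's fingerprint is ranked in the top k, else 0."""
    return int(true_index in rank_candidates(y_hat, candidates)[:k])


def hit_rate_at_k(predictions, candidate_sets, true_indices, k: int) -> float:
    """HR@k: fraction of spectra whose true molecule lands in the top k."""
    hits = [hit_at_k(p, c, t, k) for p, c, t in zip(predictions, candidate_sets, true_indices)]
    return sum(hits) / len(hits)


# Toy example: a 4-bit predicted fingerprint and three candidate fingerprints.
y_hat = [0.9, 0.8, 0.1, 0.0]
candidates = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
print(rank_candidates(y_hat, candidates))  # [0, 2, 1]
```

Because the ranking depends only on relative similarities, any prediction that moves the true candidate above its competitors improves HR@k, even if the predicted vector itself drifts away from the true fingerprint. This is the tension the paper studies.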

Paper Structure (34 sections, 12 theorems, 88 equations, 6 figures, 7 tables)

Key Result

Theorem 4.1

For $i\in\{1,\dots,m\}$ define $w_i \;=\; \sum_{\boldsymbol{y}\in\{0,1\}^m}\frac{y_i\,p(\boldsymbol{y}\,|\,\boldsymbol{x})}{\|\boldsymbol{y}\|}$ and let $w_{(1)}\ge\cdots\ge w_{(m)}$ be the sorted weights. For any fixed size $s$, the best $s$-sparse Bayes predictor for cosine similarity selects the
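The theorem statement is cut off in this summary, so the sketch below covers only the part that is fully given: computing the weights $w_i$ from a distribution $p(\boldsymbol{y}\,|\,\boldsymbol{x})$ over binary fingerprints and sorting them into $w_{(1)}\ge\cdots\ge w_{(m)}$. It assumes $\|\boldsymbol{y}\|$ is the Euclidean norm (the natural choice for cosine similarity) and represents the distribution as a small explicit dictionary, which is only feasible for toy values of $m$. The names `cosine_weights` and `sorted_weight_order` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_weights(p: Dict[Tuple[int, ...], float]) -> List[float]:
    """w_i = sum over y of y_i * p(y|x) / ||y||.

    p maps binary fingerprints (tuples in {0,1}^m) to probabilities.
    ||y|| is assumed to be the Euclidean norm, i.e. sqrt(number of set bits).
    The all-zero fingerprint has y_i = 0 for every i, so it is skipped.
    """
    m = len(next(iter(p)))
    w = [0.0] * m
    for y, prob in p.items():
        norm = math.sqrt(sum(y))
        if norm == 0.0:
            continue
        for i, y_i in enumerate(y):
            w[i] += y_i * prob / norm
    return w


def sorted_weight_order(w: List[float]) -> List[int]:
    """Bit indices ordered so that w_(1) >= w_(2) >= ... >= w_(m)."""
    return sorted(range(len(w)), key=lambda i: w[i], reverse=True)


# Toy conditional distribution over 3-bit fingerprints for one spectrum x.
p = {(1, 1, 0): 0.5, (1, 0, 0): 0.3, (0, 0, 1): 0.2}
w = cosine_weights(p)
print([round(v, 4) for v in w])  # [0.6536, 0.3536, 0.2]
print(sorted_weight_order(w))    # [0, 1, 2]
```

Note how the $1/\|\boldsymbol{y}\|$ factor down-weights bits that appear in fingerprints with many set bits: bit 0 gets full credit from $(1,0,0)$ but only $1/\sqrt{2}$ of the mass of $(1,1,0)$.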

Figures (6)

  • Figure 1: Empirically, different loss functions form a Pareto front trading off fingerprint similarity (y-axis) against retrieval performance (x-axis). See the main results section for experimental details. Note that maximizing Tanimoto similarity is equivalent to minimizing IoU loss.
  • Figure 2: (A) General framework used in this study. A model $f$ predicts a fingerprint $\hat{\boldsymbol{y}}^{(i)}\in[0,1]^m$ from an input spectrum $\boldsymbol{x}^{(i)}$; in the figure itself, the superscript sample index $(i)$ is omitted. The prediction is then compared to a set of fingerprints $\mathcal{C}^{(i)}$ derived from candidate molecules in a database. The true fingerprint $\boldsymbol{y}^{(i)}$ is assumed to be part of the candidate set. (B) Loss functions for training $f(\cdot)$ generally fall into one of three categories: (1) bitwise losses (e.g., binary cross-entropy or binary focal loss), (2) vectorwise losses (e.g., soft cosine loss or soft IoU loss), or (3) listwise losses (e.g., contrastive loss).
  • Figure 3: Validation performance (average Tanimoto similarity in blue, HR@20 in red) over training time. Curves are shown for the BCE loss, IoU loss, and Contrastive FP-Cos, respectively. The Contrastive FP-Cos curve is obtained with a temperature of 1/256, so that model is tuned for better hit rates (see the temperature-tuning table). Because the metrics peak at different points during training, final scores are reported, for each metric, using the checkpoint at which that metric was optimal.
  • Figure 4: Experimental results when training models using a weighted combination of loss functions. In blue, results are repeated from Figure 1. In orange, results are shown for a combination of contrastive loss and IoU loss with 5 different weights $\lambda$. Scores are the mean over 5 model runs. Notably, even small weights on the contrastive loss component result in a marked increase in retrieval scores over non-contrastive loss functions.
  • Figure 5: Empirical distribution of $\sigma_{\min}$, $\sigma_{\max}$, and $\sigma_{\max} - \sigma_{\min}$ over all MassSpecGym equal-mass candidate sets, using Tanimoto similarity on Morgan fingerprints (4096 bits, radius 2) to compare candidates to the true molecules. $\sigma_{\min}$ denotes the lowest, and $\sigma_{\max}$ the highest, similarity to the true molecule found in the candidate set. In some cases, candidates have zero similarity to the true molecule (no fingerprint bits in common). Conversely, some candidates have a fingerprint identical to that of the true molecule ($\sigma_{\max} = 1$), a consequence of the problem of duplicate fingerprints. The median $\sigma_{\min}$ is $0.026$, while the median $\sigma_{\max}$ is $0.367$.
  • ...and 1 more figure
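Figure 1's remark that maximizing Tanimoto similarity is the same as minimizing IoU loss follows because, on binary fingerprints, the Tanimoto coefficient and intersection-over-union are the same quantity. The sketch below assumes the common "soft" relaxation in which the intersection is $\sum_j \hat{y}_j y_j$ and the union is $\sum_j \hat{y}_j + \sum_j y_j - \sum_j \hat{y}_j y_j$; the paper's exact soft IoU definition is not given in this summary, and the function names are hypothetical.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of binary fingerprints: |a AND b| / |a OR b|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here (an assumption): two empty fingerprints score 0.0.
    return inter / union if union else 0.0


def soft_iou_loss(y_hat: Sequence[float], y: Sequence[int], eps: float = 1e-8) -> float:
    """One common soft IoU (soft Jaccard) loss, with intersection sum(y_hat * y)."""
    inter = sum(p * t for p, t in zip(y_hat, y))
    union = sum(y_hat) + sum(y) - inter
    return 1.0 - inter / (union + eps)


a = [1, 1, 0, 1]
b = [1, 0, 0, 1]
print(tanimoto(a, b))             # 2 shared bits / 3 set bits overall
print(1.0 - soft_iou_loss(a, b))  # the same value (up to eps) on binary inputs
```

On binary inputs the soft loss collapses to $1 - \text{Tanimoto}$, so minimizing it maximizes Tanimoto similarity; on fractional predictions it remains differentiable, which is why it can be used as a training loss.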
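The three loss categories in Figure 2(B) differ in what they compare: single bits, whole vectors, or the whole candidate list. A minimal sketch of one representative per category follows. The listwise loss is written as softmax cross-entropy over cosine similarities divided by a temperature $\tau$, which is an assumption suggested by the name "Contrastive FP-Cos" and the temperature of 1/256 mentioned in Figure 3, not the paper's confirmed definition. Binary focal loss is omitted, and all function names are hypothetical.

```python
import math
from typing import Sequence


def bce_loss(y_hat: Sequence[float], y: Sequence[int], eps: float = 1e-7) -> float:
    """Bitwise: mean binary cross-entropy over the m fingerprint bits."""
    total = 0.0
    for p, t in zip(y_hat, y):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * z for x, z in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(z * z for z in b))
    return dot / norm if norm else 0.0


def soft_cosine_loss(y_hat: Sequence[float], y: Sequence[int]) -> float:
    """Vectorwise: compares the whole predicted vector with the target at once."""
    return 1.0 - cosine(y_hat, y)


def contrastive_loss(y_hat, candidates, true_index: int, tau: float = 1.0) -> float:
    """Listwise: softmax cross-entropy over the candidate set, logits = cosine / tau."""
    logits = [cosine(y_hat, c) / tau for c in candidates]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[true_index]
```

Only the listwise loss sees the other candidates, so only it is directly penalized when a wrong candidate outranks the true one. A small $\tau$ such as 1/256 sharpens the softmax, concentrating the loss on candidates that come close to the true molecule.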
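Figure 3 notes that final scores are reported per metric, each at the checkpoint where that metric was optimal. A minimal sketch of that selection rule follows, assuming higher is better for every tracked metric (true for average Tanimoto similarity and HR@20). The function name, metric keys, and toy numbers are hypothetical and are not results from the paper.

```python
from typing import Dict, List, Tuple


def best_per_metric(history: List[Dict[str, float]]) -> Dict[str, Tuple[int, float]]:
    """For each metric, the checkpoint index where it peaks (higher is better).

    history holds one {metric_name: validation_score} dict per checkpoint.
    Ties keep the earliest checkpoint.
    """
    best: Dict[str, Tuple[int, float]] = {}
    for idx, scores in enumerate(history):
        for name, value in scores.items():
            if name not in best or value > best[name][1]:
                best[name] = (idx, value)
    return best


# Toy validation curve (made-up numbers) where the two metrics peak at different times.
history = [
    {"avg_tanimoto": 0.30, "hr@20": 0.10},
    {"avg_tanimoto": 0.35, "hr@20": 0.12},
    {"avg_tanimoto": 0.33, "hr@20": 0.15},
]
print(best_per_metric(history))  # {'avg_tanimoto': (1, 0.35), 'hr@20': (2, 0.15)}
```

Selecting per metric avoids penalizing a loss function whose fingerprint similarity and retrieval scores simply peak at different epochs.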
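Figure 4 trains on a weighted combination of contrastive loss and IoU loss with weight $\lambda$. The exact form of the combination is not stated in this summary; the sketch below assumes a convex mix $\lambda\,\mathcal{L}_{\text{contrastive}} + (1-\lambda)\,\mathcal{L}_{\text{IoU}}$ and reuses the same hypothetical soft IoU and cosine-softmax contrastive definitions as the sketches for Figures 1 and 2, repeated here so the block runs on its own.

```python
import math
from typing import Sequence


def soft_iou_loss(y_hat: Sequence[float], y: Sequence[int], eps: float = 1e-8) -> float:
    inter = sum(p * t for p, t in zip(y_hat, y))
    return 1.0 - inter / (sum(y_hat) + sum(y) - inter + eps)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * z for x, z in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(z * z for z in b))
    return dot / norm if norm else 0.0


def contrastive_loss(y_hat, candidates, true_index: int, tau: float) -> float:
    logits = [cosine(y_hat, c) / tau for c in candidates]
    top = max(logits)  # stable log-sum-exp
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[true_index]


def combined_loss(y_hat, candidates, true_index: int, lam: float, tau: float = 1.0) -> float:
    """Hypothetical mix: lam * contrastive + (1 - lam) * IoU (exact form assumed)."""
    iou = soft_iou_loss(y_hat, candidates[true_index])
    con = contrastive_loss(y_hat, candidates, true_index, tau)
    return lam * con + (1.0 - lam) * iou


# Illustrative sweep over a few weights (not the paper's lambda values).
y_hat = [0.9, 0.2, 0.1]
candidates = [[1, 0, 0], [0, 1, 1]]
for lam in (0.0, 0.1, 0.5, 1.0):
    print(lam, combined_loss(y_hat, candidates, 0, lam))
```

Setting $\lambda = 0$ recovers pure IoU training and $\lambda = 1$ pure contrastive training; intermediate values let the candidate-aware term steer retrieval while the IoU term keeps the predicted fingerprint close to the target.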
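The statistics in Figure 5 can be computed per candidate set as sketched below. Because the caption attributes $\sigma_{\max} = 1$ to duplicate fingerprints, the sketch assumes the true molecule itself is excluded from its candidate list (otherwise $\sigma_{\max}$ would trivially be 1 everywhere). Toy 4-bit lists stand in for the 4096-bit, radius-2 Morgan fingerprints, which in practice would come from a cheminformatics toolkit (that step is not shown). Function names are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0  # empty-vs-empty convention (assumption)


def sigma_stats(true_fp, candidates) -> Tuple[float, float, float]:
    """(sigma_min, sigma_max, sigma_max - sigma_min) for one candidate set.

    `candidates` is assumed to exclude the true molecule itself.
    """
    sims = [tanimoto(true_fp, c) for c in candidates]
    lo, hi = min(sims), max(sims)
    return lo, hi, hi - lo


def median_sigmas(dataset) -> Tuple[float, float]:
    """Median sigma_min and median sigma_max over (true_fp, candidates) pairs."""
    stats = [sigma_stats(t, cs) for t, cs in dataset]
    return median(s[0] for s in stats), median(s[1] for s in stats)


# Toy data: the first set contains a duplicate fingerprint of the true molecule.
dataset = [
    ([1, 1, 0, 0], [[1, 0, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]),
    ([1, 0, 1, 0], [[1, 0, 0, 0], [1, 1, 1, 0]]),
    ([0, 1, 1, 1], [[0, 1, 0, 0], [1, 0, 0, 0]]),
]
print(sigma_stats(*dataset[0]))  # (0.0, 1.0, 1.0)
print(median_sigmas(dataset))
```

A small gap $\sigma_{\max} - \sigma_{\min}$ means candidates are nearly interchangeable under the fingerprint similarity, which is the kind of candidate-set structure the theoretical analysis ties to the trade-off.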

Theorems & Definitions (20)

  • Theorem 4.1
  • Theorem 4.2: Regret of HR@1 under vectorwise similarity
  • Theorem 4.3: Regret of vectorwise similarity under HR@1
  • Theorem 4.4: Regret of bitwise losses under vectorwise similarity
  • Theorem 5.1
  • proof
  • Lemma 7.1: Row‑wise bounds
  • proof
  • Theorem 7.1: Regret of HR@1 under vectorwise similarity
  • proof
  • ...and 10 more