MS2MetGAN: Latent-space adversarial training for metabolite-spectrum matching in MS/MS database search

Meng Tsai, Alexzander Dwyer, Estelle Nuckels, Yingfeng Wang

Abstract

Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite-spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite-spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite-spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.

Paper Structure (11 sections, 1 figure, 5 tables)

This paper contains 11 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Overview of the method. MS/MS spectra and metabolite structures are encoded into latent spectrum vectors and latent metabolite vectors by a spectrum autoencoder and a structure autoencoder, respectively. The generative adversarial network (GAN) comprises a generator and a discriminator. The generator produces synthetic latent metabolite vectors conditioned on latent spectrum vectors, whereas the discriminator assigns a true-likeness score to each latent metabolite–spectrum match (MSM), distinguishing true MSMs from decoy MSMs formed by pairing synthetic metabolite latents with the corresponding spectrum latents. These scores are then used to rank metabolite candidates.
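
The data flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the latent dimension, the fixed random linear maps standing in for the trained generator and discriminator, and all function names (`generate_decoy`, `score_msm`, `build_training_pairs`, `rank_candidates`) are hypothetical. The sketch only shows how a decoy MSM is paired with the same spectrum latent as the true MSM, and how discriminator scores rank candidates.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # hypothetical latent size; the source does not state one

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_gen = rng.normal(size=(LATENT_DIM, LATENT_DIM))
w_disc = rng.normal(size=2 * LATENT_DIM)


def generate_decoy(z_spec):
    """Toy generator: latent spectrum vector -> synthetic latent metabolite vector."""
    return np.tanh(W_gen @ z_spec)


def score_msm(z_met, z_spec):
    """Toy discriminator: true-likeness score in (0, 1) for one latent MSM."""
    logit = w_disc @ np.concatenate([z_met, z_spec])
    return 1.0 / (1.0 + np.exp(-logit))


def build_training_pairs(z_met_true, z_spec):
    """Pair the true and the decoy metabolite latent with the same spectrum latent.

    Returns the two concatenated MSM vectors and their labels (1 = true, 0 = decoy).
    """
    z_decoy = generate_decoy(z_spec)
    pairs = np.stack([
        np.concatenate([z_met_true, z_spec]),
        np.concatenate([z_decoy, z_spec]),
    ])
    labels = np.array([1, 0])
    return pairs, labels


def rank_candidates(candidate_latents, z_spec):
    """Score every candidate against the spectrum; return indices, best first."""
    scores = np.array([score_msm(z, z_spec) for z in candidate_latents])
    order = np.argsort(-scores)  # descending by true-likeness score
    return order, scores


# Usage with random placeholder latents (real ones would come from the autoencoders).
z_spec = rng.normal(size=LATENT_DIM)
candidates = rng.normal(size=(5, LATENT_DIM))
pairs, labels = build_training_pairs(candidates[0], z_spec)
order, scores = rank_candidates(candidates, z_spec)
```

In a real system the two linear maps would be replaced by the trained generator and discriminator, and `candidates` would hold the structure-autoencoder encodings of the database entries retrieved for the query spectrum.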