Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing

Zheng Ma, Zeping Mao, Ruixue Zhang, Jiazhen Chen, Lei Xin, Baozhen Shan, Ali Ghodsi, Ming Li

TL;DR

This paper tackles the challenge of de novo peptide sequencing from Data-Independent Acquisition (DIA) data, which is highly multiplexed and noisy due to coeluting peptides. It presents DIANovo, a DIA-tailored encoder–decoder Transformer with a spectrum-graph representation, RoPE-based mass-difference encoding, coelution-aware pretraining, and a two-stage decoding strategy to disentangle overlapping signals. The authors show that DIANovo outperforms prior DIA-based methods (e.g., DeepNovo-DIA, Transformer-DIA, Cascadia) across both older-generation and Orbitrap Astral data, with narrow-window DIA especially advantageous on older instruments and Astral DIA consistently superior to DDA. A theoretical framework links performance to the signal-to-noise balance and p-value behavior, providing practical guidance on when DIA enhances de novo sequencing and highlighting real-time spectrum-quality prediction and adaptive acquisition as promising applications.
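To make the RoPE-based mass-difference encoding more concrete, here is a minimal sketch of the general rotary-embedding idea with a peak's mass used as the "position". It is not DIANovo's implementation: the function names `rope_rotate` and `attention_score`, the frequency base of 10000, and the feature size are all assumptions chosen for illustration. The point it shows is that, after rotation, the query-key dot product depends only on the difference between the two masses.

```python
import numpy as np


def rope_rotate(x, mass, base=10000.0):
    """Rotate consecutive feature pairs of a 1-D vector by angles proportional to `mass`.

    Hypothetical helper: `base` and the pairing scheme follow the usual RoPE
    convention, not anything stated in the paper.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per feature pair
    angles = mass * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def attention_score(q, k, mass_q, mass_k):
    """Unnormalised attention score between two peaks after rotary encoding."""
    return float(rope_rotate(q, mass_q) @ rope_rotate(k, mass_k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.standard_normal(8), rng.standard_normal(8)
    # Both pairs are separated by the same mass difference (one alanine residue),
    # so the two scores agree up to floating-point error.
    print(attention_score(q, k, 100.0, 171.03711))
    print(attention_score(q, k, 500.0, 571.03711))
```

Because each rotation preserves vector norms and composes additively, shifting both masses by the same offset leaves the score unchanged, which is the property that makes a rotary scheme a natural fit for encoding mass differences between peaks.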

Abstract

Data-Independent Acquisition (DIA) was introduced to improve sensitivity by covering all peptides in a range rather than sampling only high-intensity peaks, as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is unclear how useful DIA data are for de novo peptide sequencing, because DIA data are marred by coeluted peptides, high noise, and varying data quality. We present DIANovo, a new deep learning method that addresses each of these difficulties and improves on previously established systems by a large margin by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data should and should not be used for de novo peptide sequencing, by comparing DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. With Orbitrap Astral, however, DIA consistently outperforms DDA thanks to the narrow-window mode the instrument enables. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.

Paper Structure

This paper contains 17 sections, 3 equations, 7 figures.

Figures (7)

  • Figure 1: (a) The model structure of our entire workflow. At the top is the optimal path task, which generates a series of node indices that are transformed into the optimal path. The mass values in the optimal path are then translated to the corresponding amino acids when a single match is found. At the bottom is the sequence generation task, which takes the generated optimal path and outputs the amino acid sequence to replace mass tags. (b) The BERT-like (Devlin et al., 2019) pretraining model. (c) An example of a spectrum graph, where the bottom value on each edge represents the mass difference between nodes, encoded by RoPE, and the top value indicates the corresponding amino acid sequence. Only a subset of nodes and edges is plotted for clarity; in a complete spectrum graph, all possible forward connections would be present.
  • Figure 2: Amino acid recall (a), peptide recall (b), and amino acid precision (c) of our method vs DeepNovo-DIA and Transformer-DIA on various older-generation datasets, with training sequences excluded from the test set.
  • Figure 3: Amino acid recall (a), peptide recall (b), and amino acid precision (c) of our method vs baselines on various Astral datasets.
  • Figure 4: Venn diagram comparing peptide identification under DDA and DIA modes with Orbitrap Q Exactive (older-generation).
  • Figure 5: Venn diagram comparing peptide identification under DDA and DIA modes with Orbitrap Astral.
  • ...and 2 more figures
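
The spectrum graph described in the Figure 1(c) caption can be sketched roughly as follows. This is an illustrative toy, not the authors' construction: the function name `build_spectrum_graph`, the 0.02 Da tolerance, the reduced residue table, and the example node masses are all assumptions. It connects every node to every heavier node (all forward connections), stores the mass difference on each edge, and labels an edge with an amino acid only when exactly one residue matches, mirroring the caption's "when a single match is found".

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da) for a small subset of amino acids.
# L and I are isobaric, so an edge of 113.08406 Da is deliberately ambiguous.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "I": 113.08406,
    "N": 114.04293,
}


def build_spectrum_graph(node_masses, tol=0.02):
    """Return (nodes, edges) with one forward edge per pair of nodes.

    Each edge is (source_index, target_index, mass_difference, label), where
    label is a residue letter if exactly one residue matches within `tol`,
    and None otherwise (the edge then stays an unresolved mass tag).
    """
    nodes = sorted(node_masses)
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        diff = nodes[j] - nodes[i]
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol]
        label = matches[0] if len(matches) == 1 else None
        edges.append((i, j, diff, label))
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical node masses, chosen so consecutive gaps are G, A, and L/I.
    nodes, edges = build_spectrum_graph([0.0, 57.02146, 128.05857, 241.14263])
    for src, dst, diff, label in edges:
        print(f"{src} -> {dst}: {diff:.5f} Da, label={label}")
```

In this toy input, the edges between consecutive nodes resolve to G and A, while the third gap matches both L and I and therefore stays unlabeled, as do the longer edges whose mass differences match no single residue in the table.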