Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Zheng Ma, Zeping Mao, Ruixue Zhang, Jiazhen Chen, Lei Xin, Baozhen Shan, Ali Ghodsi, Ming Li
TL;DR
This paper tackles the challenge of de novo peptide sequencing from Data-Independent Acquisition (DIA) data, which is highly multiplexed and noisy due to coeluting peptides. It presents DIANovo, a DIA-tailored encoder–decoder Transformer with a spectrum-graph representation, RoPE-based mass-difference encoding, coelution-aware pretraining, and a two-stage decoding strategy to disentangle overlapping signals. The authors show that DIANovo outperforms prior DIA-based methods (e.g., DeepNovo-DIA, Transformer-DIA, Cascadia) across both older-generation and Orbitrap Astral data, with narrow-window DIA especially advantageous on older instruments and Astral DIA consistently superior to DDA. A theoretical framework links performance to the signal-to-noise balance and p-value behavior, providing practical guidance on when DIA enhances de novo sequencing and highlighting real-time spectrum-quality prediction and adaptive acquisition as promising applications.
Abstract
Data-Independent Acquisition (DIA) was introduced to improve sensitivity by covering all peptides in a range, rather than sampling only high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is unclear how useful DIA data are for de novo peptide sequencing, because DIA data are marred by coeluting peptides, high noise, and varying data quality. We present DIANovo, a new deep learning method that addresses each of these difficulties and improves on previously established systems by a large margin by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data should and should not be used for de novo peptide sequencing, based on a comparison between DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. With Orbitrap Astral, however, DIA consistently outperforms DDA because the instrument enables a narrow-window mode. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
