Unmasking Airborne Threats: Guided-Transformers for Portable Aerosol Mass Spectrometry

Kyle M. Regan, Michael McLoughlin, Wayne A. Bryden, Gonzalo R. Arce

TL;DR

The paper tackles real-time detection of airborne pathogens with portable aerosol MALDI-MS, where single-shot spectra are noisy and traditional multi-shot averaging is impractical for field use. It introduces MS-DGFormer, a dual-stream transformer that uses SVD-denoised dictionary subspaces as priors: raw spectra and denoised dictionaries are processed in parallel and fused via selection attention, enabling robust single-shot multi-label classification. The authors report state-of-the-art macro- and micro-averaged performance on a field-relevant aerosol dataset, and they improve efficiency with MS-DGFormer-E, a streamlined inference variant that reduces the parameter count and doubles throughput. This work supports real-time environmental biosurveillance with portable MALDI-ToF platforms, potentially enabling rapid response to biological threats in public spaces.

Abstract

Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral signatures. Yet, its reliance on labor-intensive sample preparation and multi-shot spectral averaging restricts its use to laboratory settings, rendering it impractical for real-time environmental monitoring. These limitations are especially pronounced in emerging aerosol MALDI-MS systems, where autonomous sampling generates noisy spectra for unknown aerosol analytes, requiring single-shot detection for effective analysis. Addressing these challenges, we propose the Mass Spectral Dictionary-Guided Transformer (MS-DGFormer): a data-driven framework that redefines spectral analysis by directly processing raw, minimally prepared mass spectral data. MS-DGFormer leverages a transformer architecture, designed to capture the long-range dependencies inherent in these time-series spectra. To enhance feature extraction, we introduce a novel dictionary encoder that integrates denoised spectral information derived from Singular Value Decomposition (SVD), enabling the model to discern critical biomolecular patterns from single-shot spectra with robust performance. This innovation provides a system to achieve superior pathogen identification from aerosol samples, facilitating autonomous, real-time analysis in field conditions. By eliminating the need for extensive preprocessing, our method unlocks the potential for portable, deployable MALDI-MS platforms, revolutionizing environmental pathogen detection and rapid response to biological threats.
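
The SVD-based denoising mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic, and the function name `low_rank_denoise`, the bin count, the peak positions, and the noise level are all assumptions chosen for illustration. The only idea taken from the source is that a stack of noisy single-shot spectra is replaced by a truncated-SVD (low-rank) approximation.

```python
import numpy as np

def low_rank_denoise(spectra: np.ndarray, rank: int = 2) -> np.ndarray:
    """Return the best rank-`rank` approximation of a (shots x bins) matrix."""
    U, s, Vt = np.linalg.svd(spectra, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Synthetic stand-in data (an assumption, not the paper's dataset):
# 200 shots, each a mixture of two Gaussian "peaks" plus white noise.
rng = np.random.default_rng(0)
n_shots, n_bins = 200, 512
axis = np.arange(n_bins)
peak_a = np.exp(-0.5 * ((axis - 120) / 4.0) ** 2)
peak_b = np.exp(-0.5 * ((axis - 340) / 6.0) ** 2)
amps = rng.uniform(0.2, 2.0, size=(n_shots, 2))      # per-shot peak heights
clean = amps @ np.stack([peak_a, peak_b])             # rank-2 by construction
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

denoised = low_rank_denoise(noisy, rank=2)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(f"error before: {err_noisy:.1f}, after: {err_denoised:.1f}")
```

In this toy setting the truncated reconstruction lies much closer to the noise-free matrix than the raw shots do, because most of the noise energy falls in the discarded singular directions. Real spectra are not exactly low rank, so the appropriate rank would have to be chosen from the singular value decay.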

Paper Structure

This paper contains 19 sections, 9 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: (A) Top: An example batch of particles containing $80\%$ dust particulate, with the remaining $20\%$ evenly divided among the four biological markers. Each row represents a mass spectrum, and each column corresponds to a mass-to-charge ratio value. Bottom: The column-wise average. (B) Top: The heatmap of rows from (A) corresponding to B. globigii spectra. Bottom: the average spectrum. (C), (D), (E) same format as (A) but with E. coli, Insulin, and Ubiquitin, respectively.
  • Figure 2: Top: Heatmap of $200$ mass spectra from Bacillus globigii. Middle: A low-rank approximation via the Singular Value Decomposition (SVD) with rank $r=2$. Bottom: The first 50 singular values from the SVD, plotted on a logarithmic y-axis.
  • Figure 3: The input spectral embedding layer creates a sequence of small overlapping patches from the mass spectrum $\mathbf{s}$ through one-dimensional convolution filters, transforming the 1D spectrum into a 2D sequence.
  • Figure 4: The Mass Spectral Dictionary-Guided Transformer (MS-DGFormer) architecture. (A) Input embedding module. (B) Dictionary Embedding Module. (C) Selection Attention Mechanism. (D) Final peak prediction layer.
  • Figure 5: The processing of a sub-dictionary is illustrated by exemplifying the first low-rank approximated sub-dictionary $\tilde{\mathbf{D}}_1 \in \mathbb{R}^{\frac{\alpha}{c} \times l}$. The spectra $[\tilde{\mathbf{d}}_{1,1}, \ldots, \tilde{\mathbf{d}}_{1,\frac{\alpha}{c}}]^{T}$ are transformed into token sequences via convolution with overlapping kernels and $h$ output channels, yielding $\tilde{\mathbf{D}}^{p}_{1} \in \mathbb{R}^{\frac{\alpha}{c} \times N \times h}$, where each kernel encodes temporal peak information. A learnable token sequence $\tilde{\mathbf{d}}^{L}_{1} \in \mathbb{R}^{1 \times N \times h}$ is concatenated with $\tilde{\mathbf{D}}^{p}_{1}$, forming $\tilde{\mathbf{D}}^{p}_{1} \in \mathbb{R}^{\left(\frac{\alpha}{c} + 1\right) \times N \times h}$. This tensor is permuted to $\mathbb{R}^{N \times \left(\frac{\alpha}{c} + 1\right) \times h}$, for attention to be computed independently across the $N$ temporal positions. The attention mechanism aggregates information across the $\frac{\alpha}{c} + 1$ sequences at each temporal location. Finally, the learnable tokens $\tilde{\mathbf{d}}^{L}_{1}$, now enriched with contextual information, are extracted to represent the aggregated temporal information.
  • ...and 4 more figures
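
The tensor shapes described in the captions of Figures 3 and 5 can be traced with a small NumPy sketch. This is a shape walkthrough under stated assumptions, not the authors' model: the sizes (`m` standing in for $\frac{\alpha}{c}$, `l`, `h`, kernel width `k`, `stride`) are made up, the kernels are random rather than learned, the learnable token sequence is initialized to zeros and appended last (the captions do not state its position), and the attention is plain single-head scaled dot-product attention without learned projections.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def embed_patches(spectra, kernels, stride):
    """Overlapping 1D convolution: (m, l) spectra -> (m, N, h) tokens."""
    k = kernels.shape[1]
    # Every length-k window, then keep one window per stride step.
    windows = sliding_window_view(spectra, k, axis=1)[:, ::stride, :]
    return windows @ kernels.T

def self_attention(x):
    """Scaled dot-product self-attention over axis 1, batched over axis 0."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
m, l, h, k, stride = 8, 256, 32, 16, 8            # hypothetical sizes
sub_dictionary = rng.random((m, l))               # stand-in for one sub-dictionary
kernels = rng.standard_normal((h, k)) / np.sqrt(k)

tokens = embed_patches(sub_dictionary, kernels, stride)  # (m, N, h)
N = tokens.shape[1]
learnable = np.zeros((1, N, h))                   # would be trained in practice
stacked = np.concatenate([tokens, learnable], axis=0)    # (m + 1, N, h)
per_position = stacked.transpose(1, 0, 2)         # (N, m + 1, h)
attended = self_attention(per_position)           # mixes the m + 1 sequences
summary = attended[:, -1, :]                      # learnable tokens, (N, h)
print(tokens.shape, per_position.shape, summary.shape)
```

Because the kernel width exceeds the stride, neighbouring tokens share samples, which is what makes the patches overlap. Permuting to put $N$ first lets each temporal position attend only across the spectra at that position, so the extracted `summary` holds one aggregated vector per position.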