Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection

Song Li, Yang Tan, Song Ke, Liang Hong, Bingxin Zhou

TL;DR

This work tackles immunogenicity prediction for vaccine design, proposing VenusVaccine, a dual-attention deep learning model that fuses sequence embeddings from pre-trained protein language models with two scales of structure tokens and handcrafted physicochemical descriptors. The authors build ImmunoDB, the largest cross-species immunogenicity benchmark to date, enabling rigorous training and evaluation across bacteria, virus, and tumor antigens. Across extensive experiments and post-hoc analyses, VenusVaccine outperforms diverse baselines and demonstrates practical utility in identifying vaccine candidates, as shown in case studies on Helicobacter pylori and SARS-CoV-2. The study also provides valuable resources and evaluation protocols to benchmark future methods in reverse vaccinology and vaccine target discovery.

Abstract

Immunogenicity prediction is a central topic in reverse vaccinology for finding candidate vaccines that can trigger protective immune responses. Existing approaches typically rely on highly compressed features and simple model architectures, leading to limited prediction accuracy and poor generalizability. To address these challenges, we introduce VenusVaccine, a novel deep learning solution with a dual attention mechanism that integrates pre-trained latent vector representations of protein sequences and structures. We also compile the most comprehensive immunogenicity dataset to date, encompassing over 7000 antigen sequences, structures, and immunogenicity labels from bacteria, viruses, and tumors. Extensive experiments demonstrate that VenusVaccine outperforms existing methods across a wide range of evaluation metrics. Furthermore, we establish a post-hoc validation protocol to assess the practical significance of deep learning models in tackling vaccine design challenges. Our work provides an effective tool for vaccine design and sets valuable benchmarks for future research. The implementation is at https://github.com/songleee/VenusVaccine.
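
To make the idea of fusing sequence and structure representations with attention more concrete, the following is a minimal sketch in plain Python. It is not the VenusVaccine implementation: it assumes that "dual attention" can be read as two cross-attention passes (sequence attending to structure, and structure attending to sequence), followed by mean pooling, concatenation with global physicochemical descriptors, and a logistic output. All function names, dimensions, and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention; each query row attends over keys_values rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)])
    return out


def mean_pool(rows):
    """Average per-residue vectors into one protein-level vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict_immunogenicity(seq_emb, struct_emb, physchem, weights, bias):
    """Hypothetical fusion head: two attention directions, pooling, logistic output."""
    seq_ctx = cross_attention(seq_emb, struct_emb)     # sequence attends to structure
    struct_ctx = cross_attention(struct_emb, seq_emb)  # structure attends to sequence
    features = mean_pool(seq_ctx) + mean_pool(struct_ctx) + list(physchem)
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs (assumed shapes): 5 residues, 4-dimensional embeddings, 3 descriptors.
rng = random.Random(0)
seq_emb = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
struct_emb = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
physchem = [rng.uniform(-1, 1) for _ in range(3)]
weights = [rng.uniform(-1, 1) for _ in range(4 + 4 + 3)]  # untrained, random weights

prob = predict_immunogenicity(seq_emb, struct_emb, physchem, weights, bias=0.0)
print(f"predicted immunogenicity score: {prob:.3f}")
```

In a trained model the embeddings would come from pre-trained protein language and structure encoders and the weights would be learned; here they are random, so the printed score has no biological meaning.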

Paper Structure (33 sections, 5 equations, 6 figures, 17 tables)

This paper contains 33 sections, 5 equations, 6 figures, 17 tables.

Figures (6)

  • Figure 1: Illustrative framework of VenusVaccine. The model encodes sequence and structural representations using a dual-attention mechanism, followed by aggregation layers to incorporate global physicochemical attributes and perform binary classification of immunogenicity prediction.
  • Figure 2: Data collection, redundancy processing, and dataset construction steps of ImmunoDB.
  • Figure 3: Generalizability of VenusVaccine by the AUC and Recall of cross-test evaluations. For instance, the top left figure reports the test performance on Immuno-Bacteria by different models trained on Immuno-Virus (yellow) and Immuno-Tumor (orange).
  • Figure 4: KDE of predicted immunogenicity scores on Helicobacter pylori candidates. The $11$ experimentally determined immunogens are highlighted by red dots. Only VenusVaccine simultaneously identifies all positive samples while providing a reasonable overall distribution.
  • Figure 5: Epitope marker on the surface glycoprotein (NCBI ID: YP_009724390.1) by the attention score identifies vaccine targets.
  • ...and 1 more figure