
MC-NN: An End-to-End Multi-Channel Neural Network Approach for Predicting Influenza A Virus Hosts and Antigenic Types

Yanhua Xu, Dominik Wojtczak

TL;DR

This work predicts Influenza A virus hosts and HA/NA subtypes directly from HA and NA protein sequences using a multi-channel neural network (MC-NN) that fuses two input streams (HA and NA trigrams) to produce three outputs (host, HA subtype, NA subtype). It evaluates three architectures (CNN, BiGRU, and Transformer) via nested cross-validation on pre-2020 data, then tests them on post-2020 sequences and on incomplete sequences. The Transformer generally performs best, reaching a host $F_1$ of 83.39% and a subtype $F_1$ of 99.87% on post-2020 data, and an $AP$ of 91.63% with an $F_1$ of 89.29% on incomplete data. The approach substantially outperforms the BLAST baseline and shows potential for rapid, low-cost surveillance in settings with limited laboratory resources. These results support the viability of end-to-end, sequence-based host and subtype prediction, and suggest extending the method to cross-species transmission and to more realistic settings with limited data.
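To make the two-stream input concrete, the following is a minimal sketch of how an HA and an NA protein sequence might each be turned into a list of trigram tokens, one list per channel. It assumes overlapping trigrams with a stride of one residue, which the summary does not specify; the function names `to_trigrams` and `make_channels` and the short example sequences are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def to_trigrams(seq: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (trigrams when k=3).

    Assumption: stride of 1 between consecutive tokens; the paper may
    use a different tokenization scheme.
    """
    seq = seq.strip().upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def make_channels(ha_seq: str, na_seq: str) -> Tuple[List[str], List[str]]:
    """Build the two input streams: one token list for HA, one for NA."""
    return to_trigrams(ha_seq), to_trigrams(na_seq)


# Illustrative (made-up) sequence fragments, not real HA/NA proteins.
ha_tokens, na_tokens = make_channels("MKTIIALSYIF", "MNPNQKIITIG")
print(ha_tokens[:3])
print(na_tokens[:3])
```

In a full pipeline each token list would then be mapped to integer ids and fed to its own channel of the network, with the three outputs (host, HA subtype, NA subtype) predicted from the fused representation.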

Abstract

Influenza poses a significant threat to public health, particularly among the elderly, young children, and people with underlying diseases. The manifestation of severe conditions, such as pneumonia, highlights the importance of preventing the spread of influenza. An accurate and cost-effective prediction of the host and antigenic subtypes of influenza A viruses is essential to addressing this issue, particularly in resource-constrained regions. In this study, we propose a multi-channel neural network model to predict the host and antigenic subtypes of influenza A viruses from hemagglutinin and neuraminidase protein sequences. Our model was trained on a comprehensive data set of complete protein sequences and evaluated on various test data sets of complete and incomplete sequences. The results demonstrate the potential and practicality of using multi-channel neural networks in predicting the host and antigenic subtypes of influenza A viruses from both full and partial protein sequences.

Paper Structure (16 sections, 4 equations, 6 figures, 3 tables)

This paper contains 16 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Data distribution (hosts)
  • Figure 2: Data distribution (subtypes)
  • Figure 3: The multi-channel neural network architecture: positional encoding is only employed along with Transformer.
  • Figure 4: Comparison of Overall Performance Between Models (Hosts): the baseline results with BLAST are framed by the black solid line.
  • Figure 5: Comparison of Overall Performance Between Models (HA subtypes): the baseline results with BLAST are framed by the black solid line.
  • ...and 1 more figure