MC-NN: An End-to-End Multi-Channel Neural Network Approach for Predicting Influenza A Virus Hosts and Antigenic Types
Yanhua Xu, Dominik Wojtczak
TL;DR
This work predicts Influenza A virus hosts and HA/NA subtypes directly from HA and NA protein sequences using a multi-channel neural network (MC-NN) that fuses two input streams (HA and NA trigrams) to produce three outputs (host, HA subtype, NA subtype). It evaluates three architectures (CNN, BiGRU, and Transformer) via nested cross-validation on pre-2020 data and tests them on post-2020 and incomplete sequences. The Transformer generally delivers the strongest performance: host $F1$ of 83.39% and subtype $F1$ of 99.87% on post-2020 data, and $AP$ of 91.63% with $F1$ of 89.29% on incomplete data. The approach substantially outperforms the BLAST baseline and shows potential for rapid, low-cost surveillance in settings with limited laboratory resources. These results support the viability of end-to-end, sequence-based host and subtype prediction and suggest two directions for future work: extending the method to cross-species transmission and handling limited, more realistic data.
Abstract
Influenza poses a significant threat to public health, particularly among the elderly, young children, and people with underlying diseases. The manifestation of severe conditions, such as pneumonia, highlights the importance of preventing the spread of influenza. Accurate and cost-effective prediction of the host and antigenic subtypes of influenza A viruses is essential to addressing this issue, particularly in resource-constrained regions. In this study, we propose a multi-channel neural network model to predict the host and antigenic subtypes of influenza A viruses from hemagglutinin and neuraminidase protein sequences. Our model was trained on a comprehensive data set of complete protein sequences and evaluated on various test data sets of complete and incomplete sequences. The results demonstrate the potential and practicality of using multi-channel neural networks to predict the host and antigenic subtypes of influenza A viruses from both full and partial protein sequences.
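
The TL;DR states that the two input channels are HA and NA trigrams. As a minimal sketch of that preprocessing step, the snippet below splits each protein sequence into three-residue windows and keeps the two channels separate. It assumes overlapping windows with a stride of 1, which the text here does not specify, and the names `to_trigrams` and `build_channels` are hypothetical rather than taken from the authors' code.

```python
from typing import Dict, List


def to_trigrams(sequence: str) -> List[str]:
    """Split a protein sequence into overlapping 3-residue windows.

    Assumption: stride of 1 (overlapping trigrams). Sequences shorter
    than three residues yield an empty list.
    """
    cleaned = sequence.strip().upper()
    return [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]


def build_channels(ha_sequence: str, na_sequence: str) -> Dict[str, List[str]]:
    """Return the two input streams (HA and NA) as separate trigram lists."""
    return {
        "ha": to_trigrams(ha_sequence),
        "na": to_trigrams(na_sequence),
    }


# Toy sequences for illustration only, not real HA/NA proteins.
channels = build_channels("MKTIIA", "mnpnq")
print(channels["ha"])
print(channels["na"])
```

Keeping the channels separate until they reach the model mirrors the multi-channel design described above, where each stream is encoded on its own before the features are fused for the three outputs.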
