Primer C-VAE: An interpretable deep learning primer design method to detect emerging virus variants
Hanyu Wang, Emmanuel K. Tsinda, Anthony J. Dunn, Francis Chikweto, Alain B. Zemkoho
TL;DR
Primer C-VAE presents an interpretable, deep-learning-based primer-design workflow that scales to long, highly similar genomes and generates forward and reverse primers simultaneously. By coupling a convolutional variational auto-encoder with reconstruction-driven interpretability and four anchor-extraction strategies, the method achieves high variant-discrimination accuracy and yields highly target-specific primer pairs that were validated in silico. The approach performs strongly on SARS-CoV-2 variants and on closely related bacteria (E. coli vs S. flexneri), and it remains adaptable to qPCR applications and long-genome contexts. Despite its computational demands and the lack of wet-lab validation, Primer C-VAE offers a scalable, semi-automated alternative to traditional primer-design pipelines, with potential applications in the surveillance of emerging pathogens.
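Because the method emits forward and reverse primers together and checks them by in silico PCR, it may help to see what such a check involves at its simplest. The sketch below is not the authors' pipeline: it assumes binding means an exact match (real in silico PCR tools tolerate mismatches), and the names `reverse_complement`, `amplicon_length` and `qpcr_suitable` are hypothetical. It finds the forward primer on the template, finds the reverse primer as its reverse complement downstream, and reports the product length, which can then be compared against a 200 bp cutoff for qPCR suitability.

```python
from typing import Optional

# Complement table for uppercase DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Product length spanned by a primer pair on `template`, or None.

    Hypothetical helper. Assumption: binding means an exact match. The
    forward primer is searched as written; the reverse primer is written
    5'->3' on the opposite strand, so its reverse complement is searched
    downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    # Product runs from the first base of the forward primer
    # to the last base of the reverse-primer binding site.
    return end + len(rc) - start


def qpcr_suitable(length: Optional[int], max_bp: int = 200) -> bool:
    """True if a product exists and is shorter than `max_bp` base pairs."""
    return length is not None and length < max_bp


if __name__ == "__main__":
    # Toy template: 4 bp flank, forward site, 5 bp insert, reverse site, 4 bp flank.
    template = "AAAA" + "ATGCGT" + "CCCCC" + "TTGACC" + "AAAA"
    n = amplicon_length(template, "ATGCGT", "GGTCAA")
    print(n, qpcr_suitable(n))  # prints: 17 True
```

Searching only downstream of the forward primer mirrors the orientation of a real PCR product; a primer pair whose sites appear in the wrong order yields no amplicon here.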
Abstract
Motivation: PCR is cheaper and faster than next-generation sequencing (NGS) for detecting target organisms, and primer design is a critical step. For rapidly mutating viruses in epidemiology, however, designing effective primers is challenging: traditional methods require substantial manual intervention and struggle to ensure effective primers across different strains. For organisms with large, highly similar genomes, such as Escherichia coli and Shigella flexneri, differentiating between species is also difficult but crucial.
Results: We developed Primer C-VAE, a model that combines a variational auto-encoder (VAE) framework with convolutional neural networks (CNNs) to identify variants and generate specific primers. On SARS-CoV-2, our model classified five variants (Alpha, Beta, Gamma, Delta, Omicron) with 98% accuracy and generated variant-specific primers. These primers appeared with >95% frequency in target-variant sequences and <5% frequency in other variants, and they performed well in in silico PCR tests. For Alpha, Delta, and Omicron, our primer pairs produced fragments <200 bp, suitable for qPCR detection. The model also generated effective primers for organisms with longer genomes, such as E. coli and S. flexneri.
Conclusion: Primer C-VAE is an interpretable deep learning approach for developing specific primer pairs for target organisms. This flexible, semi-automated, and reliable tool works regardless of sequence completeness and length, supports qPCR applications, and can be applied to organisms with large and highly similar genomes.
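The specificity criterion in the Results (a primer present with >95% frequency in the target variant and <5% in the others) amounts to a frequency filter over sequence collections. Below is a minimal sketch under assumptions the source does not state: frequency is counted as exact substring occurrence on the given strand, each non-target variant is tested separately, and the names `primer_frequency` and `is_variant_specific` are hypothetical. For a reverse primer, one would pass its reverse complement.

```python
from typing import Dict, List


def primer_frequency(primer: str, genomes: List[str]) -> float:
    """Fraction of `genomes` that contain `primer` as an exact substring."""
    if not genomes:
        return 0.0
    p = primer.upper()
    hits = sum(1 for g in genomes if p in g.upper())
    return hits / len(genomes)


def is_variant_specific(
    primer: str,
    genomes_by_variant: Dict[str, List[str]],
    target: str,
    min_target: float = 0.95,
    max_other: float = 0.05,
) -> bool:
    """Hypothetical filter: >min_target in target, <max_other in each other variant."""
    if primer_frequency(primer, genomes_by_variant[target]) <= min_target:
        return False
    # Assumption: every non-target variant is checked on its own, not pooled.
    return all(
        primer_frequency(primer, seqs) < max_other
        for variant, seqs in genomes_by_variant.items()
        if variant != target
    )


if __name__ == "__main__":
    toy = {
        "Alpha": ["TTACGTACGTTT"] * 20,  # every toy Alpha sequence carries the primer
        "Delta": ["TTTTTTTTTTTT"] * 20,  # no toy Delta sequence does
    }
    print(is_variant_specific("ACGTACGT", toy, "Alpha"))  # prints: True
```

Testing each non-target variant separately is the stricter reading: a primer cannot pass by being rare across the pooled off-target set while still common in one closely related variant.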
