Emergenet: A Digital Twin of Sequence Evolution for Scalable Emergence Risk Assessment of Animal Influenza A Strains

Kevin Yuanbo Wu, Jin Li, Aaron Esser-Kahn, Ishanu Chattopadhyay

TL;DR

Emergenet, a tool to infer a digital twin of sequence evolution to chart how new variants might emerge in the wild, opens the door to preemptive pandemic mitigation through targeted inoculation of animal hosts before the first human infection.

Abstract

Although animal influenza viruses have triggered devastating pandemics in the past, our ability to quantitatively assess the emergence potential of individual strains remains limited. This study introduces Emergenet, a tool to infer a digital twin of sequence evolution to chart how new variants might emerge in the wild. Our predictions, based on Emergenets built using only 220,151 Hemagglutinin (HA) sequences, consistently outperform WHO seasonal vaccine recommendations for H1N1/H3N2 subtypes over two decades (average match-improvement: 3.73 AAs, 28.40%), and are on par with state-of-the-art approaches that use more detailed phenotypic annotations. Finally, our generative models are used to scalably calculate the current odds of emergence of animal strains not yet in human circulation, which strongly correlate with CDC's expert-assessed Influenza Risk Assessment Tool (IRAT) scores (Pearson's $r = 0.721, p = 10^{-4}$). A speedup of at least five orders of magnitude over CDC's assessment (seconds vs months) then enabled us to analyze 6,354 animal strains collected post-2020 and identify 35 strains with high emergence scores ($> 7.7$). The Emergenet framework opens the door to preemptive pandemic mitigation through targeted inoculation of animal hosts before the first human infection.

Paper Structure

This paper contains 20 sections, 1 theorem, 32 equations, 15 figures, 19 tables.

Key Result

Theorem 1

Given a sequence $x$ of length $N$ that transitions to a strain $y\in Q$, we have the following bounds at significance level $\alpha$. Here, $\omega_{y}^Q$ is the persistence probability of strain $y$ in the target population $Q$ (see Definition 3), and $\theta(x,y)$ is the q-distance between $x$ and $y$ (see Definition 2).

Figures (15)

  • Figure 1: Emergenet inference and applications. Panel a: Variations of genomes for identical subtypes of Influenza A are analyzed to infer a recursive forest of conditional inference trees (Hothorn et al., 2006), the Emergenet, which maximally captures the emergent dependencies between an a priori unspecified number of mutations. With these inferred dependencies we can estimate the numerical odds of specific mutations and, by extension, the probability of one strain giving rise to another in the wild under complex selection pressures from the background. Panel b: Snapshot of decision trees from the Emergenet inferred for H1N1 HA sequences collected in 2020-2021, which reveals a cyclic dependency. In general, every internal node of a component tree can be "expanded" into its own tree, underscoring the recursive structure of the Emergenet. Panel c: First application: forecasting the dominant strain(s) for the next flu season, using only sequences collected up to six months prior and the Emergenet inferred from the past year's data. Panel d: Second application: estimating the pandemic risk posed by individual animal strains that are not yet known to circulate in humans.
  • Figure 1: Sequence comparisons. Panel a: Comparing the Emergenet prediction (ENT), the WHO recommendation (WHO), and the observed dominant strain (DOM), we note that the correct Emergenet predictions tend to be within the RBD, both for H1N1 and H3N2 for HA. Panels b-f: Additionally, by comparing the type, side chain area, and accessible side chain area, we note that DOM and ENT are often close in important chemical properties, whereas the WHO recommendation often is not. Panels g-i: Localization of the deviations in the molecular structure of HA, where we note that the changes are most frequent in the HA1 sub-unit (the globular head), and around residues and structures that have been commonly implicated in receptor binding interactions, e.g., the $\approx 200$ loop, the $\approx 220$ loop and the $\approx 180$-helix (Tzarum et al., 2015; Lazniewski et al., 2018; Garcia et al., 2015).
  • Figure 1: Low-risk animal strain comparison. HA sequence comparison of the dominant human strains of 2020-2021 (A/Baltimore/JH/001/2021, A/Myanmar/I026/2021, A/Darwin/12/2021) with the Emergenet-estimated relatively medium-risk H3N2 strain EPI2146849 (emergence risk score 6.5), showing substantially more differences than the high-risk strain comparison shown in the Extended Data.
  • Figure 2: Seasonal predictions for Influenza A. Relative out-performance of Emergenet predictions against WHO recommendations for H1N1 and H3N2 subtypes for Hemagglutinin (HA) over both hemispheres. The negative bars (red) indicate the reduced average Hamming distance between the predicted sequence and the sequence population that season. Providing two recommendations shows a significant improvement over providing a single recommendation. Note that the recommendations for the north are given in February, while those for the south are given in September, keeping in mind that the southern flu season begins a few months earlier (e.g. for the 2022-2023 flu season, northern data is labelled '2022').
  • Figure 2: High-risk animal strain comparison. HA sequence comparison of the dominant human strains of 2020-2021 (A/Baltimore/JH/001/2021, A/Myanmar/I026/2021, A/Darwin/12/2021) with the Emergenet-estimated top-risk H3N2 strain EPI1818137 (emergence score $> 7.26$, 2020-2022 April), showing differences both inside and outside the RBD.
  • ...and 10 more figures
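
The seasonal-prediction caption above scores a recommendation by its average Hamming distance to the sequence population of that season. A minimal sketch of that metric follows, assuming pre-aligned, equal-length sequences. The function names and the 8-residue toy fragments are hypothetical illustrations, not the authors' code or real HA data.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_hamming(candidate: str, population: Sequence[str]) -> float:
    """Average Hamming distance from one candidate to every sequence in a season."""
    if not population:
        raise ValueError("population must be non-empty")
    return sum(hamming(candidate, s) for s in population) / len(population)


# Hypothetical toy data: 8-residue fragments, NOT real HA sequences.
season = ["MKTIIALS", "MKTIIALS", "MKAIIALS", "MKTIVALS"]
who_pick = "MKAIVALT"  # stand-in for a WHO-style recommendation
ent_pick = "MKTIIALS"  # stand-in for an Emergenet-style prediction

who_dist = mean_hamming(who_pick, season)
ent_dist = mean_hamming(ent_pick, season)

# Negative values mean the second candidate sits closer to the population,
# matching the sign convention of the negative (red) bars in the caption.
improvement = ent_dist - who_dist
```

On this toy population the difference is negative, which is the direction the caption describes as an improvement. Real use would require aligned HA sequences and the season definitions used in the paper.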

Theorems & Definitions (6)

  • Definition 1: Emergenet
  • Definition 2: E-distance: adaptive biologically meaningful dissimilarity between sequences
  • Definition 3: Persistence probability of a sequence
  • Theorem 1: Probability bound
  • proof
  • Remark 1