
Learning virulence-transmission relationships using causal inference

Sudam Surasinghe, C. Brandon Ogbunugafor

TL;DR

LETR (learning evolutionary trait relationships) is a data-driven framework that combines Granger-style causal inference with discrete dynamical maps and transfer-operator analysis to uncover directional relationships between pathogen virulence and transmission. By identifying Granger-causal drivers via geometric information flow and fitting conditional update maps, LETR links short-term predictability to long-run trait distributions through invariant densities. On synthetic myxomatosis data, the results validate a robust virulence-to-transmission causal path, while the SARS-CoV-2 analysis reveals region-specific asymmetries and bimodal long-run virulence patterns. The work highlights that relationships between virulence and transmission are dynamic and context-dependent, with broad implications for understanding disease evolution and for developing mechanistic theories beyond static trade-off models.
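
The Granger-style criterion asks whether a candidate driver's past reduces one-step forecast error for a focal trait beyond what the focal trait's own past achieves. The sketch below illustrates that idea with a deliberately simple stand-in: one-lag ordinary least-squares regressions compared by in-sample mean squared error, not the geometric information flow (GeoC) statistic the framework uses. The toy system, its coefficients, and the helper names `one_step_mse` and `granger_gain` are hypothetical, chosen only so that virulence drives transmission but not the reverse.

```python
import numpy as np


def one_step_mse(target_next, predictors):
    """In-sample MSE of a least-squares fit of target_next on the predictors plus an intercept."""
    X = np.column_stack([np.ones(len(target_next))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, target_next, rcond=None)
    resid = target_next - X @ coef
    return float(np.mean(resid ** 2))


def granger_gain(x, y):
    """Ratio of restricted to full one-step MSE when predicting x[n+1].

    Restricted model: x[n+1] ~ x[n].  Full model: x[n+1] ~ x[n] + y[n].
    A ratio well above 1 means the past of y improves one-step forecasts of x.
    """
    x_next, x_now, y_now = x[1:], x[:-1], y[:-1]
    restricted = one_step_mse(x_next, [x_now])
    full = one_step_mse(x_next, [x_now, y_now])
    return restricted / full


# Hypothetical linear toy system: mu evolves on its own; beta is driven by past mu.
rng = np.random.default_rng(0)
n = 2000
mu = np.empty(n)
beta = np.empty(n)
mu[0], beta[0] = 0.5, 0.5
for t in range(n - 1):
    mu[t + 1] = 0.8 * mu[t] + rng.normal(scale=0.1)
    beta[t + 1] = 0.2 * beta[t] + 0.9 * mu[t] + rng.normal(scale=0.1)

gain_mu_to_beta = granger_gain(beta, mu)  # does past mu help predict beta?
gain_beta_to_mu = granger_gain(mu, beta)  # does past beta help predict mu?
print(f"mu -> beta: {gain_mu_to_beta:.2f}   beta -> mu: {gain_beta_to_mu:.2f}")
```

A ratio near 1 for beta -> mu and clearly above 1 for mu -> beta reproduces the kind of asymmetry the framework looks for. With real data, an out-of-sample comparison or a formal test would be needed before reading the ratio as evidence.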

Abstract

The relationship between traits that influence pathogen virulence and transmission is part of the central canon of the evolution and ecology of infectious disease. However, identifying directional and mechanistic relationships among traits remains a key challenge in many subfields of biology, because models often assume static links between traits. Here, we introduce learning evolutionary trait relationships (LETR), a data-driven framework that applies Granger-causality principles to determine which traits drive others and how these relationships change over time. LETR integrates causal discovery with generative mapping and transfer-operator analysis to link short-term predictability with long-term trait distributions. Using a synthetic myxomatosis virus-host data set, we show that LETR reliably recovers known directional influences, such as virulence driving transmission. Applying the framework to global pandemic (SARS-CoV-2) data, we find that past virulence improves prediction of future transmission, while the reverse effect is weak. Invariant-density estimates reveal a long-term trend toward low virulence and low transmission, with bimodality in virulence suggesting ecological influences or host heterogeneity. In summary, this study provides a blueprint for learning the relationship between how harmful a pathogen is and how well it spreads, a relationship that proves highly idiosyncratic and context-dependent. This finding undermines simplistic models and encourages the development of new theory for the constraints underlying pathogen evolution. Further, by uniting causal inference with dynamical modeling, the LETR framework offers a general approach for uncovering mechanistic trait linkages in many kinds of complex biological systems.
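
The transfer-operator step that links short-term dynamics to long-term trait distributions can be illustrated with Ulam's method on the logistic map, which the framework lists as a pedagogical choice for the update map. This is a minimal sketch under stated assumptions. The map is fixed at the fully chaotic case $f(x)=4x(1-x)$ because its invariant density $1/(\pi\sqrt{x(1-x)})$ is known in closed form and gives a check. The bin count, the evenly spaced sample points per bin, and the function names are illustrative choices, not the authors' settings. Each row of the matrix records where one bin's points land, and the leading left eigenvector of that row-stochastic matrix approximates the invariant density of the Frobenius-Perron operator.

```python
import numpy as np


def logistic(x, r=4.0):
    """Logistic map; r = 4 is the fully chaotic case on [0, 1]."""
    return r * x * (1.0 - x)


def ulam_invariant_density(f, n_bins=200, samples_per_bin=1000):
    """Approximate the invariant density of a map f: [0, 1] -> [0, 1] with Ulam's method."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    width = edges[1] - edges[0]
    offsets = (np.arange(samples_per_bin) + 0.5) / samples_per_bin  # evenly spaced in (0, 1)
    P = np.zeros((n_bins, n_bins))
    for i in range(n_bins):
        # Push evenly spaced points of bin i forward and record which bins their images hit.
        pts = edges[i] + offsets * width
        images = np.clip(f(pts), 0.0, 1.0 - 1e-12)
        j = np.searchsorted(edges, images, side="right") - 1
        P[i] = np.bincount(j, minlength=n_bins) / samples_per_bin
    # P is row-stochastic, so the invariant distribution is its left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    mass = np.abs(v) / np.abs(v).sum()  # probability mass per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, mass / width  # convert mass per bin to density


centers, density = ulam_invariant_density(logistic)
exact = 1.0 / (np.pi * np.sqrt(centers * (1.0 - centers)))  # known invariant density for r = 4
mid = (centers > 0.2) & (centers < 0.8)
rel_err = np.max(np.abs(density[mid] - exact[mid]) / exact[mid])
print(f"max relative error on (0.2, 0.8): {rel_err:.3f}")
```

Replacing `logistic` with a fitted one-step map is one way to move from short-term dynamics to a long-run trait distribution. Note that this sketch does not model the driver process, which the framework conditions on.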

Paper Structure

This paper contains 34 sections, 14 equations, 16 figures, and 6 tables.

Figures (16)

  • Figure 1: Illustration of the learning evolutionary trait relationships (LETR) framework. The top box shows time series of focal and interacting traits, such as virulence ($\mu_n$) and transmission ($\beta_n$), collected from evolving pathogen populations or from host-level surveillance. Causal discovery identifies which measured variables improve short-term forecasts of a chosen focal trait by testing whether adding a candidate predictor (for example, an environmental covariate, a demographic or behavioral variable, a genetic marker, or a treatment exposure) improves one-step forecasts. Methods include geometric information flow (GeoC) [surasinghe2020geometry], transfer entropy [schreiber2000measuring], and causation entropy [sun2014causation]. Model fitting learns a generative update map $x_{n+1}=f(x_n,y_{i_n})$ conditioned on the drivers discovered in step one. Here $f$ may be a pedagogical map such as the logistic map or a regularized supervised learner chosen by cross-validation, and the fitted models yield interpretable summaries and partial dependence visualizations for applied researchers. The invariant-density step analyzes ensemble behavior under $f$ by approximating the Frobenius-Perron transfer operator and estimating the invariant density of $x$. This density summarizes the long-run distribution of the focal trait under the inferred dynamics and observed driver processes, highlights prevalent phenotypes and multimodality, and indicates how interventions may reshape stable outcomes. Arrows indicate the workflow and feedback. Dashed links from causal discovery back to the data denote the possibility of adding variables to improve inference, and arrows from data to model fitting denote direct learning of long-term behavior from observed trajectories.
  • Figure 2: Simulated complex relationship between virulence and transmission, motivated by myxomatosis. We illustrate the complex relationship between virulence and transmission trait dynamics using a methodological benchmark. (a) The marginal density of virulence $\mu$, estimated by kernel density estimation, departs markedly from a Gaussian form, exhibiting skewness and heavy tails that indicate a nontrivial probability of extreme virulence values. (b) The marginal density of transmission $\beta$ is likewise skewed and not well summarized by its mean and variance alone. (c) The joint density of $\mu$ and $\beta$ reveals a nonlinear, non-monotonic relationship between the traits. The data set was constructed to avoid Gaussian assumptions and a simple trade-off structure, providing a demanding test case for causal discovery methods. These complex dynamics underscore the need for flexible, data-driven causal inference when interpreting biological trait relationships.
  • Figure 3: Geometric diagnostics of the virulence–transmission causal model on synthetic data inspired by myxomatosis. One-step relationships between consecutive generations illustrate that past virulence improves the prediction of future transmission. (a) Scatter plot of $\mu_n$ versus $\mu_{n+1}$, showing a tight low-dimensional relationship consistent with strong self-predictability of virulence and with the existence of a parsimonious update map for $\mu$. (b) Scatter plot of $\beta_n$ versus $\beta_{n+1}$, showing weak one-step predictability when $\beta$ alone is used. The point cloud does not collapse onto a one-dimensional curve. The estimated correlation dimension of the pairs $[\beta_n,\beta_{n+1}]$ is approximately 1.17, indicating that $\beta$ dynamics are not adequately described by a one-dimensional map in $\beta$ and that additional drivers are required. (c) Scatter plot of $(\mu_n,\beta_n)$ versus $\beta_{n+1}$, showing that the point cloud lies on an effectively two-dimensional manifold and thereby recovering the structure absent from panel (b). Together, the contrast between panels (a) and (b) and the structural recovery in panel (c) provide visual, model-free evidence that $\mu_n$ carries predictive information about subsequent transmission. Geometric causal inference and correlation dimension estimates presented in the main text confirm these observations quantitatively. The figure thus provides geometric validation that the hidden causal structure of the benchmark is recovered and that the expected virulence–transmission dynamics, with complex, biologically relevant trait interactions, are captured.
  • Figure 4: Geometric investigation of SARS-CoV-2 trait dynamics. One-step relationships show that virulence is largely self-predictive, whereas transmission is more dispersed and likely requires additional causal predictors. Data points are daily observations from Our World in Data [Mathieu2020mortality, Mathieu2020cases] for the period 1 September 2020 to 24 August 2025. (a) The phase space $\mu$ versus $\beta$, with a reciprocal fit $\beta \approx 0.228/\mu$ obtained by nonlinear regression. (b) $\mu_n$ versus $\mu_{n+1}$, revealing a tight one-dimensional manifold consistent with strong temporal predictability of virulence. (c) $\beta_n$ versus $\beta_{n+1}$, which exhibits substantially greater dispersion and a structured pattern in which higher transmission values separate into three apparent branches, suggesting more complex dynamics and plausible additional causal drivers. Together, these patterns geometrically illustrate the asymmetric causal structure between virulence and transmission and indicate that predicting transmission may require ecological, social, or behavioral variables beyond virulence alone.
  • Figure 5: Long-run trait distributions of SARS-CoV-2. Invariant densities for SARS-CoV-2 virulence and transmission, estimated with Ulam's method. (a) The invariant density for virulence $\mu$ is bimodal, with modes near $\mu\approx 0.0025$ and $\mu\approx 0.0175$. (b) The invariant density for transmission $\beta$, where $\beta$ denotes min-max scaled daily new cases per million (see [jain2011min, ali2022investigating]). The transmission density is strongly concentrated near the lower bound and has a rapidly decaying, approximately exponential tail, with a weak secondary mode at the very low end. Data are daily observations from Our World in Data [Mathieu2020mortality, Mathieu2020cases] for the period 1 September 2020 to 24 August 2025. These estimates indicate a long-term tendency toward low virulence and low transmission. The bimodality in virulence may arise from fluctuations between periods that are more and less favorable to the pathogen, driven by host heterogeneity or environmental variation.
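
Figures 3 and 4 reason from the dimension of one-step point clouds, for example the correlation dimension of about 1.17 reported for $[\beta_n,\beta_{n+1}]$. A minimal Grassberger–Procaccia estimator is sketched below. The correlation sum $C(r)$ is the fraction of point pairs closer than $r$, and the dimension is the slope of $\log C(r)$ against $\log r$ over a chosen scaling range. The radius range, sample sizes, and test sets (a circle and a filled square, with known dimensions 1 and 2) are assumptions for illustration. Neither the paper's estimator settings nor its GeoC statistic are reproduced here.

```python
import numpy as np


def correlation_dimension(points, r_min, r_max, n_radii=12):
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r on [r_min, r_max]."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_d = dists[np.triu_indices(len(pts), k=1)]  # each unordered pair counted once
    radii = np.geomspace(r_min, r_max, n_radii)
    C = np.array([np.mean(pair_d < r) for r in radii])  # correlation sum C(r)
    slope, _intercept = np.polyfit(np.log(radii), np.log(C), 1)
    return float(slope)


# Sanity checks with known dimensions (hypothetical test sets, not the paper's data).
rng = np.random.default_rng(1)
n = 1500
theta = rng.uniform(0.0, 2.0 * np.pi, n)
circle = 0.5 * np.column_stack([np.cos(theta), np.sin(theta)])  # a 1-D curve in the plane
square = rng.uniform(0.0, 1.0, (n, 2))                          # a 2-D filled region
d_circle = correlation_dimension(circle, 0.01, 0.1)
d_square = correlation_dimension(square, 0.01, 0.1)
print(f"circle: {d_circle:.2f}   square: {d_square:.2f}")
```

The full pairwise-distance matrix keeps the code short but limits it to a few thousand points. A tree-based neighbor search scales better for long surveillance series.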
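
Figure 4's reciprocal fit and Figure 5's min-max scaling are both short computations. The sketch below assumes the one-parameter model $\beta = c/\mu$ fitted by ordinary least squares in $\beta$. The paper states only that nonlinear regression was used, so its loss and weighting may differ. Under this assumption, the model is linear in $c$ and the minimizer has the closed form $c = \sum_i (\beta_i/\mu_i) / \sum_i \mu_i^{-2}$. The data below are synthetic, with an arbitrary true constant, and are not the Our World in Data series. The helper names are hypothetical.

```python
import numpy as np


def min_max_scale(x):
    """Linearly rescale a series onto [0, 1] (undefined if the series is constant)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def fit_reciprocal(mu, beta):
    """Least-squares c for beta ~ c / mu.

    The model is linear in c, so minimizing sum((beta - c / mu)**2) has the closed form
    c = sum(beta / mu) / sum(1 / mu**2).
    """
    inv = 1.0 / np.asarray(mu, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(np.sum(inv * beta) / np.sum(inv ** 2))


# Hypothetical synthetic data with true constant c = 0.3 (not Our World in Data values).
rng = np.random.default_rng(2)
mu = rng.uniform(0.5, 2.0, 400)
beta = 0.3 / mu + rng.normal(scale=0.01, size=mu.size)
c_hat = fit_reciprocal(mu, beta)
beta_scaled = min_max_scale(beta)
print(f"c_hat = {c_hat:.4f}, scaled range = [{beta_scaled.min()}, {beta_scaled.max()}]")
```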
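
Figure 1's model-fitting step learns an update map $x_{n+1}=f(x_n,y_{i_n})$, with one option being a regularized supervised learner chosen by cross-validation. A minimal reading of that step is ridge regression on quadratic polynomial features of $(x_n, y_n)$, with the penalty selected by k-fold cross-validation. The feature set, penalty grid, fold count, toy dynamics, and helper names are all hypothetical, and the paper's learner may differ.

```python
import numpy as np


def quad_features(x, y):
    """Quadratic polynomial features in (x, y), with an intercept column first."""
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def ridge_fit(F, target, lam):
    """Closed-form ridge regression; the intercept (column 0) is left unpenalized."""
    penalty = lam * np.eye(F.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(F.T @ F + penalty, F.T @ target)


def cv_select_lambda(F, target, lams, k=5, seed=0):
    """Return the penalty in lams with the lowest k-fold cross-validated MSE."""
    idx = np.random.default_rng(seed).permutation(len(target))
    folds = np.array_split(idx, k)
    scores = []
    for lam in lams:
        errs = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            w = ridge_fit(F[train], target[train], lam)
            errs.append(np.mean((F[fold] @ w - target[fold]) ** 2))
        scores.append(np.mean(errs))
    return lams[int(np.argmin(scores))]


# Hypothetical toy data: focal trait x driven by its own past and by an i.i.d. driver y.
rng = np.random.default_rng(3)
n = 1000
y = rng.uniform(0.0, 1.0, n)
x = np.empty(n)
x[0] = 0.3
for t in range(n - 1):
    x[t + 1] = 2.8 * x[t] * (1.0 - x[t]) + 0.1 * y[t] + rng.normal(scale=0.01)

F = quad_features(x[:-1], y[:-1])  # features of (x_n, y_n)
target = x[1:]                     # x_{n+1}
lams = [1e-6, 1e-4, 1e-2, 1.0]
best_lam = cv_select_lambda(F, target, lams)
w = ridge_fit(F, target, best_lam)
mse = float(np.mean((F @ w - target) ** 2))
print(f"chosen penalty: {best_lam}, in-sample MSE: {mse:.2e}")
```

Shuffled folds ignore temporal ordering. For real surveillance series, blocked or forward-chaining splits are a safer way to avoid leaking future information into training.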