Avoiding matrix exponentials for large transition rate matrices

Pedro Pessoa; Max Schweiger; Steve Presse

Avoiding matrix exponentials for large transition rate matrices

Pedro Pessoa, Max Schweiger, Steve Presse

TL;DR

Matrix exponentiation for large, sparse rate matrices is computationally expensive in time and memory. The authors compare five approaches—three MJP-based (T-MJP, J-MJP, R-MJP), plus Runge-Kutta RK4 and Krylov subspace methods—to avoid explicit exponentiation, showing that under reasonable sparsity and scaling assumptions the fastest approaches scale as $O(N^2)$ time and $O(N)$ memory. Through Bayesian inference benchmarks on birth-death and gene-network models, they demonstrate that Krylov and R-MJP generally outperform alternatives, with Krylov favored for very large state spaces and R-MJP preferable for sparser generators. J-MJP, while conceptually appealing, exhibits poor mixing and inconsistent posteriors due to latent-variable burden, making it impractical for most inference tasks. The work provides actionable guidance on selecting the most efficient method for posterior inference in large MJPs, enabling scalable probabilistic modeling in systems biology and related fields.

Abstract

Exact methods for exponentiation of matrices of dimension $N$ can be computationally expensive in terms of execution time ($N^{3}$) and memory requirements ($N^{2}$) not to mention numerical precision issues. A type of matrix often exponentiated in the sciences is the rate matrix. Here we explore five methods to exponentiate rate matrices some of which apply even more broadly to other matrix types. Three of the methods leverage a mathematical analogy between computing matrix elements of a matrix exponential and computing transition probabilities of a dynamical processes (technically a Markov jump process, MJP, typically simulated using Gillespie). In doing so, we identify a novel MJP-based method relying on restricting the number of "trajectory" jumps based on the magnitude of the matrix elements with favorable computational scaling. We then discuss this method's downstream implications on mixing properties of Monte Carlo posterior samplers. We also benchmark two other methods of matrix exponentiation valid for any matrix (beyond rate matrices and, more generally, positive definite matrices) related to solving differential equations: Runge-Kutta integrators and Krylov subspace methods. Under conditions where both the largest matrix element and the number of non-vanishing elements scale linearly with $N$ -- reasonable conditions for rate matrices often exponentiated -- computational time scaling with the most competitive methods (Krylov and one of the MJP-based methods) reduces to $N^2$ with total memory requirements of $N$.

Avoiding matrix exponentials for large transition rate matrices

TL;DR

time and

memory. Through Bayesian inference benchmarks on birth-death and gene-network models, they demonstrate that Krylov and R-MJP generally outperform alternatives, with Krylov favored for very large state spaces and R-MJP preferable for sparser generators. J-MJP, while conceptually appealing, exhibits poor mixing and inconsistent posteriors due to latent-variable burden, making it impractical for most inference tasks. The work provides actionable guidance on selecting the most efficient method for posterior inference in large MJPs, enabling scalable probabilistic modeling in systems biology and related fields.

Abstract

Exact methods for exponentiation of matrices of dimension

can be computationally expensive in terms of execution time (

) and memory requirements (

) not to mention numerical precision issues. A type of matrix often exponentiated in the sciences is the rate matrix. Here we explore five methods to exponentiate rate matrices some of which apply even more broadly to other matrix types. Three of the methods leverage a mathematical analogy between computing matrix elements of a matrix exponential and computing transition probabilities of a dynamical processes (technically a Markov jump process, MJP, typically simulated using Gillespie). In doing so, we identify a novel MJP-based method relying on restricting the number of "trajectory" jumps based on the magnitude of the matrix elements with favorable computational scaling. We then discuss this method's downstream implications on mixing properties of Monte Carlo posterior samplers. We also benchmark two other methods of matrix exponentiation valid for any matrix (beyond rate matrices and, more generally, positive definite matrices) related to solving differential equations: Runge-Kutta integrators and Krylov subspace methods. Under conditions where both the largest matrix element and the number of non-vanishing elements scale linearly with

-- reasonable conditions for rate matrices often exponentiated -- computational time scaling with the most competitive methods (Krylov and one of the MJP-based methods) reduces to

with total memory requirements of

Paper Structure (29 sections, 35 equations, 9 figures, 1 table, 3 algorithms)

This paper contains 29 sections, 35 equations, 9 figures, 1 table, 3 algorithms.

Introduction
Methods
Background on MJPs
Forward Modeling
Inverse Modeling
Avoiding the matrix exponential in Methods #1 -- #5
Method #1 -- Trajectory-based MJP method (T-MJP)
Method #2 -- Jump-based MJP method (J-MJP)
Method #3 -- Remarginalized MJP method (R-MJP)
Method #4 -- Runge-Kutta
Method #5 -- Krylov subspace
Results
Examples
Birth-death process
Two state birth-death
...and 14 more sections

Figures (9)

Figure 1: Diagram depicting MJP trajectories, probability distributions, and the methods included in our benchmark. In a), we present sample trajectories of an MJP model of a birth-death process, detailed in Sec. \ref{['sec:Birth-Death']}. In b), three plots represent the probability distributions over states at times indicated by the dashed lines in a). In c), methods #1 --- #5 are categorized. Methods with the best scaling are highlighted in green. For more details, refer to Sec. \ref{['sec:inference']}.
Figure 2: Cartoon representation of dynamical systems described in Sec. \ref{['sec:examples']}. Panel a) represents a birth-death process of $R$ as described in Sec. \ref{['sec:Birth-Death']}, panel b) represents a system that switches between two states, only one of them, the active one $\sigma_\text{act}$, produces $R$, as in Sec. \ref{['sec:Gene_example']}, and panel c) represents an autoregulatory system, with one state, $\sigma_\text{act}$, generating a product $R$ which generates a second product $P$, which subsequently plays a role in inactivating the system by moving it to a state, $\sigma_\text{ina}$, where $R$ is not produced, as in Sec. \ref{['sec:autorel']}.
Figure 3: We recorded the wall time, $t_W$, necessary to obtain 100 samples from the posterior in relation to the total number of states. A comparison of J-MJP, R-MJP, RK4, and Krylov (KRY) for all three examples is presented above. In the birth-death process (Sec. \ref{['sec:Birth-Death']}) is presented on the top panel, the two states birth-death (Sec. \ref{['sec:Gene_example']}) is presented in the middle panel example and for the autoregulatory gene model (Sec. \ref{['sec:autorel']}) is presented in the bottom panel. In the birth-death example, we also compare to the conventional implementation of the matrix exponential (ME). When considering overall performance, the contest narrows down between R-MJP and Krylov, with Krylov tends to exhibit superior performance, especially as the state spaces increase in size.
Figure 4: Histogram of the posterior samples of each kinetic parameter for the two state birth-death. The histogram samples are compared with the simulation's ground truth (parameter values used for generating synthetic data). In each of these histograms we highlight the estimated expected value of the posterior (corresponding to the sample mean of each parameter) as well as the estimated $50\%$ and $95\%$ credible intervals (corresponding to the samples' 25th to 75th percentiles and 2.5th to 97.5th percentiles, respectively). We notice that R-MJP, RK4, and Krylov yield coherent posteriors. On the other hand J-MJP, while placing the bulk of its probability density in a similar parameter region, recovers confidence intervals inconsistent with the other methods.
Figure 5: Auto-correlation function \ref{['autocorr']} for the kinetic parameters sampled using J-MJP (on top) and R-MJP (in the bottom) within the two state birth-death example. The results confirm that the sampler generated using J-MJP is inefficient, as the autocorrelation does not move to near 0 even after tens of thousands samples, while in R-MJP the autocorrelation falls to zero after just a few hundred samples (as occurs with Krylov and RK4).
...and 4 more figures

Avoiding matrix exponentials for large transition rate matrices

TL;DR

Abstract

Avoiding matrix exponentials for large transition rate matrices

Authors

TL;DR

Abstract

Table of Contents

Figures (9)