
Pathspace Kalman Filters with Dynamic Process Uncertainty for Analyzing Time-course Data

Chaitra Agrahar, William Poole, Simone Bianco, Hana El-Samad

TL;DR

The paper tackles time-course analysis under changing model reliability by introducing the Pathspace Kalman Filter (PKF), which iteratively processes the entire trajectory to dynamically update the process uncertainty $Q_t^i$ and to quantify distinct sources of uncertainty. The PKF uses a three-way convex combination of the data, the model, and the prior output, with analytically solvable weights and convergence guarantees; this permits Kalman gains that are non-monotone in time and enables automatic change-point detection. The paper also develops efficient Bayesian computations based on ODE-spline local posteriors to estimate $E[\boldsymbol{M}_t^i]$ and $\text{Var}[\boldsymbol{M}_t^i]$ without costly MCMC. Empirically, the PKF outperforms standard KF variants and Bayesian smoothers on synthetic data by orders of magnitude in MSE, and it scales to large biological time-course datasets (1.8 million measurements) through per-gene parallelization, making it a practical tool for identifying dynamic regime changes in gene expression and similar time series.
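The PKF derives its own analytically solvable weights for the data, model, and prior output, and those are not reproduced here. As a point of intuition only, the sketch below assumes the three sources are independent Gaussian estimates of the same quantity, in which case inverse-variance weighting gives the minimum-variance convex combination. The function name `convex_fuse` and all numbers are hypothetical. The second call shows the qualitative behavior the TL;DR relies on: when one source (here the model) becomes very uncertain, its weight collapses toward zero.

```python
import numpy as np

def convex_fuse(estimates, variances):
    """Fuse independent Gaussian estimates of the same quantity.

    Weights are proportional to inverse variance, so they are
    non-negative and sum to one (a convex combination).
    Returns the fused mean, the fused variance, and the weights.
    """
    estimates = np.asarray(estimates, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum()
    fused_mean = float(weights @ estimates)
    fused_var = float(1.0 / precisions.sum())
    return fused_mean, fused_var, weights

# Hypothetical values at one time point: data y_t, a model prediction m_t,
# and the previous iteration's output x_prev (assumed, not from the paper).
y_t, m_t, x_prev = 2.0, 1.0, 1.5
mean, fused_var, (w_data, w_model, w_prior) = convex_fuse(
    [y_t, m_t, x_prev], [0.5, 0.5, 1.0]
)
print(mean, fused_var, w_data, w_model, w_prior)

# If the model variance spikes (e.g., the model stops matching the data),
# the model weight shrinks and the data and prior output dominate.
_, _, w_spiked = convex_fuse([y_t, m_t, x_prev], [0.5, 50.0, 1.0])
print(w_spiked)
```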

Abstract

The Kalman Filter (KF) is an optimal linear state prediction algorithm, with applications in fields as diverse as engineering, economics, robotics, and space exploration. Here, we develop an extension of the KF, called the Pathspace Kalman Filter (PKF), which allows us to a) dynamically track the uncertainties associated with the underlying data and prior knowledge, and b) take as input an entire trajectory and an underlying mechanistic model and, using a Bayesian methodology, quantify the different sources of uncertainty. One application of this algorithm is to automatically detect temporal windows where the internal mechanistic model deviates from the data in a time-dependent manner. First, we provide theorems characterizing the convergence of the PKF algorithm. Then, we numerically demonstrate that the PKF outperforms conventional KF methods on a synthetic dataset, lowering the mean-squared error by several orders of magnitude. Finally, we apply this method to a biological time-course dataset involving over 1.8 million gene expression measurements.
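For reference, the standard KF that the PKF extends can be illustrated with its covariance recursion alone. The sketch below assumes a textbook scalar random-walk model (state $x_t = x_{t-1}$ plus process noise of variance $Q$, measurement $y_t = x_t$ plus noise of variance $R$), which is a simplification and not the paper's mechanistic model; `kalman_gains` is a hypothetical helper name. With constant $Q$ and $R$, the gain sequence does not depend on the data at all and, for this scalar model, settles monotonically to a steady-state value, with larger $Q$ giving a larger gain (more weight on the data). This is the fixed behavior that the PKF's dynamically updated process uncertainty is designed to move beyond.

```python
def kalman_gains(q, r, p0, steps):
    """Gain sequence of a scalar random-walk Kalman filter.

    Model (assumed): x_t = x_{t-1} + process noise (variance q),
                     y_t = x_t + measurement noise (variance r).
    The gain depends only on q, r and the initial variance p0,
    never on the observed data.
    """
    gains = []
    p = p0  # posterior variance carried over from the previous step
    for _ in range(steps):
        p_pred = p + q               # predict: add process uncertainty
        k = p_pred / (p_pred + r)    # Kalman gain = weight on the new data
        p = (1.0 - k) * p_pred       # update: posterior variance shrinks
        gains.append(k)
    return gains

for q in (1.0, 10.0):
    g = kalman_gains(q=q, r=4.0, p0=100.0, steps=20)
    print(f"Q={q}: first gain={g[0]:.3f}, gain after 20 steps={g[-1]:.3f}")
```

For this model the steady-state predicted variance solves $P^2 - QP - QR = 0$, so the limiting gain is $P/(P+R)$ with $P = \tfrac{1}{2}\big(Q + \sqrt{Q^2 + 4QR}\big)$.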

Paper Structure (28 sections, 57 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: A. A schematic of the Kalman filtering framework. B. Graphical model for the KF. C. Graphical model of the proposed PKF. D. Illustration of a KF which makes predictions forward in time $t$ as new data is acquired. E. Illustration of a PKF which ingests the entire temporal trajectory, over all $t$, to produce an entire trajectory as output. At each iteration $i$, the PKF output trajectory is fed back into the PKF as the input for the next iteration $i+1$ of filtering.
  • Figure 2: Models are fit across time windows of three measurements, where the first ($t_0$) and third ($t_1$) data points are fixed, and the middle point (denoted by the yellow X mark at $t$) is allowed to vary. A. Illustrates the model fit and the process uncertainties when the model variance is relatively large, as represented by the spread of the colored lines, and the process uncertainty is small, as represented by the proximity of the estimated data point (yellow X mark) to the measured data point (black dot). B. Illustrates the fit when the process uncertainty is high but the model variance is low. The implications of these relative measures are further elaborated in Table \ref{table:cases}.
  • Figure 3: Comparison of the adaptive non-linear univariate KF and the PKF on synthetic population dynamics data. A. A synthetic dataset of population dynamics generated by simulating a birth-death model and adding Gaussian noise. B. Parameters of the underlying model generating the simulated data. The growth and death rates change at the black vertical lines, and the noise increases from the gray vertical line through the end of the time course, as indicated in all panels. C. Filter output for an adaptive non-linear KF with $Q=1$. Notice that the filter estimate lags changes in the data-generating process. D. Model and data weights for the same KF. E. Process uncertainty, filter variance, and mean-squared error for the same KF. F-G. Filter output and parameters for an adaptive non-linear KF with $Q=10$. Notice that increasing $Q$ decreases the model weight. H. Process uncertainty, filter variance, and mean-squared error for the same KF. I. Filter output for the PKF shows no lag. J. Weights for the PKF show that after many iterations, the filter converges to a low-variance estimate because the filter weight $\boldsymbol{u} > \boldsymbol{w}, \boldsymbol{v}$. K. Mean-squared error, filter variance, and process uncertainty for the PKF. Notice that the mean-squared error is lowest for the PKF among the filters compared. Additionally, the process uncertainty spikes precisely when the data-generating process changes.
  • Figure 4: Application of the PKF to gene expression data in two conditions: Clock Active and Clock Repressed. A. Mean expression data of the core circadian clock gene BMAL1, which oscillates when the clock is active and fails to oscillate when it is repressed. Shaded regions denote the standard deviation. B. Filter output derived from the data in A. The shaded region denotes the filter variance. C. Dynamic process uncertainty associated with the filter output in B. D. Average log process uncertainty divided by data variance for $1.8$ million gene expression measurements, separated by condition and plotted against their variance percentile. Turning off the circadian clock decreases process uncertainty for highly varying genes, which is consistent with the circadian clock being a global regulator of gene expression.
  • Figure 5: Outputs of the methods benchmarked in Table \ref{table:method_comp}.
  • ...and 3 more figures
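The lag in Figure 3C and the effect of raising $Q$ in panels F-G can be reproduced qualitatively with a textbook linear KF. The sketch below is a stand-in under stated assumptions, not the paper's experiment: it uses a scalar random-walk state model instead of the adaptive non-linear KF and the birth-death model, a hypothetical noise-free signal that jumps from 0 to 10, and a measurement variance `r = 25` chosen only for illustration. `random_walk_kf` is a hypothetical helper. Each update is the two-way convex combination $(1-K)\,\hat{x}_{t|t-1} + K\,y_t$, so $1-K$ plays the role of the model weight.

```python
import numpy as np

def random_walk_kf(y, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (assumed).

    Returns the filtered means and the model weight (1 - gain) per step.
    """
    x, p = x0, p0
    means, model_weights = [], []
    for obs in y:
        p_pred = p + q                 # predict: mean unchanged, variance grows by q
        k = p_pred / (p_pred + r)      # weight on the new measurement
        x = (1.0 - k) * x + k * obs    # convex combination of model and data
        p = (1.0 - k) * p_pred         # posterior variance
        means.append(x)
        model_weights.append(1.0 - k)
    return np.array(means), np.array(model_weights)

# Hypothetical noise-free signal that jumps from 0 to 10 at t = 50.
y = np.concatenate([np.zeros(50), np.full(50, 10.0)])
for q in (1.0, 10.0):
    est, w_model = random_walk_kf(y, q=q, r=25.0)
    # Index (0 = first post-jump step) where the estimate first comes
    # within 1 unit of the new level.
    lag = int(np.argmax(np.abs(est[50:] - 10.0) < 1.0))
    print(f"Q={q}: lag={lag} steps, final model weight={w_model[-1]:.2f}")
```

Raising $Q$ shortens the lag by lowering the model weight; with noisy data the same change lets more measurement noise through, which is the trade-off a fixed $Q$ forces and that a time-varying process uncertainty aims to avoid.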