Pathspace Kalman Filters with Dynamic Process Uncertainty for Analyzing Time-course Data
Chaitra Agrahar, William Poole, Simone Bianco, Hana El-Samad
TL;DR
The paper tackles time-course analysis under changing model reliability by introducing the Pathspace Kalman Filter (PKF), which iteratively processes the entire trajectory to dynamically update the process uncertainty $Q_t^i$ and quantify distinct uncertainty sources. PKF uses a three-way convex combination of data, model, and prior output, with analytically solvable weights and convergence guarantees, enabling Kalman gains that are non-monotone in time and automatic change-point detection. It also develops efficient Bayesian computations via ODE-spline-based local posteriors to estimate $E[\boldsymbol{M}_t^i]$ and $\text{Var}[\boldsymbol{M}_t^i]$ without costly MCMC. Empirically, PKF outperforms standard KF variants and Bayesian smoothers on synthetic data by orders of magnitude in MSE and scales to large biological time-course datasets (over 1.8 million measurements) with per-gene parallelization, making it a practical tool for identifying dynamic regime changes in gene expression and similar time series.
Abstract
The Kalman Filter (KF) is an optimal linear state prediction algorithm, with applications in fields as diverse as engineering, economics, robotics, and space exploration. Here, we develop an extension of the KF, called the Pathspace Kalman Filter (PKF), which allows us to a) dynamically track the uncertainties associated with the underlying data and prior knowledge, and b) take as input an entire trajectory and an underlying mechanistic model and, using a Bayesian methodology, quantify the different sources of uncertainty. An application of this algorithm is to automatically detect temporal windows where the internal mechanistic model deviates from the data in a time-dependent manner. First, we provide theorems characterizing the convergence of the PKF algorithm. Then, we numerically demonstrate that the PKF outperforms conventional KF methods on a synthetic dataset, lowering the mean-squared error by several orders of magnitude. Finally, we apply this method to a biological time-course dataset involving over 1.8 million gene expression measurements.
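
For background, the role that process uncertainty plays in a conventional KF can be illustrated with a minimal scalar sketch. This is the standard textbook filter, not the PKF itself: the function name `scalar_kalman_filter`, the random-walk-style model, and the hand-chosen schedule of process variances are all assumptions made for illustration. In a standard KF the update is a two-way convex combination of model prediction and data; the sketch only shows that letting the process variance change over time lets the Kalman gain rise again after it has decayed.

```python
def scalar_kalman_filter(observations, q_schedule, r, a=1.0, x0=0.0, p0=1.0):
    """Standard scalar Kalman filter (illustrative, not the PKF).

    Assumed model: x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q_t)
                   y_t = x_t + v_t,          v_t ~ N(0, r)
    q_schedule holds one (hypothetical, hand-chosen) process variance per step.
    """
    x, p = x0, p0
    estimates, gains = [], []
    for y, q in zip(observations, q_schedule):
        # Predict: propagate the state and inflate its variance by q_t.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: the gain k lies in (0, 1) whenever r > 0 and p_pred > 0.
        k = p_pred / (p_pred + r)
        # Convex combination of the model prediction and the measurement.
        x = (1.0 - k) * x_pred + k * y
        p = (1.0 - k) * p_pred
        estimates.append(x)
        gains.append(k)
    return estimates, gains


# Constant observations with a spike in process variance at the fourth step.
obs = [1.0] * 6
q_values = [0.01, 0.01, 0.01, 1.0, 0.01, 0.01]
estimates, gains = scalar_kalman_filter(obs, q_values, r=0.1)
```

With a constant process variance the gain in this sketch decays step by step; the spike in `q_values` makes the filter trust the data more at that step, so the gain increases again. The schedule here is fixed by hand, whereas the abstract describes process uncertainty being tracked dynamically from the data.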
