High-Dimensional Markov-switching Ordinary Differential Processes

Katherine Tsai, Mladen Kolar, Sanmi Koyejo

TL;DR

This work tackles parameter recovery for high-dimensional Markov-switching ODEs with nonlinear additive dynamics from discretely observed data. It introduces a two-stage estimation approach that first reconstructs the continuous trajectory via wavelet smoothing and then applies a (truncated) EM algorithm to infer the transition-rate matrix $Q$ and additive-function coefficients, with theoretical guarantees under $\beta$-mixing. The authors derive convergence and error bounds for both the trajectory recovery and the EM procedure, and demonstrate graph-recovery performance and edge detection under realistic sample sizes. The method is validated on simulated data and applied to resting-state ADHD fMRI, where group differences in transition dynamics and state-specific connectomes are revealed, highlighting the approach's practical impact for understanding time-varying brain networks.
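The first stage smooths the discrete samples before any parameters are estimated. This summary does not specify the wavelet family, the threshold rule, or how the continuous trajectory is read off the coefficients, so the sketch below is only an illustration under stated assumptions: it applies a Haar basis with the standard universal soft threshold (VisuShrink) to one coordinate of the samples. All function names are hypothetical and are not the authors' code.

```python
import math
import statistics


def haar_forward(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two."""
    approx = list(signal)
    details = []  # details[0] holds the finest-scale coefficients
    while len(approx) > 1:
        s, d = [], []
        for i in range(0, len(approx), 2):
            s.append((approx[i] + approx[i + 1]) / math.sqrt(2))
            d.append((approx[i] - approx[i + 1]) / math.sqrt(2))
        details.append(d)
        approx = s
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward level by level, starting from the coarsest level."""
    current = list(approx)
    for d in reversed(details):
        nxt = []
        for a, b in zip(current, d):
            nxt.append((a + b) / math.sqrt(2))
            nxt.append((a - b) / math.sqrt(2))
        current = nxt
    return current


def soft_threshold(x, t):
    """Shrink x toward zero by t, setting it to zero if |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def wavelet_smooth(y):
    """Denoise one coordinate of the samples (assumption: Haar basis,
    universal threshold sigma * sqrt(2 log n), noise level from finest details)."""
    n = len(y)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two >= 2")
    approx, details = haar_forward(y)
    # Robust noise estimate: median absolute finest-scale detail / 0.6745.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thresh = sigma * math.sqrt(2 * math.log(n))
    shrunk = [[soft_threshold(c, thresh) for c in level] for level in details]
    return haar_inverse(approx, shrunk)
```

For example, `wavelet_smooth([5.0 + 0.1 * (-1) ** i for i in range(16)])` removes the alternating perturbation and returns values equal to 5.0 up to rounding. A smoother wavelet (for example, Daubechies) would usually be preferred in practice; Haar is used here only because it keeps the sketch dependency-free.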

Abstract

We investigate parameter recovery for Markov-switching ordinary differential processes from discrete observations, where the differential equations are nonlinear additive models. This framework has been widely applied in biological systems, control systems, and other domains; however, little research has addressed reconstructing the generating processes from observations. The gap matters because many physical systems, such as the human brain, cannot be directly experimented upon, so the underlying system must be inferred from observations. To address this gap, this manuscript presents a comprehensive study of the model, encompassing algorithm design, optimization guarantees, and quantification of statistical errors. Specifically, we develop a two-stage algorithm that first recovers the continuous sample path from discrete samples and then estimates the parameters of the processes. We derive bounds on the statistical error and a linear convergence guarantee when the processes are $\beta$-mixing. Our analysis is based on truncating the latent posterior processes and shows that the truncated processes approximate the true processes under mixing conditions. We apply this model to investigate differences in resting-state brain networks between the ADHD group and normal controls, revealing differences in the transition-rate matrices of the two groups.
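Comparing transition-rate matrices is easier with the standard link between a rate matrix and transition probabilities in mind: for a continuous-time chain with rate matrix $Q$ (nonnegative off-diagonal entries, rows summing to zero), the latent state observed on a grid with spacing $\Delta$ moves according to $P(\Delta)=\exp(Q\Delta)$. The sketch below computes this matrix exponential by uniformization, a standard technique; it is illustrative only, is not taken from the paper, and the name `transition_matrix` is hypothetical.

```python
import math


def transition_matrix(Q, delta, tol=1e-12, max_terms=10_000):
    """P(delta) = exp(Q * delta) for a rate matrix Q, computed by uniformization.

    With lam >= max_i |Q[i][i]|, R = I + Q / lam is a stochastic matrix and
    exp(Q * delta) = sum_k e^{-lam*delta} (lam*delta)^k / k! * R^k.
    """
    K = len(Q)
    eye = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    lam = max(-Q[i][i] for i in range(K))
    if lam == 0:
        return eye  # no transitions: the chain never leaves its state
    if lam * delta > 700:
        raise ValueError("lam * delta too large for exp(-lam * delta); shorten delta")
    R = [[eye[i][j] + Q[i][j] / lam for j in range(K)] for i in range(K)]
    term = eye                       # R^0
    weight = math.exp(-lam * delta)  # Poisson probability of k = 0 jumps
    P = [[weight * term[i][j] for j in range(K)] for i in range(K)]
    total, k = weight, 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        term = [[sum(term[i][m] * R[m][j] for m in range(K)) for j in range(K)]
                for i in range(K)]   # R^k
        weight *= lam * delta / k    # Poisson probability of k jumps
        total += weight
        for i in range(K):
            for j in range(K):
                P[i][j] += weight * term[i][j]
    return P
```

Uniformization is used instead of a generic matrix exponential because every term is a nonnegative combination of stochastic matrices, so the result stays a valid transition matrix up to the truncation tolerance.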

Paper Structure (60 sections, 45 theorems, 414 equations, 12 figures, 3 tables, 1 algorithm)

Key Result

Proposition 2.1

Define $A=-2\,\text{diag}(\beta_1,\ldots,\beta_k)-Q^\star$ and suppose that $A$ is a nonsingular M-matrix. Then, under additional regularity conditions (the assumptions stated in the Appendix), the joint process $(Z(t_n), X(t_n), Y_n)$ is $\beta$-mixing.
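The M-matrix condition can be checked numerically. Because the off-diagonal entries of a rate matrix $Q^\star$ are nonnegative, the off-diagonal entries of $A$ are nonpositive, so $A$ is automatically a Z-matrix; a Z-matrix is a nonsingular M-matrix exactly when all of its leading principal minors are positive. The sketch below tests that condition through Gaussian elimination without pivoting, whose pivots are ratios of consecutive leading principal minors. It treats $\beta_1,\ldots,\beta_k$ as given numbers (their definition is in the paper and not reproduced here), and the helper names are hypothetical.

```python
def proposition_2_1_matrix(Q, beta):
    """A = -2 diag(beta_1, ..., beta_k) - Q, as in Proposition 2.1."""
    k = len(Q)
    return [[(-2.0 * beta[i] if i == j else 0.0) - Q[i][j] for j in range(k)]
            for i in range(k)]


def is_nonsingular_m_matrix(A, tol=1e-12):
    """True iff A is a Z-matrix whose leading principal minors are all positive.

    Elimination without pivoting yields pivots D_k / D_{k-1}, where D_k is the
    k-th leading principal minor, so all minors are positive iff all pivots are.
    """
    n = len(A)
    if any(A[i][j] > tol for i in range(n) for j in range(n) if i != j):
        return False  # a positive off-diagonal entry: not a Z-matrix
    U = [[float(v) for v in row] for row in A]
    for k in range(n):
        pivot = U[k][k]
        if pivot <= tol:
            return False
        for i in range(k + 1, n):
            factor = U[i][k] / pivot
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return True
```

For instance, with $Q^\star=\begin{pmatrix}-1&1\\2&-2\end{pmatrix}$ and the illustrative values $\beta_1=\beta_2=-1$, the matrix is $A=\begin{pmatrix}3&-1\\-2&4\end{pmatrix}$, whose leading minors $3$ and $10$ are positive, so the check passes.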

Figures (12)

  • Figure 1: Box plots of the model selection procedure introduced in the parameter-selection subsection. The orange line denotes the median, the upper edge of the box denotes the third quartile (Q3), and the lower edge denotes the first quartile (Q1). The whiskers extend from the box toward Q3 + 1.5 IQR and Q1 - 1.5 IQR, where IQR is the interquartile range. The dots denote outliers that lie beyond the ends of the whiskers. The true number of states is $2$ in both cases, and the true number of basis functions is $3$ for case $1$ and $1$ for case $2$. The results indicate that when the sample size is sufficiently large, the model selection procedure selects the true number of states and the true number of basis functions. Furthermore, the optimal $\lambda$ decreases as the sample size increases, a trend that matches the graph-recovery theorem.
  • Figure 2: The $\ell_2$ distance between the estimated parameters and the ground truth. The top row shows the results for case $1$; the distance drops once the sample size exceeds $160$. The bottom row shows the results for case $2$, where the distances for $\{\theta_{ij}^\ell\}$ and $\sigma^2$ decrease consistently as the sample size increases.
  • Figure 3: The average ROC curves of the simulated tasks over $10$ independent runs, with $X(t)$ estimated using the wavelet-smoothing procedure. Top row: results of the proposed method; the AUC increases consistently with the sample size for all graphs. Bottom row: results of the oracle method, in which the latent process $Z(t)$ is assumed known. With the latent state known, the AUC is larger than that of the proposed method at the same sample size.
  • Figure 4: The average ROC curves of the simulated tasks over $10$ independent runs, with the ground-truth $X(t)$ given. Top row: results of the proposed method. Bottom row: results of the oracle method, in which the latent process $Z(t)$ is assumed known. When $X(t)$ is given, both methods perform well once $N\geq 80$.
  • Figure 5: (a)-(c) show the connectome of each state. A red arrow indicates that $\theta_{ij}^\ell$ is positive and a blue arrow that it is negative; darker colors indicate larger absolute values of $\theta_{ij}^\ell$. (d) shows the estimated transition-rate matrix. (e) Each panel is the probability map $P(Z(t_n)\mid Y_0^N; Q_{\text{group}}, \{\theta_{ij}^\ell\})$ of one subject. The x-axis is the time point and the y-axis is the cumulative probability, which sums to $1$. The left column shows the probability maps of all subjects in the TDC group, and the right column shows those of all subjects in the ADHD-C group.
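Figures 3 and 4 summarize edge detection with ROC curves. The captions do not say how candidate edges are scored, so the sketch below assumes, hypothetically, that each candidate edge $(i,j)$ in state $\ell$ receives a score such as the norm of its coefficient block $\theta_{ij}^\ell$ and is compared against the true graph. Under that assumption, the area under the ROC curve equals the Mann-Whitney probability that a true edge outscores a non-edge, which the function computes directly. The name `edge_detection_auc` is illustrative.

```python
def edge_detection_auc(scores, truth):
    """AUC of ranking candidate edges by score against the true edge set.

    Mann-Whitney form: the probability that a randomly chosen true edge
    outscores a randomly chosen non-edge, with ties counted as 1/2.
    """
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one edge and one non-edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

For example, `edge_detection_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns `0.75`: three of the four (edge, non-edge) pairs are ordered correctly. The pairwise loop is quadratic, which is fine for the few hundred candidate edges of a small graph; a rank-based formula scales better.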
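Figure 5(e) plots, for each subject, the posterior state probabilities $P(Z(t_n)\mid Y_0^N)$. As a minimal sketch, assuming the latent chain on the sampling grid is treated as a discrete-time chain with one-step transition matrix $P=\exp(Q\Delta)$ and that a state-conditional likelihood for each time point is already available, these probabilities follow from the standard forward-backward recursion below. This is the full (untruncated) smoother rather than the truncated posterior that the paper analyzes, and all names are hypothetical.

```python
def posterior_state_probs(P, init, lik):
    """Forward-backward smoothing for a discrete-time hidden Markov chain.

    P[a][b]   : one-step transition probability from state a to state b
    init[a]   : prior probability of state a at the first time point
    lik[n][a] : likelihood of observation n given state a (assumed precomputed)
    Returns gamma[n][a] = P(Z_n = a | all observations).
    """
    N, K = len(lik), len(init)
    # Forward pass: normalized filtering distributions (avoids underflow).
    prev = [init[a] * lik[0][a] for a in range(K)]
    s = sum(prev)
    alpha = [[v / s for v in prev]]
    for n in range(1, N):
        cur = [lik[n][b] * sum(alpha[-1][a] * P[a][b] for a in range(K))
               for b in range(K)]
        s = sum(cur)
        alpha.append([v / s for v in cur])
    # Backward pass: rescaled backward messages (only their ratios matter).
    beta = [[1.0] * K for _ in range(N)]
    for n in range(N - 2, -1, -1):
        nxt = [sum(P[a][b] * lik[n + 1][b] * beta[n + 1][b] for b in range(K))
               for a in range(K)]
        s = sum(nxt)
        beta[n] = [v / s for v in nxt]
    # Combine and renormalize at each time point.
    gamma = []
    for n in range(N):
        g = [alpha[n][a] * beta[n][a] for a in range(K)]
        s = sum(g)
        gamma.append([v / s for v in g])
    return gamma
```

Stacking each row of `gamma` over time yields a cumulative-probability map of the kind shown in Figure 5(e).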

Theorems & Definitions (61)

  • Definition 1: $\beta$-mixing
  • Definition 2: Geometric $\beta$-mixing
  • Proposition 2.1
  • Proposition 2.2: Linear Model
  • Definition 3
  • Proposition 4.1
  • Definition 4
  • Proposition 4.2: One-step Update of Population Log-likelihood
  • Lemma 4.3
  • Lemma 4.4