Concentration of Empirical First-Passage Times

Rick Bebon, Aljaz Godec

TL;DR

This work develops a non-asymptotic framework to quantify uncertainty in empirical first-passage times of reversible Markov processes. Using a spectral decomposition of the absorbing generator, it derives concentration inequalities that bound the deviations of the empirical mean $\overline{\tau}_n$ from the true mean $\langle\tau\rangle$ for any sample size $n$, enabling model-free confidence intervals via the upper bounds $\mathcal{U}_n^\pm$. It further provides two-sided bounds on the expected extreme deviations $\langle m_n^\pm \rangle$, which capture multi-time-scale dynamics where the mean is not representative. The results apply to Markov jump networks and to diffusion in confining potentials, and they offer practical guidance for experimental design through minimal sample-size prescriptions, with extensions to beyond-mean statistics and multiple searchers.
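To make the spectral picture concrete, the sketch below builds a small reversible Markov jump network and decomposes its absorbing generator. It assumes a hypothetical nearest-neighbour chain with Metropolis rates (chosen only because they obey detailed balance), a single target state, and an initial state drawn from the equilibrium distribution conditioned on not starting at the target; the paper's initial density $\tilde{p}_{\rm eq}$ may be defined differently. Under these assumptions, symmetrizing the absorbing generator shows that $\tau$ is a mixture of exponentials with rates $\mu_k>0$ and non-negative weights $w_k$ summing to one, so that $\langle\tau\rangle=\sum_k w_k/\mu_k$; the code checks this against a direct linear solve. The function names are illustrative and not taken from the paper, and the smallest rate `mu[0]` presumably plays the role of the $\mu_1$ used as a unit in the figures.

```python
import numpy as np


def metropolis_chain_generator(energies):
    """Generator L of a nearest-neighbour chain (dp/dt = L p, columns sum to 0).

    Hypothetical example: Metropolis rates k_{j->i} = exp(-max(0, E_i - E_j))
    satisfy detailed balance with pi_i proportional to exp(-E_i).
    """
    n = len(energies)
    L = np.zeros((n, n))
    for j in range(n):
        for i in (j - 1, j + 1):
            if 0 <= i < n:
                L[i, j] = np.exp(-max(0.0, energies[i] - energies[j]))
    L -= np.diag(L.sum(axis=0))  # diagonal = minus the total escape rate
    return L


def first_passage_spectrum(L, pi, target):
    """Rates mu_k and weights w_k with first-passage density sum_k w_k mu_k e^{-mu_k t}.

    Assumes detailed balance and a start drawn from pi conditioned on x != target.
    """
    keep = np.array([i for i in range(len(pi)) if i != target])
    L_abs = L[np.ix_(keep, keep)]  # absorbing generator: target row/column removed
    sq = np.sqrt(pi[keep])
    # Pi^{-1/2} L_abs Pi^{1/2} is symmetric under detailed balance.
    S = L_abs * sq[np.newaxis, :] / sq[:, np.newaxis]
    mu, psi = np.linalg.eigh(-0.5 * (S + S.T))  # ascending eigenvalues, all > 0
    Z = pi[keep].sum()
    w = (psi.T @ sq) ** 2 / Z  # non-negative weights summing to one
    p0 = pi[keep] / Z          # initial distribution on the non-target states
    return mu, w, L_abs, p0


E = np.array([0.0, 1.0, 2.5, 0.5, 1.5, 0.0])  # illustrative energy landscape
pi = np.exp(-E) / np.exp(-E).sum()
L = metropolis_chain_generator(E)
mu, w, L_abs, p0 = first_passage_spectrum(L, pi, target=0)

mean_spectral = np.sum(w / mu)                   # <tau> = sum_k w_k / mu_k
mean_direct = np.linalg.solve(-L_abs, p0).sum()  # <tau> = 1^T (-L_abs)^{-1} p0
print("slowest rate mu[0]:", mu[0])
print(mean_spectral, mean_direct)
```

The symmetrization step is what reversibility buys: it turns the absorbing generator into a symmetric matrix, so `numpy.linalg.eigh` returns real, positive rates and the first-passage density is a positive combination of decaying exponentials.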

Abstract

First-passage properties are central to the kinetics of target-search processes. Theoretical approaches have so far primarily focused on predicting first-passage statistics for a given process or model. In practice, however, one faces the reverse problem of inferring first-passage statistics from experimental or simulation data, which are typically sub-sampled. Obtaining trustworthy estimates from under-sampled data when the underlying dynamics are unknown remains a daunting task, and assessing the uncertainty of such estimates is imperative. In this chapter, we highlight recent progress in understanding and controlling finite-sample effects in empirical first-passage times of reversible Markov processes. Specifically, we present concentration inequalities that bound from above, for any sample size, the deviations of the sample mean from the true mean first-passage time, and we use them to construct non-asymptotic confidence intervals. Moreover, we present two-sided bounds on the range of fluctuations, i.e., on the deviations of the expected maximum and minimum in a given sample from the mean, which control uncertainty even in situations where the mean is a priori not a sufficient statistic.
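A minimal sketch of how a Cramér-Chernoff tail bound is turned into a non-asymptotic confidence interval and a minimal sample size, under a strong simplifying assumption: $\tau$ is taken to be exactly exponential with a single rate $\mu$ (one time scale, so $\langle\tau\rangle=1/\mu$). The centred log-moment-generating function is then $\psi(\lambda)=-\lambda/\mu-\ln(1-\lambda/\mu)$ for $\lambda<\mu$, its Legendre transform is $\psi^\ast(t)=\mu t-\ln(1+\mu t)$ for $\mu t>-1$, and $\mathbb{P}({\rm sgn}(t)[\overline{\tau}_n-\langle\tau\rangle]\geq|t|)\leq e^{-n\psi^\ast(t)}$. This is not the paper's bound $\mathcal{U}_n^\pm(\mu_1 t;\mathcal{C})$, whose explicit form is not reproduced here; the helper names are hypothetical, and splitting $\alpha$ equally between the two tails is a choice made for this sketch.

```python
import math


def chernoff_rate(x: float) -> float:
    """psi^*(t) for an Exp(mu) first-passage time, written in x = mu * t.

    The Legendre transform of psi(lambda) = -lambda/mu - log(1 - lambda/mu)
    is x - log(1 + x) for x > -1. For x <= -1 the left-tail event is
    impossible (the sample mean cannot be negative), so the rate is infinite.
    """
    if x <= -1.0:
        return math.inf
    return x - math.log1p(x)


def chernoff_tail_bound(x: float, n: int) -> float:
    """Upper bound exp(-n psi^*) on P(mean_n - <tau> >= t) for x > 0,
    or on P(mean_n - <tau> <= t) for x < 0 (x = mu * t)."""
    return math.exp(-n * chernoff_rate(x))


def deviation_for_level(alpha: float, n: int, side: int = +1) -> float:
    """Smallest |x| at which the bound on the given side drops to alpha,
    found by bisection (psi^* increases with |x| on each side)."""
    target = math.log(1.0 / alpha) / n
    lo, hi = 0.0, (1e6 if side > 0 else 1.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chernoff_rate(side * mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def minimal_sample_size(eps: float, alpha: float) -> int:
    """Smallest n for which each tail bound at |mu * t| = eps is <= alpha/2,
    so |mu (mean_n - <tau>)| <= eps holds with probability >= 1 - alpha."""
    need = math.log(2.0 / alpha)
    return max(math.ceil(need / chernoff_rate(eps)),
               math.ceil(need / chernoff_rate(-eps)))


# One-sided 95% half-widths for n = 100 exponential samples, in units of 1/mu.
x_plus = deviation_for_level(0.05, 100, side=+1)
x_minus = deviation_for_level(0.05, 100, side=-1)
print(x_plus, x_minus)
print(minimal_sample_size(0.1, 0.1))
```

In this toy model, `minimal_sample_size(0.1, 0.1)` returns 639: the right tail is the binding constraint because $\psi^\ast(t)<\psi^\ast(-t)$ for $0<\mu t<1$, so the interval $[\langle\tau\rangle-t^-,\langle\tau\rangle+t^+]$ is asymmetric, with the wider side towards long first-passage times.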

Paper Structure (12 sections, 25 equations, 5 figures)

Figures (5)

  • Figure 1: Schematic of target-search processes for ergodic reversible Markov dynamics. Diffusive dynamics in (a) $d$-dimensional spherical domains (here $d=2$) with reflecting boundary $\partial\Omega$ and (b) arbitrary one-dimensional confining potential landscapes $U(x)$. (c) Markov jump dynamics on a discrete network state space with transition rates that obey detailed balance. Search processes are initialized from $x_{t=0}$ (magenta), which is drawn from the stationary density $\tilde{p}_{\rm eq}(x)$, and we consider the first-passage time $\tau$ to reach the target (red).
  • Figure 2: Non-asymptotic concentration and sample-size effects of empirical first-passage times $\overline{\tau}_n$ around the true mean $\langle\tau\rangle$. (a) Schematic probability density of $\overline{\tau}_n$ inferred from a (small) sample of $n$ realizations. Fluctuations are quantified by the tail probabilities that $\overline{\tau}_n$ deviates from $\langle\tau\rangle$ by more than $t$ to the right, $\mathbb{P}(\overline{\tau}_n\geq\langle\tau\rangle +t)$, or to the left, $\mathbb{P}(\overline{\tau}_n\leq\langle\tau\rangle -t)$, shown in orange and blue, respectively. (b) Dependence of the upper bounds $\mathcal{U}_n^\pm(\mu_1 t; \mathcal{C})$ on the sample size $n$ for the confined Brownian search in $d=3$. The bounds show how drastically the probability that the sample mean deviates from $\langle\tau\rangle$ diminishes as $n$ increases (bright to dark; see (d)). Inset: corresponding empirical histograms of the sample mean $\overline{\tau}_n$ for different values of $n$. (c, d) Deviation probabilities and corresponding bounds for (c) a Markov network representation of protein folding ($\mathcal{C}\approx 1.05$) and (d) a spatially confined Brownian search ($\mathcal{C}\approx 1.99$) as introduced in Fig. 1. Probabilities are scaled as $\mathbb{P}^{1/n}({\rm sgn}(t)[\overline{\tau}_n -\langle\tau\rangle]\geq |t|)$; right-tail areas are shown for $t>0$ and left-tail areas for $t<0$. Lower bounds $\mathcal{L}_n^\pm(\mu_1 t)^{1/n}$ and upper bounds $\mathcal{U}_n^\pm(\mu_1 t; \mathcal{C})^{1/n}$ are depicted as black and red lines, respectively, and the model-free bounds $\mathcal{U}_n^\pm(\mu_1 t;2)^{1/n}$ are shown as yellow dashed lines. Symbols denote the corresponding deviation probabilities obtained from numerical simulations as a function of $t$ for different fixed $n$. The yellow and red curves coincide in (d) since $\mathcal{C}\approx 1.99$ is close to the model-free value $\mathcal{C}=2$.
  • Figure 3: Sketch of the Cramér-Chernoff method. (a) Schematic of the inequality in Eq. \ref{eq:inequality_1}. (b) Illustration of the Legendre transforms of $\psi_X(\lambda)$ (black) and $\phi_X(\lambda)$ (yellow). The optimal $\lambda_{\rm opt}$ maximizes the difference between $\lambda t$ (blue) and the respective function, e.g., $\lambda t-\psi_X(\lambda)$; $\psi_X^\ast(t)$ (red) and $\phi_X^\ast(t)$ (green) denote the corresponding Legendre transforms (Eq. \ref{eq:legendre}). Note that $\phi_X(\lambda)\geq \psi_X(\lambda)$ implies $\phi_X^\ast(t)\leq\psi_X^\ast(t)$.
  • Figure 4: Uncertainty quantification of the sample mean $\overline{\tau}_n$ for any sample size $n$. (a) The sample mean $\overline{\tau}_n$ lies within the interval $[\langle\tau\rangle - t_{\alpha_-,n}^-, \langle\tau\rangle + t_{\alpha_+,n}^+]$ with probability at least $1-(\alpha_- + \alpha_+)$. (b) Model-free 90% confidence region (i.e., $\alpha=0.1$) for the error $\mu_1(\overline{\tau}_n-\langle\tau\rangle)$, i.e., the error in units of $1/\mu_1$, as a function of sample size $n$, relative to a tolerance of $\pm 10\%$. (c) Minimal sample size $n_{\rm min}$ required to ensure that the relative error $\mu_1(\overline{\tau}_n-\langle\tau\rangle)$ does not exceed $\pm 10\%$, as a function of the confidence level $1-\alpha$; a confidence of at least $90\%$ is indicated by the dashed line.
  • Figure 5: Extreme deviations from the mean first-passage time. (a) Maximal $\tau_n^+\equiv\max_{i\in[1,n]}\tau_i$ and minimal $\tau_n^-\equiv\min_{i\in[1,n]}\tau_i$ first-passage times in a sample of $n$ i.i.d. realizations. (b, c) Average maximal ($+$) and minimal ($-$) deviations from the mean, $\langle m_n^\pm\rangle\equiv\langle \tau_n^\pm -\langle\tau\rangle \rangle$, from extensive computer simulations (symbols), together with the lower bounds $\underline{\mathcal{M}}_n^\pm$ (dashed black lines) and upper bounds $\overline{\mathcal{M}}_n^\pm$ (solid black lines), shown for $n=10$ and $n=3$, respectively. Quantities are expressed in units of $1/\mu_1$ and shown as a function of $\mu_1\langle\tau\rangle$.
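The quantities plotted in Figs. 2 and 5 can be mimicked by a quick Monte Carlo experiment. The sketch below again assumes a single-exponential toy model with rate $\mu$ (so $\mu$ plays the role of $\mu_1$ and $\langle\tau\rangle=1/\mu$). It estimates the scaled right-tail probabilities $\mathbb{P}^{1/n}(\overline{\tau}_n\geq\langle\tau\rangle+t)$ and compares them with the Cramér-Chernoff bound $[(1+\mu t)e^{-\mu t}]^n$ that holds for this toy model (not with the paper's $\mathcal{U}_n^\pm$). It also estimates $\langle m_n^\pm\rangle$, for which the exponential model gives the exact values $(H_n-1)/\mu$ and $(1/n-1)/\mu$, with $H_n$ the $n$-th harmonic number. The sample size, number of trials, and seed are arbitrary illustrative choices.

```python
import numpy as np

# Toy model (assumption): exponential first-passage times with a single rate mu,
# so <tau> = 1/mu and mu plays the role of mu_1.
rng = np.random.default_rng(seed=0)
mu = 1.0
mean_tau = 1.0 / mu
n, trials = 10, 200_000

# Each row is one sample of n i.i.d. first-passage times.
tau = rng.exponential(scale=1.0 / mu, size=(trials, n))
sample_means = tau.mean(axis=1)

# Scaled right-tail probabilities vs. the Cramér-Chernoff bound for this model.
tail_results = []
for x in (0.25, 0.5, 1.0):  # x = mu * t
    p_emp = float(np.mean(sample_means >= mean_tau + x / mu))
    bound = float(((1.0 + x) * np.exp(-x)) ** n)
    tail_results.append((x, p_emp, bound))
    print(f"mu*t={x:4.2f}  P^(1/n)={p_emp ** (1 / n):.3f}  "
          f"bound^(1/n)={bound ** (1 / n):.3f}")

# Expected extreme deviations <m_n^+-> = <tau_n^+- - <tau>> estimated from the samples.
m_plus = float(np.mean(tau.max(axis=1))) - mean_tau
m_minus = float(np.mean(tau.min(axis=1))) - mean_tau

# Exact values for the exponential model: E[max] = H_n / mu, E[min] = 1 / (n mu).
harmonic_n = float(np.sum(1.0 / np.arange(1, n + 1)))
print(m_plus, (harmonic_n - 1.0) / mu)
print(m_minus, (1.0 / n - 1.0) / mu)
```

In this toy model the scaled bound $[(1+\mu t)e^{-\mu t}]$ does not depend on $n$, which makes the $\mathbb{P}^{1/n}$ scaling convenient for comparing samples of different size on one plot. The extreme deviations are also strongly asymmetric: $\langle m_n^+\rangle$ grows like $\ln n/\mu$, whereas $\langle m_n^-\rangle$ saturates at $-1/\mu$.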