The Limits of Inference in Complex Systems: When Stochastic Models Become Indistinguishable

Javier Aguilar, Miguel A. Muñoz, Sandro Azaele

TL;DR

The paper addresses the challenge of inferring parameters and discriminating among stochastic models from sparse time-series data when analytical likelihoods are unavailable. It develops a path-based Monte Carlo framework that combines full-path statistics with bridge processes and Radon–Nikodym derivatives to compute propagators and perform maximum-likelihood inference, even with coarse sampling. A key contribution is the bridge-change-of-measure estimator, which reduces bias and variance in propagator estimation and yields principled guidelines for optimal sampling times and dataset sizes. The framework is validated across diverse domains, revealing sharp, resolution-dependent limits on model distinguishability and emphasizing the importance of experimental design to maximize information under real-world constraints. Together, these results provide a practical toolkit for robust inference and principled measurement design in complex stochastic systems.
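To make the maximum-likelihood step concrete, consider the one case where the propagator is known in closed form: the Ornstein–Uhlenbeck process. The minimal sketch below assumes the convention $dX = -k(X-\mu)\,dt + \sqrt{2D}\,dW$, which is our assumption since the paper's equations are not reproduced here. Because it uses the exact AR(1) form of the sampled process, no bridge sampling is needed. The helper names `simulate_ou` and `ou_mle` are hypothetical. As $\Delta t$ grows relative to $\tau = k^{-1}$, the regression slope $e^{-k\Delta t}$ approaches zero and $\hat k$ becomes poorly determined. This gives one simple view of the sampling-time dependence the paper studies.

```python
import math
import random


def simulate_ou(k, mu, D, dt, n, x0, seed=0):
    """Exact samples of dX = -k (X - mu) dt + sqrt(2 D) dW taken every dt (assumed convention)."""
    rng = random.Random(seed)
    a = math.exp(-k * dt)                      # one-step autocorrelation e^{-k dt}
    sd = math.sqrt(D / k * (1.0 - a * a))      # std of the exact transition noise
    xs = [x0]
    for _ in range(n - 1):
        xs.append(mu + (xs[-1] - mu) * a + sd * rng.gauss(0.0, 1.0))
    return xs


def ou_mle(xs, dt):
    """Conditional MLE of (k, mu, D) from the exact AR(1) form x' = a x + b + noise."""
    x, y = xs[:-1], xs[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx                              # least squares = Gaussian MLE for the slope
    b = my - a * mx
    s2 = sum((yi - a * xi - b) ** 2 for xi, yi in zip(x, y)) / n
    if not 0.0 < a < 1.0:
        # For dt >> 1/k the slope is buried in noise and k cannot be recovered
        raise ValueError("fitted slope outside (0, 1); dt too large for this sample")
    k = -math.log(a) / dt
    mu = b / (1.0 - a)
    D = k * s2 / (1.0 - a * a)                 # invert s2 = (D/k)(1 - a^2)
    return k, mu, D


if __name__ == "__main__":
    xs = simulate_ou(k=1.0, mu=2.0, D=0.5, dt=0.1, n=20000, x0=2.0, seed=1)
    print(ou_mle(xs, dt=0.1))
```

Rerunning with a much larger `dt`, for example several autocorrelation times, typically gives far noisier estimates of `k`, or triggers the `ValueError` when the fitted slope is not positive.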

Abstract

Robust inference for stochastic dynamical systems is often hampered by sparse sampling and the absence of closed-form likelihoods. We introduce a Monte Carlo path-inference framework that leverages full-path statistics and bridge processes to deliver reliable parameter estimation and model selection from coarsely sampled time series, without requiring analytical solutions. Crucially, we couple mechanistic stochastic models with their inference procedures to quantify how experimental design, specifically sampling frequency and dataset size, governs estimator precision and model distinguishability. This analysis reveals optimal sampling regimes and sharp, resolution-dependent limits beyond which competing models become empirically indistinguishable. We validate the approach across four disparate systems (trajectories of optically trapped particles, human microbiome dynamics, social-media topic mentions, and forest population time series), recovering parameters and identifying when inference is fundamentally constrained by measurement resolution, thereby clarifying ongoing debates about dominant noise sources in these systems. Together, these results establish path-based Monte Carlo as a practical, general tool for inference and model discrimination in complex systems and provide principled guidelines for designing measurements that maximize information under real-world constraints.
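One way to see why resolution-dependent limits arise is to ask how much a transition density adds beyond the stationary density. Take a stationary Ornstein–Uhlenbeck process, assuming $dX=-kX\,dt+\sqrt{2D}\,dW$ with $\mu=0$. For this process, the expected log-likelihood gain of $p(x_{i+1}\mid x_i)$ over $p_{\mathrm{st}}(x_{i+1})$ equals the mutual information between consecutive samples, $-\tfrac12\ln\!\left(1-e^{-2k\Delta t}\right)$, which vanishes as $\Delta t/\tau\to\infty$. The sketch below checks this by Monte Carlo. It is an illustration only, not the paper's distinguishability criterion, and the function names are hypothetical.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def dynamic_information(k, D, dt, n=20000, seed=0):
    """Monte Carlo estimate of E[log p(x'|x) - log p_st(x')] for a stationary OU process (mu = 0)."""
    rng = random.Random(seed)
    var_st = D / k                              # stationary variance
    a = math.exp(-k * dt)
    var_tr = var_st * (1.0 - a * a)             # exact transition variance
    x = rng.gauss(0.0, math.sqrt(var_st))       # start in the stationary state
    total = 0.0
    for _ in range(n):
        x_new = a * x + math.sqrt(var_tr) * rng.gauss(0.0, 1.0)
        total += gaussian_logpdf(x_new, a * x, var_tr) - gaussian_logpdf(x_new, 0.0, var_st)
        x = x_new
    return total / n


def exact_information(k, dt):
    """Mutual information of a Gaussian pair with correlation rho = e^{-k dt}."""
    rho = math.exp(-k * dt)
    return -0.5 * math.log(1.0 - rho * rho)


if __name__ == "__main__":
    for dt in (0.1, 1.0, 5.0):                  # dt in units of tau = 1/k with k = 1
        print(dt, dynamic_information(1.0, 0.5, dt), exact_information(1.0, dt))
```

For $\Delta t\gg\tau$, the per-sample gain is essentially zero. Any two models sharing the same stationary density then assign nearly the same likelihood to the data.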

Paper Structure

This paper contains 29 sections, 1 theorem, 138 equations, 8 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 1

Let $\mathbb{Q}$ be a Markov path measure with fixed initial condition $X_0=x_0$ and propagator $\rho^{(\mathbb{Q})}$, meaning that $\mathbb{E}_{\mathbb{Q}}\left[\delta(X_t-x)\mid X_0=x_0\right]=\rho^{(\mathbb{Q})}_{0,t}(x|x_0)$. Then, the Radon–Nikodym derivative between the unconstrained process …
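The identity behind a bridge change-of-measure estimator can be sketched for a case with a known answer. The following assumptions are ours, not the paper's: the target is $dX=-k(X-\mu)\,dt+\sqrt{2D}\,dW$, the reference is a driftless Wiener process with the same diffusion, and paths use an Euler discretization with step $\Delta t^{(B)}=T/n$. Under these assumptions, $\rho_{0,T}(y|x_0)=\rho^{(W)}_{0,T}(y|x_0)\,\mathbb{E}_{\text{bridge}}\!\left[\tfrac{d\mathbb{P}}{d\mathbb{W}}\right]$, where the expectation runs over Wiener bridges from $x_0$ to $y$. Along each bridge, the log Radon–Nikodym derivative is the Itô sum $\sum_i \left[f(x_i)\Delta x_i/(2D) - f(x_i)^2\Delta t/(4D)\right]$. This sum is exactly the log-ratio of the Euler transition densities. The sketch follows the spirit of the estimator described in the paper and is not the authors' implementation. Function names are hypothetical.

```python
import math
import random


def ou_propagator(y, x0, T, k, mu, D):
    """Exact OU transition density, used only as a reference value."""
    m = mu + (x0 - mu) * math.exp(-k * T)
    v = D / k * (1.0 - math.exp(-2.0 * k * T))
    return math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def bridge_propagator(y, x0, T, k, mu, D, n_bridges=2000, n_steps=200, seed=0):
    """Estimate rho_{0,T}(y|x0) as rho_W(y|x0) * mean over bridges of dP/dW (Euler discretization)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_bridges):
        x, log_w = x0, 0.0
        for i in range(n_steps):
            remaining = (n_steps - i) * dt
            # Exact Wiener-bridge step towards y (variance 2D per unit time); last step lands on y
            mean = x + (dt / remaining) * (y - x)
            var = 2.0 * D * dt * (remaining - dt) / remaining
            x_new = mean + math.sqrt(max(var, 0.0)) * rng.gauss(0.0, 1.0)
            # Ito (left-point) log Radon-Nikodym increment = log-ratio of Euler densities
            f = -k * (x - mu)
            log_w += f * (x_new - x) / (2.0 * D) - f * f * dt / (4.0 * D)
            x = x_new
        total += math.exp(log_w)
    # Propagator of the reference Wiener process (variance 2 D T)
    rho_w = math.exp(-(y - x0) ** 2 / (4.0 * D * T)) / math.sqrt(4.0 * math.pi * D * T)
    return rho_w * total / n_bridges


if __name__ == "__main__":
    args = dict(y=0.5, x0=1.0, T=1.0, k=1.0, mu=0.0, D=0.5)
    print(bridge_propagator(**args), ou_propagator(**args))
```

Every bridge ends exactly at $y$, so the estimate is a pointwise value at the chosen final state, with no histogram binning in space. The remaining bias comes from the Euler step and shrinks as `n_steps` grows.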

Figures (8)

  • Figure 1: Population abundance fluctuations. Left: Example of a real trajectory showing the evolution of the population density of a species belonging to the human gut microbiome caporaso2011moving. Right: Empirical stationary distribution corresponding to the trajectory on the left. The dashed and solid lines represent Gaussian and Gamma fits, respectively.
  • Figure 2: Estimates of the diffusion functions signal limits of inference. (a) Diffusion functions inferred from synthetic time series generated with the demographic noise (DE) model [Eq. \ref{eq:general_model_demographic}] (blue, top) and the Ornstein–Uhlenbeck (OU) process [Eq. \ref{eq:OU_process}] (red, bottom), for a small ratio of sampling interval to autocorrelation time ($\Delta t/\tau$, with $\tau = k^{-1}$). Although the OU and DE models have different stationary distributions, their parameters were tuned so that the stationary distributions share the same mean and variance. Dashed lines indicate the true diffusion function in the OU model, while solid lines indicate the true function in the DE model. (b) Same as (a), but for a larger $\Delta t/\tau$. The dash–dot line marks the limiting form of the diffusion estimator for large sampling intervals [Eq. \ref{eq:limiting_form_of_QV_estimator}]. (c) Mean of the diffusion estimator [Eq. \ref{eq:mean_dif_estimator}] as a function of $x$ and $\Delta t$, for the OU model (top) and the DE model (bottom). Dashed lines represent the corresponding true diffusion functions. See Appendix \ref{AP_sec:B_err_df} for the analytical expression of Eq. \ref{eq:mean_dif_estimator} and Appendix \ref{AP_sec:parameters} for parameter values.
  • Figure 3: Errors in parametric inference of $\mu$, $k$, and $D$ depend strongly on the experimental protocol. Panel (a) reports the estimation errors for $\mu$, $k$, and $D$ across different models and sampling times ($\Delta t$), using a fixed number of measurements ($M=100$). Panel (b) mirrors panel (a), but varies $M$ in addition to $\Delta t$. In (b), colors indicate different values of $M$, and symbols indicate the model. The parameters $\mu$ and $k$ were estimated with the high-frequency method of Appendix \ref{AP_sec:MLE_Girsanov}, while $D$ was obtained via the quadratic-variation approach of Appendix \ref{AP_sec:QV_calculations}. Parameter values are given in Appendix \ref{AP_sec:parameters}.
  • Figure 4: Propagator estimation using the bridge change of measure. a) Trajectories of $100$ stochastic paths of a Wiener process (left), and the associated propagator estimate at time $t=10$ using the estimator of Eq. \ref{eq:MC_estimator_prop} (right). Note that the smallest nonzero probability this method can resolve is set by the inverse of the number of paths, signaled with a vertical dashed line in a), right. Panel b) i) shows $100$ Wiener bridges connecting the origin with a specified final state, $X_T=4$, while ii) shows the logarithm of the associated Radon–Nikodym derivative, computed through Eq. \ref{eq:RN_der_2} on every bridge. Finally, in iii) we used bridge paths to evaluate Radon–Nikodym derivatives and estimate the propagator for the EN model [Eq. \ref{eq:general_model_environmental}]. Each red dot represents the bridge change-of-measure estimator [Eq. \ref{eq:estimator_BCM}] with a different final condition, while the bars show the classical Monte Carlo estimator [Eq. \ref{eq:MC_estimator_prop}]. Both methods yield similar precision, but the bridge estimator required only $10^3$ paths compared to $10^5$ for standard Monte Carlo, making it far more efficient and eliminating the need for spatial discretization. Errors are on the order of the size of the red circles. c) Relative errors in the propagator estimation for different numbers of bridges ($N_B$) and bridge discretizations, $\Delta t^{(B)}$. The target process is the Ornstein–Uhlenbeck process, and Wiener bridges were used for the sampling. See parameter values in Appendix \ref{AP_sec:parameters}.
  • Figure 5: Model distinguishability transition. a) Stationary distributions of the OU and DE models (Eqs. \ref{eq:OU_process} and \ref{eq:general_model_demographic}, respectively) used to generate synthetic data. b) Probability of correctly identifying the data-generating model (OU vs. DE) as a function of the sampling interval, for different numbers of measurements ($M=100$ top, $M=1000$ bottom). The bridge change-of-measure MLE (B, black squares) outperforms the path-integral approximation (PI, blue dots), and identification accuracy drops sharply around $\Delta t = \tau$, where the loss of temporal correlation limits model discrimination. c) Same as panel b), but focusing on the distinguishability between the EN and DE models (Eqs. \ref{eq:general_model_environmental} and \ref{eq:general_model_demographic}, respectively), which share the same stationary distribution. d) Probability of distinguishing the DE and EN models for different sampling times and numbers of measurements in the high-frequency regime, using the path-integral approximation. e) Probability of distinguishing models as a function of $\Delta t$ and $M$ when the competing models have the same stationary distribution (left panel, EN vs. DE) and different stationary distributions (right panel, OU vs. DE). f) Probability of correct generative-model identification (EN model marked with green triangles, DE model with red stars) computed with the path-integral approximation (PI, top) and the bridge change of measure (B, bottom). In the indistinguishable regime, $\Delta t\gg\tau$, the inference methods systematically favor one model because of estimator-specific biases, even though the two models are theoretically equivalent there: their likelihoods coincide, Eq. \ref{eq:prob_time_series_uncorrelated}.
  • ...and 3 more figures
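Figures 2 and 3 depend on how a quadratic-variation diffusion estimator behaves as $\Delta t$ grows. Take a stationary OU process, assuming $dX=-kX\,dt+\sqrt{2D}\,dW$. The estimator $\hat D=\sum_i(\Delta x_i)^2/(2n\Delta t)$ then has mean $(D/k)\left(1-e^{-k\Delta t}\right)/\Delta t$. This is close to $D$ when $\Delta t\ll\tau$. When $\Delta t\gg\tau$, it tends to $D/(k\Delta t)$, which depends only on the stationary variance $D/k$. The sketch below is a simplified, unconditioned analogue of the $x$-dependent estimator shown in Fig. 2, not the paper's own formula, and its function names are hypothetical.

```python
import math
import random


def simulate_stationary_ou(k, D, dt, n, seed=0):
    """Exact samples of dX = -k X dt + sqrt(2 D) dW every dt, started from the stationary law."""
    rng = random.Random(seed)
    var_st = D / k
    a = math.exp(-k * dt)
    sd_tr = math.sqrt(var_st * (1.0 - a * a))
    xs = [rng.gauss(0.0, math.sqrt(var_st))]
    for _ in range(n - 1):
        xs.append(a * xs[-1] + sd_tr * rng.gauss(0.0, 1.0))
    return xs


def qv_diffusion_estimate(xs, dt):
    """Quadratic-variation estimate D_hat = sum (dx)^2 / (2 n dt)."""
    incs = [x1 - x0 for x0, x1 in zip(xs[:-1], xs[1:])]
    return sum(d * d for d in incs) / (2.0 * len(incs) * dt)


def expected_qv_estimate(k, D, dt):
    """Mean of D_hat for a stationary OU process: (D/k)(1 - e^{-k dt}) / dt."""
    return D / k * (1.0 - math.exp(-k * dt)) / dt


if __name__ == "__main__":
    for dt in (0.01, 0.1, 1.0, 5.0):            # true D = 1, tau = 1/k = 1
        xs = simulate_stationary_ou(1.0, 1.0, dt, 20000, seed=4)
        print(dt, qv_diffusion_estimate(xs, dt), expected_qv_estimate(1.0, 1.0, dt))
```

With these settings, the estimate should sit near the true $D=1$ for `dt=0.01` and near $D/(k\Delta t)\approx 0.2$ for `dt=5.0`, although the underlying $D$ is unchanged. At coarse resolution the estimator reflects the stationary spread rather than the dynamics.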

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof