
Information Criteria Fail for Dynamical Systems: Sampling Rate and Dimension Dependence

Kumar Utkarsh, Daniel M. Abrams

TL;DR

The paper examines the reliability of information criteria such as AIC and BIC for model selection in dynamical systems, where temporal correlations violate the independence assumption underlying these criteria. It develops an analytical framework that yields explicit sampling-rate- and dimension-dependent crossovers for simple motifs such as exponential decay and harmonic oscillation, enabling practitioners to predict when standard criteria will fail. Key contributions include closed-form crossover frequencies $f_c^{(1)}$ and $f_c^{(2)}$ for sampling-rate effects and dimension-dependent thresholds $N_{\text{crit}}$ under different data-scaling regimes. The work offers actionable guidance for experimental design to avoid pathological regimes and clarifies fundamental limitations of likelihood-based selection for temporally correlated data.

Abstract

Information criteria such as Akaike's (AIC) and Bayes' (BIC) are widely used for model selection in physics and beyond, quantifying the tradeoff between model complexity and goodness of fit to enforce parsimony. However, their derivation assumes uncorrelated samples, an assumption systematically violated by data from dynamical systems. Here, through analysis of simple but representative dynamical models (exponential decay, harmonic oscillation, and chaos), we demonstrate that model selection depends sensitively on sampling rate and system dimensionality. We derive explicit formulas that predict when standard information criteria fail and that should be adaptable to many real-world scenarios, enabling experimentalists to design sampling protocols that avoid pathological regimes.
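
For reference, here is a minimal sketch of the standard AIC and BIC for a least-squares fit with i.i.d. Gaussian residuals, written in terms of the residual sum of squares (RSS), the sample count $n$, and the number $k$ of fitted parameters (including the noise variance). It illustrates the mechanism behind the sampling-rate dependence: at a fixed per-sample misfit, the log-likelihood gap between two models grows linearly with $n$, while the AIC penalty $2k$ does not depend on $n$ at all. Both formulas treat every sample as independent, which is the assumption that densely sampled trajectories violate. The function names are illustrative only.

```python
import math


def neg2_loglik_gaussian(rss, n):
    """-2 ln L for i.i.d. Gaussian residuals, evaluated at the MLE variance rss/n."""
    return n * (math.log(2 * math.pi * rss / n) + 1)


def aic(rss, n, k):
    """Akaike information criterion; k counts every fitted parameter, including sigma."""
    return neg2_loglik_gaussian(rss, n) + 2 * k


def bic(rss, n, k):
    """Bayesian (Schwarz) information criterion; its penalty grows as k ln n."""
    return neg2_loglik_gaussian(rss, n) + k * math.log(n)


if __name__ == "__main__":
    # Same per-sample misfit (the simpler model has 1% larger residual variance),
    # evaluated at two sample counts.
    for n in (100, 1000):
        d = aic(1.01 * n, n, k=1) - aic(1.00 * n, n, k=3)
        print(f"n={n}: AIC(simple) - AIC(complex) = {d:+.2f}")
```

Here the difference is $n \ln 1.01 - 4$, so the one-parameter model is preferred at $n = 100$ (about $-3.0$) while the three-parameter model is preferred at $n = 1000$ (about $+5.95$), even though neither fit changed per sample.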

Paper Structure

This paper contains 10 sections, 27 equations, and 3 figures.

Figures (3)

  • Figure 1: Equilibration: Selected model varies with sampling rate. Top: Comparison of AIC values for the true deterministic decay model (blue) and the pure-noise null model (red) as a function of sampling frequency $f$. Theoretical crossover points are indicated by vertical orange dashed lines: $f_c^{(1)} = 8\lambda\sigma^2/x_0^2 = 5.0$ and $f_c^{(2)} = M^{3/2}\lambda x_0^2/(4\sqrt{3}\sigma) \approx 2000$. Bottom: Proportion of trials (out of 1000) in which each model is selected as a function of sampling frequency $f$. Note that the noise model is selected at both low and high frequencies (left and right sides of the plot). Parameters: $(x_0, \lambda, \mu, \sigma, M) = (1, 0.1, 0, 2.5, 2000)$. Points in the upper panel are means over trials at each frequency.
  • Figure 2: Oscillation: Selected model varies with sampling rate. Left panels: Model selection at a single sampling frequency ($f = 0.67$), showing (top) normalized AIC values for the SHO model (blue circles) and the pure-noise model (red squares) as functions of the noise-to-amplitude ratio $\sigma/A$, and (bottom) the percentage of Monte Carlo simulations preferring each model. The vertical dashed line indicates the theoretical crossover point $(\sigma/A)_c = \sqrt{f t_{\max}/8}$. Right panel: Numerical confirmation of the $\sqrt{f}$ scaling law for the critical noise ratio. Red filled circles: simulation; black line: theory. Parameters: $(A, \omega, f_0, t_{\max}) = (1, 2\pi, 1, 1000)$.
  • Figure 3: Equilibration: Selected model varies with system dimension. Top: $\Delta\text{AIC} = \text{AIC}_{\text{decay}} - \text{AIC}_{\text{noise}}$; vertical orange dashed lines indicate the theoretical crossover points where $\Delta\text{AIC} = 0$. Bottom: Proportion of simulations (500 trials) selecting the equilibration (blue) or pure-noise (red) model. Left panels: Fixed data per dimension ($M = 100$ constant); single crossover at $N_{\text{crit}} \approx 7$. Middle panels: Fixed total data ($MN = 1250$ constant); two crossovers predicted: $N_{\text{crit}}^{\text{(low)}} \approx 6$ (large $M$ per agent) and $N_{\text{crit}}^{\text{(high)}} \approx 225$ (small $M$ per agent). Right panels: Combinatorial effects ($M/N = 10$ constant); single crossover at $N_{\text{crit}} \approx 6$. All panels use a fixed sampling frequency $f = 10$. Parameters: $x_0 = 2$ (known a priori, not fitted), $(\lambda, \mu, \sigma) = (1, 1, 8)$.
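
The two failure regimes in Figure 1 can be reproduced qualitatively with a short Monte Carlo sketch. It assumes, beyond what the caption states, that the decay model is $x(t) = \mu + x_0 e^{-\lambda t}$ plus i.i.d. Gaussian noise, that $M$ is the number of samples taken at interval $1/f$ starting from $t = 0$, that the null model is a constant mean plus noise, and that $\sigma$ is fitted by maximum likelihood in both models. The decay rate is profiled over a hypothetical log-spaced grid while the amplitude and $\mu$ are solved in closed form by linear least squares; this grid search is a swapped-in fitting technique, not necessarily the authors' procedure.

```python
import numpy as np


def crossover_low(lam, sigma, x0):
    """Low-frequency crossover f_c^(1) = 8*lam*sigma^2/x0^2 from the Figure 1 caption."""
    return 8.0 * lam * sigma**2 / x0**2


def decay_selection_rate(f, x0=1.0, lam=0.1, mu=0.0, sigma=2.5, M=2000,
                         trials=300, seed=0):
    """Fraction of trials in which AIC prefers decay over a constant-plus-noise model.

    Assumptions (not from the paper): M samples at t = k/f, k = 0..M-1; i.i.d.
    Gaussian noise; sigma fitted by maximum likelihood in both models; lambda
    profiled over a hypothetical log-spaced grid.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(M) / f
    lam_grid = np.logspace(-3, 1, 200)
    E = np.exp(-np.outer(lam_grid, t))        # one candidate decay shape per row
    E_c = E - E.mean(axis=1, keepdims=True)   # centering absorbs the fitted mean mu
    see = np.einsum("ij,ij->i", E_c, E_c)
    wins = 0
    for _ in range(trials):
        y = mu + x0 * np.exp(-lam * t) + rng.normal(0.0, sigma, M)
        y_c = y - y.mean()
        rss_noise = y_c @ y_c                 # null model: fit mu only
        # Decay model: for each lambda, least squares on [exp(-lambda t), 1]
        # leaves RSS = rss_noise - (e_c . y_c)^2 / |e_c|^2; keep the best lambda.
        rss_decay = np.min(rss_noise - (E_c @ y_c) ** 2 / see)
        # AIC = M ln(RSS/M) + 2k up to a shared constant; the decay model fits
        # 2 extra parameters (amplitude and lambda).
        delta_aic = M * np.log(rss_decay / rss_noise) + 2 * 2
        wins += int(delta_aic < 0)
    return wins / trials


if __name__ == "__main__":
    print("f_c^(1) =", crossover_low(0.1, 2.5, 1.0))
    for f in (0.5, 50.0, 5000.0):
        print(f"f = {f:g}: decay model selected in {decay_selection_rate(f):.0%} of trials")
```

With the caption's parameters, the decay model should win mainly at the intermediate frequency, with the noise model taking over at both ends; the exact percentages depend on the seed, the trial count, and the hypothetical $\lambda$ grid.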
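
The Figure 2 crossover $(\sigma/A)_c = \sqrt{f t_{\max}/8}$ can be checked with a similar sketch. Here the oscillation frequency $\omega$ is assumed known, so the SHO fit is linear in a cosine and a sine term and adds two parameters (amplitude and phase) relative to a constant-mean noise model; the caption does not say which parameters the authors fit, and the function names are hypothetical.

```python
import numpy as np


def critical_noise_ratio(f, t_max):
    """Theoretical crossover (sigma/A)_c = sqrt(f * t_max / 8) from the Figure 2 caption."""
    return np.sqrt(f * t_max / 8.0)


def sho_selection_rate(noise_ratio, f=0.67, t_max=1000.0, A=1.0, omega=2 * np.pi,
                       trials=300, seed=0):
    """Fraction of trials in which AIC prefers a known-frequency sinusoid over noise."""
    rng = np.random.default_rng(seed)
    n = int(round(f * t_max))                  # samples at t = k/f covering [0, t_max)
    t = np.arange(n) / f
    X_noise = np.ones((n, 1))                  # null model: constant mean only
    X_sho = np.column_stack([np.ones(n), np.cos(omega * t), np.sin(omega * t)])
    wins = 0
    for _ in range(trials):
        y = A * np.cos(omega * t) + rng.normal(0.0, noise_ratio * A, n)
        rss = []
        for X in (X_noise, X_sho):
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            r = y - X @ beta
            rss.append(r @ r)
        # AIC difference with sigma at its MLE; the SHO model has 2 extra parameters.
        delta_aic = n * np.log(rss[1] / rss[0]) + 2 * 2
        wins += int(delta_aic < 0)
    return wins / trials


if __name__ == "__main__":
    print("(sigma/A)_c =", critical_noise_ratio(0.67, 1000.0))
    for ratio in (3.0, 30.0):
        print(f"sigma/A = {ratio}: SHO selected in {sho_selection_rate(ratio):.0%} of trials")
```

For $f = 0.67$ and $t_{\max} = 1000$ the formula gives $(\sigma/A)_c \approx 9.15$; well below that ratio the sinusoid should be selected almost always, and well above it the noise model should dominate.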
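
The three panel groups in Figure 3 differ only in how the data budget is split across dimensions. The hypothetical helpers below (names are illustrative, not from the paper) tabulate the samples per dimension $M$ implied by each regime and locate crossovers as sign changes of a $\Delta\text{AIC}(N)$ curve, for example one averaged over Monte Carlo trials. The paper's $N$-dimensional equilibration model itself is not reproduced here, so the usage below relies on a toy curve.

```python
import numpy as np


def samples_per_dimension(N, regime):
    """Samples per dimension M under the three Figure 3 data-budget regimes."""
    if regime == "fixed_M":        # left panels: M = 100 for every N
        return 100.0
    if regime == "fixed_total":    # middle panels: M * N = 1250
        return 1250.0 / N
    if regime == "fixed_ratio":    # right panels: M / N = 10
        return 10.0 * N
    raise ValueError(f"unknown regime: {regime}")


def find_crossovers(N_values, delta_aic):
    """Interpolated N where delta_aic = AIC_decay - AIC_noise changes sign.

    Uses linear interpolation between bracketing grid points; a value that is
    exactly zero on the grid is not reported as a crossover.
    """
    N = np.asarray(N_values, dtype=float)
    d = np.asarray(delta_aic, dtype=float)
    idx = np.nonzero(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0]
    return N[idx] - d[idx] * (N[idx + 1] - N[idx]) / (d[idx + 1] - d[idx])


if __name__ == "__main__":
    N = np.arange(1, 21)
    print([samples_per_dimension(n, "fixed_total") for n in (1, 5, 10)])
    # Toy curve (not the paper's result) with two sign changes.
    toy = np.minimum(N - 6.5, 15.5 - N)
    print(find_crossovers(N, toy))
```

Interpolating between grid points gives a finer estimate of $N_{\text{crit}}$ than reading off the first integer where the selected model flips, which matters when, as in the fixed-total-data regime, $M$ per dimension changes quickly with $N$.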