Deep Predictor-Corrector Networks for Robust Parameter Estimation in Non-autonomous System with Discontinuous Inputs

Gyeongwan Gu, Jinwoo Hyun, Hyeontae Jo, Jae Kyoung Kim

Abstract

Learning under non-smooth objectives remains a fundamental challenge in machine learning, as abrupt changes in conditioning variables can induce highly non-smooth loss landscapes and destabilize optimization. This difficulty is particularly pronounced in non-autonomous dynamical systems driven by discontinuous inputs, where widely used optimization methods, including recent neural smoothing approaches, exhibit unreliable convergence or strong hyperparameter sensitivity. To address this issue, we propose Deep Predictor-Corrector Networks (DePCoN), a multi-scale learning framework that stabilizes optimization by learning scale-consistent parameter update rules across a hierarchy of smoothed inputs. Rather than treating smoothing as a fixed preprocessing choice, DePCoN integrates smoothing into the learning dynamics itself through a learned predictor-corrector mechanism. Across biological and ecological benchmarks with discontinuous inputs, DePCoN consistently achieves more robust and faster convergence than existing methods while substantially reducing sensitivity to hyperparameter choices. Beyond dynamical systems, our approach provides a general learning principle for stabilizing optimization under non-smooth objectives.

Paper Structure (21 sections, 11 theorems, 90 equations, 6 figures, 3 tables)

Key Result

Lemma 2.4

Let $S\in\mathcal{L}^2(\mathbb{R})$ be an input supported on $[0,T]$, extended by $0$ to $\mathbb{R}\setminus[0,T]$. Assume that $\{K_\tau\}_{\tau>0}\subset C^\infty(\mathbb{R})$ is a family of nonnegative kernels satisfying [...]. Then $\|K_\tau * S - S\|_{\mathcal{L}^2(\mathbb{R})}\to 0 \text{ as }\tau\to0^+,$ where, for each $\tau>0$, $(K_\tau * S)(t)=\int_{\mathbb{R}}K_\tau(t-s)\,S(s)\,ds$ and $K_\tau * S\in C^{\infty}(\mathbb{R})$.
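The convergence stated in Lemma 2.4 can be checked numerically on a simple case. The sketch below is an illustration, not the authors' code: it takes the heat kernel from Figure 2(a) as the example kernel (that it meets the lemma's kernel conditions is assumed here), uses a rectangular pulse as the discontinuous input, and approximates the $\mathcal{L}^2$ error on a uniform grid. The names `heat_kernel`, `smooth`, and `l2_error`, the grid spacing, and the chosen values of $\tau$ are all hypothetical.

```python
import numpy as np

def heat_kernel(x, tau):
    """Heat kernel at time tau: a Gaussian with variance 2*tau and unit mass."""
    return np.exp(-x**2 / (4.0 * tau)) / np.sqrt(4.0 * np.pi * tau)

def smooth(S, dx, tau):
    """Grid approximation of (K_tau * S)(t); S is treated as 0 off the grid."""
    std = np.sqrt(2.0 * tau)
    half = int(np.ceil(6.0 * std / dx))      # truncate the kernel at +-6 std
    x = np.arange(-half, half + 1) * dx
    kernel = heat_kernel(x, tau) * dx
    kernel /= kernel.sum()                   # renormalize after truncation
    return np.convolve(S, kernel, mode="same")

def l2_error(S, dx, tau):
    """Discrete L2 norm of K_tau * S - S."""
    diff = smooth(S, dx, tau) - S
    return np.sqrt(np.sum(diff**2) * dx)

# Rectangular pulse supported inside [0, T] with T = 1, padded with zeros
# on both sides to mimic the extension by 0 outside [0, T].
dx = 1e-3
t = np.arange(-2.0, 3.0 + dx / 2, dx)
S = ((t >= 0.3) & (t <= 0.7)).astype(float)

taus = [1e-2, 1e-3, 1e-4]
errors = [l2_error(S, dx, tau) for tau in taus]
for tau, err in zip(taus, errors):
    print(f"tau = {tau:.0e}   L2 error = {err:.4f}")
```

For a jump discontinuity the error shrinks slowly (roughly like $\tau^{1/4}$), so the printed values decrease but do not vanish quickly. This is consistent with the lemma, which guarantees convergence without giving a rate.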

Figures (6)

  • Figure 1: Failure of widely used local optimization under discontinuous inputs and hyperparameter sensitivity of HADES-NN. (a) Human circadian pacemaker model in which discontinuous light exposure $S(t)$ is transformed into an effective photic drive $B(t)$ and subsequently modulates the internal oscillator states $\vec{y}(t)=\left(y_1(t),y_2(t)\right)$ (i). Synthetic observations are generated (ii) by integrating the system for 144 hours using real-world light measurements $S(t)$ (iii). (b) Scatter plots of parameter estimates obtained by widely used optimization methods, including L-BFGS and LM, from randomized initializations. The gray box indicates the range of parameters from which initial values were randomly sampled. All estimates are normalized by the true parameter values, which are located at the intersection of the dashed lines. Each point represents the final estimate from one of 30 independent optimization trials, illustrating large dispersion and unstable convergence under irregular exogenous inputs. (c) Schematic of HADES-NN, which alternates between neural smoothing of the discontinuous input and parameter optimization. At each outer iteration, a smoothed input $\tilde{S}_n$ is refined through $M$ inner updates and used to estimate system parameters. (d) Sensitivity of HADES-NN to the smoothing depth $M$. While increasing $M$ enhances regularization, excessive smoothing suppresses informative input variations and induces estimation bias, whereas small $M$ provides insufficient stabilization.
  • Figure 2: DePCoN stabilizes parameter estimation by learning parameter update rules that remain consistent across multiple input-smoothing scales. (a) Multi-scale preprocessing of the discontinuous exogenous input $S(t)$ via heat-kernel convolution, producing a hierarchy of smoothed inputs $\{S_{\tau_n}(t)\}_{n=0}^{N}$ with normalized scales $\tau_n=n/N$, from lightly smoothed ($\tau_1$) to heavily smoothed ($\tau_N$). (b) Predictor stage. Starting from the coarsest scale $\tau_N$, the predictor network $f_\theta$ propagates parameter estimates toward finer scales according to $\vec{p}_{\tau_{n-1}} = f_\theta(\vec{p}_{\tau_n})$, transferring stable information to increasingly discontinuous regimes. (c) Corrector stage. For each scale $\tau_n$, the predicted parameters $\vec{p}_{\tau_n}$ and smoothed input $S_{\tau_n}(t)$ define a $\tau$-dependent non-autonomous system whose Neural ODE solution $\vec{y}_{\tau_n}$ is compared with observations. (d) Training objective. The predictor network is trained end-to-end by minimizing a multi-scale loss $\sum_{n=0}^{N}\|\vec{y}_{\tau_n}-\vec{y}_{\mathrm{o}}\|$, enforcing consistency across scales and reducing sensitivity to any single smoothing choice.
  • Figure 3: Hyperparameter robustness and computational efficiency of DePCoN. (a) Parameter estimation results of DePCoN under different smoothing-grid sizes $N\in\{4,8,12\}$. Each panel shows the distribution of estimated parameters $(\tau_c, G)$ obtained from randomized initializations. In contrast to the pronounced hyperparameter sensitivity observed for HADES-NN in Fig. 1(d), the estimates produced by DePCoN remain tightly concentrated across all values of $N$, indicating substantially reduced sensitivity to the smoothing-grid hyperparameter. Gray boxes indicate the parameter ranges used for random initialization, and all estimates are normalized by the true parameter values, located at the intersection of the dashed lines. (b) MAPE trajectories over a 1-hour interval comparing DePCoN with HADES-NN. DePCoN rapidly enters a low-error regime and exhibits stable convergence, whereas HADES-NN converges more slowly and remains at a higher error level.
  • Figure 4: Robust parameter estimation on the Lotka--Volterra system with discontinuous exogenous inputs. (a) Modified Lotka--Volterra model describing prey--predator dynamics driven by an abrupt external signal $S(t)$. The resulting trajectories exhibit pronounced non-smooth behavior under real-world-like environmental perturbations. (b) Parameter estimates obtained using widely used optimization methods under identical experimental settings. Each scatter plot shows estimates of parameter pairs from 30 independent trials with randomized initializations, revealing broad dispersion and strong sensitivity to initialization. (c) Results for HADES-NN using hyperparameter configurations reported to perform well in prior studies (doi:10.1137/25M1741340). Despite neural smoothing of the discontinuous input, the resulting estimates remain unstable and exhibit systematic bias away from the ground-truth parameters. (d) DePCoN yields tightly clustered parameter estimates centered at the ground truth, demonstrating both high accuracy and strong consistency. All parameter estimates are normalized by the true parameter values, which are located at the intersection of the dashed lines, and gray boxes indicate the parameter ranges used for random initialization.
  • Figure 5: Comparison of parameter estimation methods on the human circadian pacemaker model introduced in Fig. 1. The scatter plots summarize parameter estimates from 30 independent trials for each of seven methods (DE, L-BFGS, LM, NM, SLSQP, HADES-NN, and DePCoN). All parameter estimates are normalized by the true parameter values, which are located at the intersection of the dashed lines, and gray boxes indicate the parameter ranges used for random initialization. DE failed to produce valid parameter estimates due to numerical divergence during the search process. (a) Under discontinuous exogenous inputs, the resulting non-smooth loss landscape causes L-BFGS, LM, and NM to remain trapped near their initializations. (b) HADES-NN achieves improved estimation performance relative to these baselines, whereas (c) DePCoN consistently and accurately recovers the ground-truth parameter values across trials.
  • ...and 1 more figure
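The pipeline described in the Figure 2 caption (smoothing hierarchy, coarse-to-fine predictor, corrector simulation, multi-scale loss) can be sketched on a toy problem. Everything below is a simplified stand-in under stated assumptions rather than the authors' implementation: the system is a hypothetical scalar ODE $y' = -p\,y + S(t)$ integrated with explicit Euler instead of a Neural ODE solver, the predictor $f_\theta$ is an affine map instead of a neural network, the `diffusion` constant that converts $\tau_n=n/N$ into a kernel width is invented for illustration, and no training is performed. The sketch only evaluates the loss $\sum_{n=0}^{N}\|\vec{y}_{\tau_n}-\vec{y}_{\mathrm{o}}\|$ for two candidate predictors.

```python
import numpy as np

def heat_smooth(S, dt, tau, diffusion=0.1):
    """Heat-kernel smoothing of S at scale tau; tau = 0 returns S unchanged.
    `diffusion` is a hypothetical constant setting the kernel width."""
    if tau == 0.0:
        return S.copy()
    std = np.sqrt(2.0 * diffusion * tau)
    half = int(np.ceil(6.0 * std / dt))
    x = np.arange(-half, half + 1) * dt
    kernel = np.exp(-x**2 / (2.0 * std**2))
    kernel /= kernel.sum()
    return np.convolve(S, kernel, mode="same")   # S is zero-extended at the ends

def simulate(p, S, dt, y0=0.0):
    """Corrector stand-in: explicit Euler for the toy system y' = -p*y + S(t)."""
    y = np.empty_like(S)
    y[0] = y0
    for k in range(len(S) - 1):
        y[k + 1] = y[k] + dt * (-p * y[k] + S[k])
    return y

def predictor(p, theta):
    """Predictor stand-in: affine map p_{tau_{n-1}} = theta[0]*p_{tau_n} + theta[1]."""
    return theta[0] * p + theta[1]

def multiscale_loss(theta, p_coarse, inputs, y_obs, dt):
    """Sum over scales n = N..0 of ||y_{tau_n} - y_obs||; also returns p_{tau_0}."""
    p, total = p_coarse, 0.0
    for n in range(len(inputs) - 1, -1, -1):     # coarsest scale first
        y_n = simulate(p, inputs[n], dt)
        total += np.sqrt(np.sum((y_n - y_obs) ** 2) * dt)
        if n > 0:
            p = predictor(p, theta)              # propagate to the next finer scale
    return total, p

# Discontinuous square-wave input and synthetic observations (p_true = 1.5).
t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]
S = (np.floor(t) % 2 == 0).astype(float)
y_obs = simulate(1.5, S, dt)

N = 4
inputs = [heat_smooth(S, dt, n / N) for n in range(N + 1)]   # tau_n = n/N

# Identity predictor keeps the coarse guess; the contracting map moves it toward 1.5.
loss_identity, _ = multiscale_loss(np.array([1.0, 0.0]), 0.5, inputs, y_obs, dt)
loss_contract, p_fine = multiscale_loss(np.array([0.5, 0.75]), 0.5, inputs, y_obs, dt)
print(loss_identity, loss_contract, p_fine)
```

In the actual method the predictor parameters $\theta$ would be optimized end-to-end against this loss. The comparison here only shows that a predictor which moves the estimate toward the true parameter across scales scores a lower multi-scale loss than one that leaves the coarse guess unchanged.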

Theorems & Definitions (21)

  • Lemma 2.4
  • proof
  • Corollary 2.5
  • proof
  • Lemma 2.6
  • proof
  • Corollary 2.7
  • proof
  • Corollary 2.8
  • proof
  • ...and 11 more