An output scaling layer boosts deep neural networks for multiscale ODE systems

Yuxiao Yi, Weizong Wang, Tianhan Zhang, Zhi-Qin John Xu

TL;DR

This work addresses multiscale, stiff reaction dynamics by introducing the Generalized Box-Cox Transformation (GBCT) as an output-scaling layer. GBCT is an odd extension of the Box-Cox transformation that can handle sign-changing data. It is integrated into a data-driven surrogate framework (GBCTNet) that uses Box-Cox-transformed inputs (B(Y)) and GBCT-transformed outputs to stabilize training and improve long-term predictions. Across six benchmarks, including methane/air kinetics, nuclear reactions, the Robertson problem with diffusion, turbulent ignition, and nuclear flames, GBCTNet reduces prediction errors, improves stability, and reaches similar accuracy in roughly one-sixth of the training epochs. Frequency analysis shows that GBCT shifts high-frequency components toward lower frequencies, which matches the low-frequency bias of neural networks and improves generalization, while remaining plug-in compatible with PINNs and operator-learning methods. The results suggest GBCT is a practical, model-agnostic tool for mitigating multiscale effects in complex dynamical systems.

Abstract

Simulating complex diffusion-reaction systems is often prohibitively expensive due to the high dimensionality and stiffness of the underlying ODEs, where state variables may span tens of orders of magnitude. Deep learning has recently emerged as a powerful tool in scientific computing, achieving remarkable progress in modeling and sampling stiff systems. However, data scaling techniques remain largely underexplored, despite their crucial role in addressing the frequency bias of deep neural networks when handling multi-magnitude or high-frequency data. In this work, we propose the Generalized Box-Cox Transformation (GBCT), a novel nonlinear scaling method designed to mitigate multiscale challenges by rescaling inherent multi-magnitude components toward a more consistent order of magnitude. We integrate GBCT into our previous data-driven framework and evaluate its performance against the original baseline surrogate model across six representative scenarios: 21-species chemical reaction kinetics, a 13-isotope nuclear reaction model, the well-known Robertson problem coupled with diffusion, and practically relevant simulations of two-dimensional turbulent reaction-diffusion systems as well as one- and two-dimensional nuclear reactive flows. Numerical experiments demonstrate that GBCT reduces prediction errors by up to two orders of magnitude compared with the baseline model, particularly in the long-term evolution of dynamical systems, and achieves comparable performance with only about one-sixth of the training epochs. Frequency analysis further reveals that GBCT rescales high-frequency components of the objective function toward lower frequencies to align with the neural network's natural low-frequency bias, thereby boosting training and generalization. The source code to reproduce the results in this paper is available at https://github.com/Seauagain/GBCT.
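
To make the rescaling idea concrete, the following is a minimal Python sketch rather than the authors' implementation. It shows the standard Box-Cox transform next to one possible odd extension for sign-changing data. The function `odd_power_transform`, its single parameter `lam`, and the sample values are assumptions chosen for illustration; the GBCT defined in the paper uses two parameters ($\lambda_a$ and $\lambda_b$) and may take a different form.

```python
import numpy as np


def box_cox(x, lam=0.1):
    """Standard Box-Cox transform for strictly positive x."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (x**lam - 1.0) / lam


def odd_power_transform(u, lam=0.1):
    """Illustrative odd extension: sign(u) * |u|**lam / lam.

    Assumption for this sketch only; it is NOT the paper's GBCT formula.
    It is odd, continuous at zero, and monotonically increasing.
    """
    u = np.asarray(u, dtype=float)
    return np.sign(u) * np.abs(u) ** lam / lam


# Positive inputs spanning 20 orders of magnitude (e.g. mass fractions).
x = np.array([1e-20, 1e-10, 1.0])
xb = box_cox(x)

# Sign-changing targets (e.g. state increments) spanning 8 orders of magnitude.
u = np.array([-1e-8, -1e-16, 0.0, 1e-16, 1e-8])
g = odd_power_transform(u)

print("Box-Cox inputs:     ", xb)
print("Odd-extended outputs:", g)
```

In this sketch, inputs that differ by 20 orders of magnitude land within about ten units of each other after Box-Cox, and the signed targets keep their sign and ordering while their magnitudes are brought much closer together.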

Paper Structure

This paper contains 31 sections, 41 equations, 15 figures.

Figures (15)

  • Figure 1: (A) Schematic diagrams of $\log(x)$, BCT and GBCT transformations. (B) Distribution of species H mass fraction in inputs and outputs before and after data transformation. The BCT maps $\pmb{x}(t)$ to $\pmb{x}_B(t)$ while the GBCT maps $\pmb{u}_B(t)$ to $G(\pmb{u}_B(t))$ with parameters $\lambda_a = 0.1$ and $\lambda_b = 0.5$.
  • Figure 2: Temporal evolution of temperature and the representative species H, OH, and H2O for two test cases with initial temperatures $T_0 = 1600$ K and $T_0 = 1700$ K, respectively. The initial pressure is $p_0 = 1$ atm and the equivalence ratio is $\phi_0 = 1.0$. The black, red, and blue lines represent the simulation results from CVODE, GBCTNet, and BCTNet, respectively.
  • Figure 3: (A) Comparison of the long-term temperature evolution between BCTNet (left) and GBCTNet (right) across varying initial temperatures $T_0 \in [1200, 1400]$ K. (B) The average relative error propagation in temperature. The initial pressure is set to $p_0 = 1$ atm, with an equivalence ratio of $\phi_0 = 1.0$. The simulation is run for 50,000 steps, corresponding to 50 ms at the DNN prediction time step of $\Delta t = 10^{-6}$ s.
  • Figure 4: The evolution of $^{12}$C-$^{16}$O nuclear reaction dynamics. The initial temperature is $T_0 = 2\times10^9$ K and the initial density is $\rho_0 = 1\times 10^8$ g/cm$^3$, with a simulation time of 0.1 ms. The black solid line denotes the simulation obtained by CVODE. The red and blue dashed lines denote the predicted results of GBCTNet and BCTNet, respectively.
  • Figure 5: The species concentration $y_1(x,y)$ and the absolute error of BCTNet and GBCTNet for the Robertson diffusion problem after 1000 simulation steps with a time step of $\Delta t=7\times10^{-7}$ s.
  • ...and 10 more figures