An output scaling layer boosts deep neural networks for multiscale ODE systems
Yuxiao Yi, Weizong Wang, Tianhan Zhang, Zhi-Qin John Xu
TL;DR
This work tackles the challenge of modeling multiscale, stiff reaction dynamics by introducing the Generalized Box-Cox Transformation (GBCT) as an output-scaling layer. GBCT is defined as an odd extension of the Box-Cox transformation so that it can handle sign-changing data. It is integrated into a data-driven surrogate framework (GBCTNet) that takes Box-Cox transformed inputs (B(Y)) and produces GBCT-transformed outputs, which stabilizes training and improves long-term predictions. Across six benchmarks, including methane/air kinetics, nuclear reactions, the Robertson problem with diffusion, turbulent ignition, and nuclear flames, GBCTNet reduces prediction errors by up to two orders of magnitude relative to the baseline, improves stability, and reaches similar accuracy in roughly one-sixth of the training epochs. Frequency analysis shows that GBCT shifts high-frequency components of the target toward lower frequencies, which matches neural networks’ low-frequency bias and improves generalization. GBCT also remains plug-in compatible with PINNs and operator-learning methods. The results suggest that GBCT is a practical, model-agnostic tool for mitigating multiscale effects in complex dynamical systems.
Abstract
Simulating complex diffusion-reaction systems is often prohibitively expensive due to the high dimensionality and stiffness of the underlying ODEs, where state variables may span tens of orders of magnitude. Deep learning has recently emerged as a powerful tool in scientific computing, achieving remarkable progress in modeling and sampling stiff systems. However, data scaling techniques remain largely underexplored, despite their crucial role in addressing the frequency bias of deep neural networks when handling multi-magnitude or high-frequency data. In this work, we propose the Generalized Box-Cox Transformation (GBCT), a novel nonlinear scaling method designed to mitigate multiscale challenges by rescaling inherently multi-magnitude components toward a more consistent order of magnitude. We integrate GBCT into our previous data-driven framework and evaluate its performance against the original baseline surrogate model across six representative scenarios: a 21-species chemical reaction kinetics model, a 13-isotope nuclear reaction model, the well-known Robertson problem coupled with diffusion, and practically relevant simulations of two-dimensional turbulent reaction-diffusion systems as well as one- and two-dimensional nuclear reactive flows. Numerical experiments demonstrate that GBCT reduces prediction errors by up to two orders of magnitude compared with the baseline model, particularly in the long-term evolution of dynamical systems, and achieves comparable performance with only about one-sixth of the training epochs. Frequency analysis further reveals that GBCT rescales high-frequency components of the objective function toward lower frequencies, aligning them with the neural network's natural low-frequency bias and thereby improving training and generalization. The source code to reproduce the results in this paper is available at https://github.com/Seauagain/GBCT.
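To make the scaling idea concrete, the following is a minimal sketch of a Box-Cox transform and its literal odd extension, g(x) = sign(x) · B(|x|). It is an illustration under stated assumptions, not the paper's implementation: this section does not give the exact GBCT formula, so the function names (`box_cox`, `odd_extension`, `gbct_sketch`) and the exponent `lam = 0.1` are hypothetical choices, and the definition used in the linked repository may differ (for example in how values near zero are treated).

```python
import numpy as np


def box_cox(x, lam=0.1):
    """Standard Box-Cox transform B(x) = (x**lam - 1) / lam for x > 0, lam != 0."""
    x = np.asarray(x, dtype=float)
    return (np.power(x, lam) - 1.0) / lam


def odd_extension(f):
    """Build g(x) = sign(x) * f(|x|), with g(0) = 0, from f defined on x > 0."""
    def g(x):
        x = np.asarray(x, dtype=float)
        # Replace zeros by a dummy magnitude so f is never evaluated at 0;
        # sign(0) == 0 then forces g(0) = 0.
        mag = np.where(x == 0, 1.0, np.abs(x))
        return np.sign(x) * f(mag)
    return g


# Assumed stand-in for GBCT: the literal odd extension of Box-Cox.
gbct_sketch = odd_extension(box_cox)

# Positive data spanning 20 orders of magnitude, e.g. species mass fractions.
y = np.array([1e-20, 1e-10, 1e-5, 1.0])
print(box_cox(y))

# Sign-changing data, e.g. increments of a state variable over one time step.
dy = np.array([-1e-3, -1e-12, 0.0, 1e-12, 1e-3])
print(gbct_sketch(dy))
```

With `lam = 0.1`, the positive inputs above, which span twenty decades, are mapped into an interval of width about ten, which is the kind of magnitude compression the abstract describes. The literal odd extension is antisymmetric by construction but is not continuous at zero, which is one reason the actual GBCT definition may be formulated differently.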
