Residual Multi-Fidelity Neural Network Computing
Owen Davis, Mohammad Motamed, Raul Tempone
TL;DR
This paper introduces the Residual Multi-Fidelity Neural Network (RMFNN), a theoretically grounded framework for constructing fast high-fidelity surrogates by learning a possibly nonlinear residual $F$ between a low-fidelity model $Q_{LF}$ and a high-fidelity model $Q_{HF}$. The approach trains two networks in tandem: ResNN learns the residual $F(\boldsymbol{\theta}, Q_{LF}(\boldsymbol{\theta}))$, which is then used to generate synthetic high-fidelity data, and DNN learns the high-fidelity quantity $Q_{HF}$ from both the original and the synthetic data. A key theoretical result from Davis_Motamed:2024 guarantees the existence of a ReLU network that approximates a broad class of targets with network complexity tied to the uniform norm of the target, which justifies low-complexity learning when the residual is small. Numerical experiments show substantial computational savings, particularly when $\|F\|_{L^{\infty}}$ is small, and illustrate the framework's advantages over direct high-fidelity learning and discrepancy-based alternatives, with clear paths toward multi-fidelity extensions and uncertainty quantification tasks.
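As a reading aid, the residual relation described above can be written out explicitly. The sign convention (high-fidelity minus low-fidelity) and the symbol $\widehat{F}$ for the trained ResNN are assumptions made here for illustration; the summary only states that $F$ maps the input together with the low-fidelity output to the discrepancy between the two model outputs.

$$
F\big(\boldsymbol{\theta}, Q_{LF}(\boldsymbol{\theta})\big) = Q_{HF}(\boldsymbol{\theta}) - Q_{LF}(\boldsymbol{\theta}),
\qquad
Q_{HF}(\boldsymbol{\theta}) \approx Q_{LF}(\boldsymbol{\theta}) + \widehat{F}\big(\boldsymbol{\theta}, Q_{LF}(\boldsymbol{\theta})\big).
$$

Under this convention, each synthetic high-fidelity sample costs one low-fidelity evaluation plus one forward pass of ResNN.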
Abstract
In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Motivated by error-complexity estimates for ReLU neural networks, we formulate the correlation between an inexpensive low-fidelity model and an expensive high-fidelity model as a possibly non-linear residual function. This function defines a mapping between 1) the shared input space of the models along with the low-fidelity model output, and 2) the discrepancy between the outputs of the two models. The computational framework proceeds by training two neural networks to work in concert. The first network learns the residual function on a small set of high- and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of the second network. The trained second network then acts as our surrogate for the high-fidelity quantity of interest. We present four numerical examples to demonstrate the power of the proposed framework, showing that significant savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
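The two-network workflow described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the models `q_hf` and `q_lf` are hypothetical one-dimensional stand-ins, the sample sizes are arbitrary, and each "network" is a one-hidden-layer ReLU model with fixed random hidden weights whose output layer is fit by least squares (a random-feature substitute for gradient-based training, chosen only to keep the example short and deterministic).

```python
import numpy as np


def q_hf(theta):
    """Hypothetical expensive high-fidelity model (stand-in for illustration)."""
    return np.sin(2.0 * np.pi * theta[:, 0])


def q_lf(theta):
    """Hypothetical cheap low-fidelity model: differs from q_hf by a small smooth term."""
    return np.sin(2.0 * np.pi * theta[:, 0]) - 0.1 * theta[:, 0] ** 2


class RandomReLUNet:
    """One-hidden-layer ReLU model: random fixed hidden layer, least-squares output layer.

    This swaps gradient-based training for a linear solve; it is not the
    training procedure used in the paper.
    """

    def __init__(self, in_dim, width, rng):
        self.W = rng.normal(size=(in_dim, width))
        self.b = rng.uniform(-1.0, 1.0, size=width)
        self.c = None

    def _features(self, X):
        hidden = np.maximum(X @ self.W + self.b, 0.0)  # ReLU activations
        return np.hstack([hidden, np.ones((X.shape[0], 1))])  # append bias column

    def fit(self, X, y):
        self.c, *_ = np.linalg.lstsq(self._features(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._features(X) @ self.c


def build_surrogate(n_hf=20, n_lf=400, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: small data set where both models are evaluated; fit the residual network.
    theta_I = np.linspace(0.0, 1.0, n_hf)[:, None]
    lf_I, hf_I = q_lf(theta_I), q_hf(theta_I)
    residual_I = hf_I - lf_I  # assumed convention: F = Q_HF - Q_LF
    res_nn = RandomReLUNet(2, 10, rng).fit(
        np.column_stack([theta_I[:, 0], lf_I]), residual_I
    )

    # Step 2: larger data set with low-fidelity evaluations only;
    # synthetic high-fidelity data = low-fidelity output + predicted residual.
    theta_II = rng.uniform(0.0, 1.0, size=(n_lf, 1))
    lf_II = q_lf(theta_II)
    hf_synth = lf_II + res_nn.predict(np.column_stack([theta_II[:, 0], lf_II]))

    # Step 3: fit the surrogate for Q_HF on original plus synthetic data.
    theta_all = np.vstack([theta_I, theta_II])
    hf_all = np.concatenate([hf_I, hf_synth])
    dnn = RandomReLUNet(1, 50, rng).fit(theta_all, hf_all)
    return dnn, theta_all, hf_all, residual_I


dnn, theta_all, hf_all, residual_I = build_surrogate()
theta_test = np.linspace(0.0, 1.0, 101)[:, None]
max_err = np.max(np.abs(dnn.predict(theta_test) - q_hf(theta_test)))
```

Once fitted, `dnn.predict` replaces calls to the expensive model. In this toy setup only 20 high-fidelity evaluations are used, while the remaining 400 training targets come from the cheap model and the residual network.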
