Convolutional-neural-operator-based transfer learning for solving PDEs

Peng Fan, Guofei Pang

TL;DR

The paper addresses data-scarce transfer learning for PDE solution operators: a convolutional neural operator (CNO) is pre-trained on a source dataset and then adapted to a small target dataset. It compares three transfer strategies, namely fine-tuning, low-rank adaptation (LoRA), and neuron linear transformation (NLT), and finds that NLT delivers the highest surrogate accuracy and the greatest robustness to distribution shifts. Across three challenging PDEs (Kuramoto–Sivashinsky, Brusselator, Navier–Stokes), the study shows a clear generalization gap for non-adapted CNOs and substantial gains from transfer, which enables effective few-shot learning and multi-fidelity data fusion. The findings suggest practical pathways for data-efficient surrogate modeling of PDEs in engineering contexts where high-fidelity data are expensive to obtain.

Abstract

The convolutional neural operator is a CNN-based architecture recently proposed to enforce structure-preserving continuous-discrete equivalence and enable genuine, alias-free learning of solution operators of PDEs. In certain cases, this neural operator has been shown to outperform baseline models such as DeepONet, the Fourier neural operator, and the Galerkin transformer in terms of surrogate accuracy. The convolutional neural operator, however, does not appear to have been validated for few-shot learning. We extend the model to few-shot learning scenarios by first pre-training a convolutional neural operator on a source dataset and then adjusting the parameters of the trained neural operator using only a small target dataset. We investigate three strategies for adjusting the parameters of a trained neural operator (fine-tuning, low-rank adaptation, and neuron linear transformation) and find that the neuron linear transformation strategy achieves the highest surrogate accuracy in solving PDEs such as the Kuramoto-Sivashinsky equation, the Brusselator diffusion-reaction system, and the Navier-Stokes equations.
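To give a rough sense of how the three adaptation strategies differ in size, the sketch below counts trainable parameters for a single $k \times k$ convolutional layer. It is only an illustration under stated assumptions: the abstract does not give the exact formulations, so LoRA is assumed to add a rank-$r$ update to the frozen kernel reshaped as a matrix, and NLT is assumed to learn one scale and one shift per output neuron applied to the frozen pre-trained kernel. The function names `trainable_params` and `nlt_adapt`, and the layer sizes, are hypothetical.

```python
def trainable_params(c_out, c_in, k, rank):
    """Count trainable kernel parameters of one k x k conv layer (biases ignored)."""
    # Fine-tuning: every pre-trained kernel weight is updated.
    full = c_out * c_in * k * k
    # LoRA (assumed form): kernel frozen and viewed as a (c_out, c_in*k*k) matrix W;
    # only the low-rank factors B (c_out x rank) and A (rank x c_in*k*k) are trained.
    lora = rank * (c_out + c_in * k * k)
    # NLT (assumed form): kernel frozen; one scale and one shift per output neuron.
    nlt = 2 * c_out
    return {"fine_tuning": full, "lora": lora, "nlt": nlt}


def nlt_adapt(source_kernel, scale, shift):
    """Assumed NLT update for one neuron: w_target = scale * w_source + shift."""
    return [scale * w + shift for w in source_kernel]


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernels, rank 4.
counts = trainable_params(c_out=64, c_in=64, k=3, rank=4)
print(counts)  # {'fine_tuning': 36864, 'lora': 2560, 'nlt': 128}
```

Under these assumptions, NLT trains far fewer parameters per layer than fine-tuning or LoRA, which is one plausible reason it behaves well when the target dataset is very small; the excerpt itself does not state this.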

Paper Structure

This paper contains 19 sections, 14 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Relative $L_1$ error (%) plotted against the number of samples in the target dataset for the Navier-Stokes equations, where the transfer is made from a high kinematic viscosity to a low one (from $\nu=5\times 10^{-4}$ to $\nu=1\times 10^{-4}$). The blue and orange curves correspond to the transfer learning framework and the supervised learning baseline, respectively. When a large target dataset is available (e.g., $n_t=256$), training a CNO directly on the target dataset is sufficient; however, when the dataset is small (say $n_t=16$) because high-fidelity solutions are costly to obtain, direct training fails, whereas transfer learning can still yield a highly accurate surrogate model.
  • Figure 1: Contour plots of three test samples for the Kuramoto-Sivashinsky equation for the transfer scenario from $K = 6$ to $K = 8$. The columns display, from left to right: the input function (initial condition $u_0$), the ground truth solution (output function $u(x,T)$), and the predicted solutions from the no-transfer baseline, supervised-learning baseline, fine-tuning, LoRA, and the NLT transfer strategies. Each row corresponds to one of the three test samples in the target dataset. The values of input and output functions are both reshaped into matrices of shape $128\times 128$.
  • Figure 2: Contour plots of three test samples for the Brusselator diffusion-reaction system for the transfer scenario from $K = 6$ to $K = 8$. The columns display, from left to right: the input function (initial condition $v_0$), the ground truth solution (output function $v(x,T)$), and the predicted solutions from the no-transfer baseline, supervised-learning baseline, fine-tuning, LoRA, and the NLT transfer strategies. Each row corresponds to one of the three test samples in the target dataset. The values of input and output functions are both reshaped into matrices of shape $128\times 128$.
  • Figure 3: Contour plots of three test samples for the Navier-Stokes equations for the transfer scenario from viscosity $\nu = 5 \times 10^{-4}$ to $\nu = 1 \times 10^{-4}$. The columns display, from left to right: the input function (initial condition $\omega_0$), the ground truth solution (output function $\omega(x,T)$), and the predicted solutions from the no-transfer baseline, supervised-learning baseline, fine-tuning, LoRA, and the NLT transfer strategies. Each row corresponds to one of the three test samples in the target dataset. The values of input and output functions are both reshaped into matrices of shape $128\times 128$.
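
The relative $L_1$ error (%) reported in the first figure caption above can be computed along the following lines. This is a minimal sketch that assumes the common definition $100 \cdot \sum_i |\hat{u}_i - u_i| / \sum_i |u_i|$ over the grid values of one sample; the excerpt does not state the paper's exact normalization or how errors are averaged over test samples. The function name `relative_l1_error_percent` is hypothetical.

```python
def relative_l1_error_percent(pred, true):
    """Relative L1 error in percent between two flattened fields of equal length."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same number of grid values")
    # Numerator: total absolute deviation of the prediction from the reference.
    num = sum(abs(p - t) for p, t in zip(pred, true))
    # Denominator: L1 norm of the reference solution.
    den = sum(abs(t) for t in true)
    if den == 0:
        raise ValueError("reference solution has zero L1 norm")
    return 100.0 * num / den


# Toy example with two grid values; a 128 x 128 field would be flattened first.
err = relative_l1_error_percent([1.0, 2.0], [1.0, 4.0])
print(err)  # 40.0
```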