Generative flow induced neural architecture search: Towards discovering optimal architecture in wavelet neural operator

Hartej Soin; Tapas Tripura; Souvik Chakraborty

TL;DR

This work tackles the challenge of neural-operator hyperparameter tuning by introducing FWNO, a generative flow-based NAS that learns stochastic policies to assemble per-layer wavelet bases and activation operators for the Wavelet Neural Operator. A terminal reward is defined as $R(s_T)=\exp(-\mathcal{L}_{val})$, and two feed-forward networks model the flow to generate architecture trajectories, minimizing a flow-consistency loss to align inflow and outflow across states. FWNO demonstrates improved accuracy over vanilla WNO across four PDE benchmarks (Burgers, rectangular and triangular Darcy, and Navier–Stokes) and enables zero-shot super-resolution, while reducing the computational cost compared to grid-search NAS. The approach preserves discretization-invariant operator learning and can be extended to other neural operators, offering a scalable path for automated hyperparameter selection in physics-informed learning of parametric PDEs.
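The reward and the inflow/outflow balance described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the squared log-ratio form of the mismatch is the standard GFlowNet flow-matching objective and is assumed here, and the names `terminal_reward`, `flow_mismatch`, and `eps` are hypothetical.

```python
import math
from typing import Sequence


def terminal_reward(val_loss: float) -> float:
    """R(s_T) = exp(-L_val): a lower validation loss gives a reward closer to 1."""
    return math.exp(-val_loss)


def flow_mismatch(inflow: float, outflows: Sequence[float],
                  reward: float = 0.0, eps: float = 1e-8) -> float:
    """Squared log-ratio between the flow entering a state and the flow leaving it.

    Assumption: the standard flow-matching form. For an intermediate state the
    reward is 0; for the terminal state there are no outgoing edges and the
    reward plays the role of the outflow.
    """
    total_out = reward + sum(outflows)
    return (math.log(eps + inflow) - math.log(eps + total_out)) ** 2


# Intermediate state: 2.0 units flow in and are split across two actions.
balanced = flow_mismatch(inflow=2.0, outflows=[1.5, 0.5])
# Terminal state: the inflow should match the reward exp(-L_val).
terminal = flow_mismatch(inflow=0.9, outflows=[], reward=terminal_reward(0.1))
```

Summing this mismatch over the states of a sampled trajectory gives one plausible form of the loss used to update the two flow networks.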

Abstract

We propose a generative flow-induced neural architecture search algorithm. The proposed approach devises simple feed-forward neural networks to learn stochastic policies that generate sequences of architecture hyperparameters such that the generated states are sampled in proportion to the reward from the terminal state. We demonstrate the efficacy of the proposed search algorithm on the wavelet neural operator (WNO), where we learn a policy to generate a sequence of hyperparameters, such as the wavelet basis and activation operators, for the wavelet integral blocks. The trajectory of generated wavelet bases and activations is cast as a flow, and the policy is learned by minimizing the flow violation at each state in the trajectory while maximizing the reward from the terminal state. In the terminal state, we train the WNO simultaneously to guide the search. We propose to use the exponential of the negative WNO loss on the validation dataset as the reward function. While grid search-based neural architecture generation algorithms evaluate every combination, the proposed framework generates the most probable sequence based on the positive reward from the terminal state, thereby reducing exploration time. Compared to reinforcement learning schemes, where complete episodic training is required to obtain the reward, the proposed algorithm generates the hyperparameter trajectory sequentially. Through four fluid mechanics-oriented problems, we illustrate that the learned policies can sample the best-performing architecture of the neural operator, thereby improving on the performance of the vanilla wavelet neural operator.
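The sequential construction described in the abstract can be sketched as follows: for each wavelet integral block, a wavelet is sampled in proportion to the flow predicted for each candidate, then an activation is sampled the same way. This is a rough sketch under stated assumptions: the candidate lists are placeholders (the paper's actual search space is not given here), and `flow_w` / `flow_a` are hypothetical callables standing in for the two feed-forward flow networks.

```python
import random
from typing import Callable, List, Sequence, Tuple

WAVELETS = ["db4", "sym4", "coif2"]      # placeholder candidates (assumption)
ACTIVATIONS = ["gelu", "relu", "tanh"]   # placeholder candidates (assumption)

State = Tuple[str, ...]
FlowFn = Callable[[State], Sequence[float]]  # state -> positive flow per action


def sample_trajectory(flow_w: FlowFn, flow_a: FlowFn,
                      n_blocks: int, rng: random.Random) -> State:
    """Build (w_1, a_1, ..., w_n, a_n) by sampling each action in proportion to its flow."""
    state: State = ()
    for _ in range(n_blocks):
        # Choose the wavelet basis for this block.
        w = rng.choices(WAVELETS, weights=flow_w(state), k=1)[0]
        state += (w,)
        # Choose the activation for this block, conditioned on the updated state.
        a = rng.choices(ACTIVATIONS, weights=flow_a(state), k=1)[0]
        state += (a,)
    return state


def uniform_flow(state: State) -> List[float]:
    """Stand-in for an untrained flow network: equal flow to every action."""
    return [1.0, 1.0, 1.0]


arch = sample_trajectory(uniform_flow, uniform_flow, n_blocks=4, rng=random.Random(0))
```

In the actual method, the terminal state would be scored by training a WNO with the sampled configuration and computing the reward from its validation loss; that step is omitted here.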

Paper Structure (15 sections, 16 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Background on wavelet neural operator and flow networks
Wavelet Neural Operator (WNO)
Generative Flow Network
Flow induced wavelet neural operator
Initial setup
Sequential construction of the architecture
Training the networks to learn flow
Numerical Results
Hyperparameter settings
Burgers diffusion dynamics
Darcy's flow equation
Darcy's flow equation in triangular geometry with a notch
Navier-Stokes viscous fluid dynamics
Conclusions

Figures (6)

Figure 1: Schematic architecture of the proposed flow-induced wavelet neural operator. (a) The flow network over wavelets and activation operators leverages the DAG structure of the Markov decision process. For each wavelet integral block (WIB), the algorithm chooses a pair consisting of a wavelet basis and an activation function. The arrows leading from one state to the next represent the flow. Starting at $s_0$, the next action (wavelet or activation) is sampled in proportion to the output of the neural networks $N_w$ and $N_a$, which is the flow from the current node to its successor nodes. A representative set of states chosen by FWNO with maximum probability, $\{w_0,a_n,w_n,a_0,\ldots,w_0,a_n\}$, is indicated by the blue path, which carries the highest flow. The loss $\mathcal{L}$ is accumulated over the trajectory, and the agents $N_w$ and $N_a$ are updated at the end of the trajectory. (b) Schematic of the wavelet neural operator, which uses the selected set of $\{w,a\}$ to parameterize the WNO kernels in wavelet space. (c) The wavelet integral block uses the selected basis $w_i,i=1,\ldots,\ell$ to perform the forward and inverse wavelet transforms $\psi_{w}$ and $\psi_{w}^{-1}$. The approximation $\mathbb{A}$ and detail $\mathbb{D}$ features of the inputs are convolved with the kernels $\mathcal{K}_{\mathbb{A}}$ and $\mathcal{K}_{\mathbb{D}}$.
Figure 2: 1D Burgers equation with periodic boundary conditions. (a) Four representative samples of initial conditions. (b) Ground truth vs. the prediction from the best-performing WNO architecture sampled by FWNO at $T=1$s. The predictions suggest that the solution operator learned by FWNO matches the true integral operator very closely; differences between the truth and the prediction are not easily discerned from the figure.
Figure 3: Zero-shot super-resolution on 1D Burgers equation. (a) Four representative samples of initial conditions from the test dataset. (b) Comparison of ground truth and predictions from the best-performing model at $T=1$s. The model is trained on a spatial resolution of 1024, while predictions are made on 2048, 4096, and 8192 resolutions. The predictions suggest the capability of the FWNO approximated solution operators to make zero-shot predictions on the higher-resolution datasets without fine-tuning.
Figure 4: Time-independent Darcy flow in a rectangular domain. The plots show four representative samples of permeability fields, the corresponding true pressure fields, and the predicted pressure fields, along with the absolute error, for the best case among all states sampled by the FWNO. The figures show an almost exact match between the FWNO prediction and the true solution.
Figure 5: Darcy flow simulation in a triangular domain with a notch. The plots show four representative boundary conditions, the corresponding true pressure fields, and the predicted pressure fields for the best case among all states sampled by the FWNO, along with the absolute error fields. The optimal solutions show an almost exact match with the ground truth, even for a complex geometry.
...and 1 more figure
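The forward transform, kernel action, and inverse transform in the wavelet integral block of Figure 1(c) can be illustrated with a one-level Haar transform. This is only a toy sketch under strong assumptions: the actual block uses the wavelet basis selected by the search and learned kernels $\mathcal{K}_{\mathbb{A}}$ and $\mathcal{K}_{\mathbb{D}}$, whereas here the basis is fixed to Haar and each kernel is replaced by a single hypothetical scalar weight (`k_approx`, `k_detail`).

```python
import math
from typing import List, Sequence, Tuple


def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_forward: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x: List[float] = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def wavelet_block(x: Sequence[float], k_approx: float = 1.0,
                  k_detail: float = 1.0) -> List[float]:
    """Transform, weight the A and D coefficients, transform back.

    Scalar weights stand in for the learned kernels (assumption).
    """
    approx, detail = haar_forward(x)
    approx = [k_approx * a for a in approx]
    detail = [k_detail * d for d in detail]
    return haar_inverse(approx, detail)


signal = [1.0, 3.0, 5.0, 7.0]
identity = wavelet_block(signal)                # unit weights reproduce the input
smoothed = wavelet_block(signal, k_detail=0.0)  # dropping details averages each pair
```

With unit weights the block returns the input up to floating-point rounding, which is a quick check that the forward and inverse transforms are consistent.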
