Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach

Yixuan Sun, Ololade Sowunmi, Romain Egele, Sri Hari Krishna Narayanan, Luke Van Roekel, Prasanna Balaprakash

TL;DR

The paper addresses the efficient construction of ocean dynamics surrogates by automatically tuning the hyperparameters and architecture of Fourier neural operators (FNOs) with DeepHyper. The authors formulate the learning objective as a weighted combination of $L_{\text{MSE}}$ and $L_{\text{NegACC}}$ to capture both the mean state and the anomalies. Centralized Bayesian optimization with a multipoint $q$UCB acquisition function explores data preprocessing, architecture, and training hyperparameters on HPC resources, seeking a Pareto front for the two objectives. The results show that the best hyperparameters found improve single-step forecasts for four prognostic variables and substantially improve autoregressive rollouts of up to 30 days, supporting the use of multiobjective HPO for ocean forecasting with FNOs.
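
To make the phrase "Pareto front for two objectives" concrete, the sketch below filters a set of evaluated configurations down to the non-dominated ones. It is only an illustration under stated assumptions: both objectives are minimized (for example validation MSE and negative ACC), the function name `pareto_mask` is hypothetical, and it does not reproduce how DeepHyper tracks the front internally.

```python
import numpy as np

def pareto_mask(objectives):
    """Return a boolean mask of non-dominated rows (all objectives minimized).

    `objectives` has shape (n_configs, n_objectives); here the two columns
    are assumed to be (MSE, negative ACC) for each evaluated configuration.
    """
    obj = np.asarray(objectives, dtype=float)
    mask = np.ones(obj.shape[0], dtype=bool)
    for i in range(obj.shape[0]):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(obj <= obj[i], axis=1)
        strictly_better = np.any(obj < obj[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False
    return mask

# Hypothetical (MSE, negative ACC) pairs for five configurations.
scores = np.array([[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0], [1.0, 5.0]])
front = pareto_mask(scores)
```

In this toy input only the configuration at (3, 3) is dropped, because (2, 2) is better in both objectives; exact duplicates are kept since neither strictly improves on the other.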

Abstract

Training an effective deep learning model to learn ocean processes involves careful choices of various hyperparameters. We leverage the advanced search algorithms for multiobjective optimization in DeepHyper, a scalable hyperparameter optimization software, to streamline the development of neural networks tailored for ocean modeling. The focus is on optimizing Fourier neural operators (FNOs), a data-driven model capable of simulating complex ocean behaviors. Selecting the correct model and tuning the hyperparameters are challenging tasks, requiring much effort to ensure model accuracy. DeepHyper allows efficient exploration of hyperparameters associated with data preprocessing, FNO architecture-related hyperparameters, and various model training strategies. We aim to obtain an optimal set of hyperparameters leading to the most performant model. Moreover, on top of the commonly used mean squared error for model training, we propose adopting the negative anomaly correlation coefficient as the additional loss term to improve model performance and investigate the potential trade-off between the two terms. The experimental results show that the optimal set of hyperparameters enhanced model performance in single timestepping forecasting and greatly exceeded the baseline configuration in the autoregressive rollout for long-horizon forecasting up to 30 days. Utilizing DeepHyper, we demonstrate an approach to enhance the use of FNOs in ocean dynamics forecasting, offering a scalable solution with improved precision.
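
The composite objective described above (mean squared error plus a negative anomaly correlation coefficient term) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the weight `alpha`, the use of a single `climatology` field to define anomalies, and all function names are assumptions introduced here.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all grid points."""
    return float(np.mean((pred - target) ** 2))

def neg_acc(pred, target, climatology, eps=1e-12):
    """Negative anomaly correlation coefficient.

    Anomalies are taken relative to `climatology` (an assumed reference
    field); a perfect forecast gives a value close to -1.
    """
    pred_anom = pred - climatology
    target_anom = target - climatology
    num = np.sum(pred_anom * target_anom)
    den = np.sqrt(np.sum(pred_anom ** 2) * np.sum(target_anom ** 2)) + eps
    return float(-num / den)

def composite_loss(pred, target, climatology, alpha=0.5):
    """Weighted sum of the two terms; `alpha` is a hypothetical weight."""
    return alpha * mse(pred, target) + (1.0 - alpha) * neg_acc(pred, target, climatology)

# Toy 1-D "field" with a constant climatology.
target = np.array([1.0, 2.0, 3.0])
climatology = np.array([2.0, 2.0, 2.0])
perfect = composite_loss(target, target, climatology)                      # close to -0.5
flipped = composite_loss(2 * climatology - target, target, climatology)   # anomalies reversed
```

The two terms respond to different failures: the MSE term penalizes errors in magnitude, while the negative ACC term penalizes anomaly patterns that are misaligned with the target even when their amplitude is small.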

Paper Structure

This paper contains 22 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Parallel coordinate plots of the hyperparameters in the search space with respect to the validation MSE (in log scale). The space is divided into three categories. (a) shows the data-related hyperparameters, (b) shows training-related hyperparameters, and (c) contains neural architecture-related hyperparameters.
  • Figure 2: Parallel coordinate plots of the hyperparameters in the search space with respect to the validation ACC. The space is divided into three categories. (a) shows the data-related hyperparameters, (b) shows training-related hyperparameters, and (c) contains neural architecture-related hyperparameters.
  • Figure 3: Scatter plot of the quantile-transformed MSE and negative ACC among the search results. The points are color-coded based on the summation of the two objectives, where a lower value indicates better performance.
  • Figure 4: Meridional velocity profiles of model predictions on the testing set. Both models, with different hyperparameters, were trained for 100 epochs using the composite loss function. (a) shows the model performance with the baseline hyperparameter configuration; (b) shows the model with the best configuration from the search results.
  • Figure 5: Rollout performance comparison between the models with baseline hyperparameters and searched optimal configurations. (a) shows the rollout MSE values where the error accumulation is much slower for the model with the optimal configuration; (b) shows the rollout ACC scores where the model with default configuration degrades quickly as the rollout horizon increases; (c) shows the model rollout forecasts at Day 10 for meridional velocity profiles.
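
The rollout comparison in Figure 5 rests on feeding each forecast back in as the next input and scoring every step separately. The sketch below shows that loop under simplifying assumptions: `step_fn` stands in for a trained single-step model, and the names `rollout` and `rollout_mse` are hypothetical rather than taken from the paper.

```python
import numpy as np

def rollout(step_fn, state0, n_steps):
    """Apply a single-step model autoregressively.

    Each prediction becomes the input for the next step, so errors can
    accumulate as the horizon grows. Returns an array of shape
    (n_steps, *state0.shape).
    """
    states = []
    state = state0
    for _ in range(n_steps):
        state = step_fn(state)
        states.append(state)
    return np.stack(states)

def rollout_mse(preds, truth):
    """MSE at each rollout step, averaged over all non-time axes."""
    spatial_axes = tuple(range(1, preds.ndim))
    return np.mean((preds - truth) ** 2, axis=spatial_axes)

# Toy stand-in for a trained model: halve the field at every step.
preds = rollout(lambda x: 0.5 * x, np.ones((2, 2)), n_steps=3)
errors = rollout_mse(preds, np.zeros_like(preds))
```

Plotting `errors` against the step index gives a curve of the kind shown in panel (a); an analogous per-step ACC would correspond to panel (b).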