Simultaneous emulation and downscaling with physically-consistent deep learning-based regional ocean emulators

Leonard Lupin-Jimenez; Moein Darman; Subhashis Hazarika; Tianning Wu; Michael Gray; Ruyoing He; Anthony Wong; Ashesh Chattopadhyay

TL;DR

The paper addresses stable, long-term regional ocean emulation by combining a data-driven autoregressive forecast with a physics-constrained downscaling step for the Gulf of Mexico (GoM). A Fourier Neural Operator (FNO2D) forecast model is trained on low-resolution GLORYS data at $8$ km and paired with two downscaling architectures (a UNET and a Variational Autoencoder with PatchGAN) that produce high-resolution outputs at $4$ km, guided by a loss that blends grid-space and spectral information with $\lambda=0.2$. A key contribution is the online fine-tuning of the downscaling models to correct drift and bias introduced by model error and resolution differences, which enables accurate long-term statistics and physically plausible spectra over decadal time scales. The results show short-term skill, faithful spectral properties, and stable long-term means and variability, with substantial computational speedups relative to physics-based models. The framework offers a practical path toward fast, large-scale regional ocean emulation, with quantitative metrics relevant to uncertainty assessment and suitable for rapid analyses and scenario exploration.
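As an illustration of how a loss can blend grid-space and spectral information, here is a minimal NumPy sketch. The exact form of the paper's loss is not given in this summary, so everything here is an assumption: the convex combination $(1-\lambda)\,L_{\text{grid}} + \lambda\,L_{\text{spec}}$, the comparison of Fourier amplitudes along the zonal axis, the cutoff `k_max`, and all function names are hypothetical. Only the value $\lambda=0.2$ comes from the text.

```python
import numpy as np

def grid_loss(pred, target, mask):
    # Mean squared error over ocean points only (mask == 1 marks ocean).
    diff = (pred - target) * mask
    return float(np.sum(diff ** 2) / np.sum(mask))

def spectral_loss(pred, target, k_max):
    # Mean squared difference of Fourier amplitudes along the last (zonal)
    # axis, restricted to the first k_max wavenumbers. Assumed form.
    amp_pred = np.abs(np.fft.rfft(pred, axis=-1))[..., :k_max]
    amp_true = np.abs(np.fft.rfft(target, axis=-1))[..., :k_max]
    return float(np.mean((amp_pred - amp_true) ** 2))

def total_loss(pred, target, mask, lam=0.2, k_max=16):
    # Assumed blend: (1 - lam) * grid term + lam * spectral term.
    return (1.0 - lam) * grid_loss(pred, target, mask) \
        + lam * spectral_loss(pred, target, k_max)
```

With `lam=0.0` the sketch reduces to a masked grid-space MSE; raising `lam` penalizes mismatched spectral amplitudes more heavily, which is the usual motivation for adding a spectral term.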

Abstract

Building on the success of AI-based atmospheric emulation, we propose an AI-based ocean emulation and downscaling framework focusing on the high-resolution regional ocean over the Gulf of Mexico. Regional ocean emulation presents unique challenges owing to complex bathymetry and lateral boundary conditions, as well as fundamental biases in deep learning-based frameworks, such as instability and hallucinations. In this paper, we develop a deep learning-based framework that autoregressively integrates ocean-surface variables over the Gulf of Mexico at $8$ km spatial resolution without unphysical drifts over decadal time scales, and simultaneously downscales and bias-corrects them to $4$ km resolution using a physics-constrained generative model. The framework shows both short-term skill and accurate long-term statistics in terms of mean and variability.

Paper Structure (14 sections, 11 equations, 17 figures)

Introduction
Datasets
Methodology
Loss functions
FC model Training and Testing
Downscaling Architectures
UNET
Variational Autoencoder with Adversarial Training
Online fine tuning of the DS model for bias correction
Results
Short-term skills of the FCDS framework
Power spectrum of the FCDS framework
Long-term stability, mean, and variability
Discussion

Figures (17)

Figure 1: Example snapshots of the GLORYS low-resolution and CNAPS high-resolution datasets for SSH, SSU, and SSV. The fields differ because the two reanalysis products differ.
Figure 2: Distributions for GLORYS LR and CNAPS HR datasets, for SSH, SSU, and SSV.
Figure 3: Framework of FC (left to right) and DS (top to bottom). Forecast inference is done from an initial low resolution GLORYS field state (GLORYS LR) at $t_0$. Before downscaling using our data driven models (top to bottom), the low resolution state is linearly interpolated to the high resolution CNAPS spatial grid.
Figure 4: (a) A diagram of the full FNO2D network, with channel raising, Fourier layers, and channel lowering to the original dimensionality. The input has dimensionality (4+1): four channels plus a single binary 1/0 mask marking land or ocean. The latitude and longitude coordinates are then concatenated, giving (4+1+2). The channels are raised and passed through 6 Fourier layers. Finally, the channels are lowered to the original 4-channel space, and the loss is computed from this output, as shown in Eqs. (\ref{eq:grid_loss})-(\ref{eq:tot_loss}). (b) An individual Fourier layer. The input passes through two separate branches: one performs a linear transformation $W(v(x))$ on the input, and the other performs a 2-D Fourier transform $\text{FFT2}(v(x))$. In the Fourier branch, on the right of the diagram, the Fourier amplitudes are truncated to remove higher-wavenumber modes. A linear transform $R$ is then applied to the truncated 2-D Fourier data, followed by an inverse transform. The output of the linear branch is added to the output of the Fourier branch, and the sum is passed through an activation function.
Figure 5: Diagram of the UNET architecture used; successive layers are shown top to bottom. The $x$ and $y$ sizes of each layer, as well as the number of channels $c$, are shown in the form $x \times y \times c$. There are four contraction layers, which perform $3 \times 3$ kernel convolutions with ReLU activation, along with a max-pooling layer. Then, 4 layers of expansion follow, with skip-connection concatenation and up-convolution with a $2\times2$ kernel.
...and 12 more figures
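
The single Fourier layer described in the Figure 4(b) caption can be sketched in NumPy as follows. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation: the function name `fourier_layer`, the tensor shapes, the use of a real FFT that keeps the lowest `modes` wavenumbers (both positive and negative along the first spatial axis), and the choice of ReLU as the activation are all hypothetical, since the caption does not specify them.

```python
import numpy as np

def fourier_layer(v, R, W, modes):
    """One Fourier layer: activation(linear branch + spectral branch).

    v: real array of shape (c, nx, ny), the input channels.
    R: complex weights of shape (c, c, 2 * modes, modes) (assumed layout).
    W: real pointwise channel-mixing matrix of shape (c, c).
    """
    c, nx, ny = v.shape
    # Fourier branch: 2-D real FFT over the two spatial axes.
    v_hat = np.fft.rfft2(v, axes=(-2, -1))            # (c, nx, ny // 2 + 1)
    # Truncate: keep only the low-wavenumber corners of the spectrum.
    kept = np.concatenate(
        [v_hat[:, :modes, :modes], v_hat[:, -modes:, :modes]], axis=1
    )                                                  # (c, 2 * modes, modes)
    # Apply the linear transform R to the retained modes (mixes channels).
    mixed = np.einsum("oikl,ikl->okl", R, kept)
    out_hat = np.zeros_like(v_hat)
    out_hat[:, :modes, :modes] = mixed[:, :modes]
    out_hat[:, -modes:, :modes] = mixed[:, modes:]
    # Inverse transform back to grid space.
    spectral = np.fft.irfft2(out_hat, s=(nx, ny), axes=(-2, -1))
    # Linear branch: pointwise transformation W(v(x)).
    linear = np.einsum("oi,ikl->okl", W, v)
    # Sum the two branches and apply the activation (ReLU assumed here).
    return np.maximum(spectral + linear, 0.0)
```

Setting `R` to zero leaves only the linear branch, and setting `W` to zero leaves a low-pass-filtered, channel-mixed version of the input, which is a quick way to check each branch separately.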
