Conditional updates of neural network weights for increased out of training performance

Jan Saynisch-Wagner, Saran Rajendran Sari

TL;DR

The paper tackles the problem of neural networks failing outside their training distribution in climate and geoscience contexts. It introduces a weight-prediction framework that fine-tunes a parent network on subsets of the training data, regresses the resulting weight anomalies on informative predictors, and extrapolates that regression to the application data to generate child networks. Across three climate-inspired use cases (temporal tipping of the AMOC, spatial density estimation, and cross-domain wind-velocity uncertainty), the approach generally improves out-of-distribution performance, with notable gains in tipping scenarios and deep-ocean extrapolation, and meaningful but variable gains in cross-domain uncertainty. The work presents extrapolation via weight regression as a promising path to making neural networks adaptive to unseen regimes, while acknowledging stochasticity and stability challenges that warrant further research and ensemble strategies.

Abstract

This study proposes a method to enhance neural network performance when training data and application data are not very similar, e.g., in out-of-distribution problems and under pattern and regime shifts. The method consists of three main steps: 1) Retrain the neural network towards reasonable subsets of the training data set and record the resulting weight anomalies. 2) Choose reasonable predictors and derive a regression between the predictors and the weight anomalies. 3) Extrapolate the weights, and thereby the neural network, to the application data. We show and discuss this method in three use cases from the climate sciences, which include successful temporal, spatial and cross-domain extrapolations of neural networks.
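The three steps can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the "network" is a single linear unit, the drifting relation `slope(t)`, the five time windows, and the use of each window's mean time as the predictor are all hypothetical choices made only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope(t):
    # Hypothetical regime drift: the input-output relation changes with t.
    return 1.0 + 0.5 * t

# Training data cover t in [0, 1]; the application regime (t = 2) is unseen.
t_train = rng.uniform(0.0, 1.0, 2000)
x_train = rng.normal(size=2000)
y_train = slope(t_train) * x_train

def predict(w, x):
    # Toy "network": one linear unit with weight w[0] and bias w[1].
    return w[0] * x + w[1]

def fit(w0, x, y, lr=0.1, steps=200):
    # Plain gradient descent on half the mean squared error.
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        err = predict(w, x) - y
        w -= lr * np.array([np.mean(err * x), np.mean(err)])
    return w

# Parent model: trained on all training data.
w_parent = fit(np.zeros(2), x_train, y_train)

# Step 1: retrain towards subsets (here: time windows), record weight anomalies.
edges = np.linspace(0.0, 1.0, 6)
predictors, anomalies = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (t_train >= lo) & (t_train < hi)
    w_sub = fit(w_parent, x_train[mask], y_train[mask])
    predictors.append(t_train[mask].mean())  # assumed predictor: mean time
    anomalies.append(w_sub - w_parent)
predictors = np.array(predictors)
anomalies = np.array(anomalies)

# Step 2: regress each weight anomaly on the predictor (order-1 polynomial).
coefs = [np.polyfit(predictors, anomalies[:, j], deg=1) for j in range(2)]

# Step 3: extrapolate the weights to the application regime -> child model.
t_app = 2.0
w_child = w_parent + np.array([np.polyval(c, t_app) for c in coefs])

# Compare parent and child on application data.
x_app = rng.normal(size=500)
y_app = slope(t_app) * x_app

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rmse_parent = rmse(predict(w_parent, x_app), y_app)
rmse_child = rmse(predict(w_child, x_app), y_app)
print(f"parent RMSE: {rmse_parent:.3f}, child RMSE: {rmse_child:.3f}")
```

In this toy the drift is exactly linear in the predictor, so the extrapolated child fits the unseen regime closely; with real networks and real predictors the regression is noisier, which is where the stochasticity and stability caveats apply.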

Paper Structure

This paper contains 11 sections, 6 figures.

Figures (6)

  • Figure 1: AMOC strength and an example of good performance of the weight-prediction method. Black Line: AMOC overturning decay and tipping, i.e., the target ground truth values. Red Line: Output of the parent model trained until the dashed end of training (EOT) line. Orange Line: Output of the weight-predicted child model. Forgetful online learning and subsequent fitting of EOF-to-weight relations use only data before the dashed EOT line.
  • Figure 2: Impact of the choice of the weight-prediction method on the predicted model's performance. Polynomials of order 1 (left panel) and 2 (right panel) were fitted to the leading 4 principal components of the input variable (i.e., the meridional velocity maps). Yellow Boxes: RMSE of unperturbed models. Green Boxes: RMSE of weight-predicted models. Blue Boxes: Pairwise RMSE differences (i.e., unperturbed "parent" model minus corresponding weight-predicted "child" model). Positive values represent improvements due to the presented weight-prediction approach. Note that "full period" and "period > 1800" are the same for every simulation of the ensemble. The term "full validation period" refers to all data points after each simulation's individual EOT (which is randomly chosen for every simulation before training). Each box is based on 300 parents/children/pairs.
  • Figure 3: As Fig. 1, but with a sub-optimal example of child model performance. Still, the predicted child model (orange) outperforms the parent model (red) beyond the EOT threshold in the RMSE sense.
  • Figure 4: Impact of EOT on the RMSE of a fixed target range (1800 yrs, 2000 yrs]. Polynomials of order 1 (left panel) and 2 (right panel) were fitted to the leading 4 principal components of the input variable (i.e., the meridional velocity maps). Yellow Boxes: RMSE of unperturbed models. Green Boxes: RMSE of weight-predicted models. Blue Boxes: Pairwise RMSE differences (i.e., unperturbed "parent" model minus corresponding weight-predicted "child" model). Positive values represent improvements due to the presented weight-prediction approach. Each box is based on 100 parents/children/pairs.
  • Figure 5: Depth-dependent zonally averaged$^*$ (upper panels) and meridionally averaged$^*$ (lower panels) RMSE of NN-based sea water density estimation after training above the dashed line only. Left panels: Performance of the parent model. Middle panels: Performance of the weight-predicted child model. Right panels: RMSE differences (parent minus child; green represents improvements by the child model). $^*$: By giving every global output location equal weight, these averages represent a data-centered view and do not follow oceanographic standards.
  • ...and 1 more figures
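
The pairwise RMSE differences used in the captions of Figures 2, 4 and 5 (parent minus child, positive meaning the child improves on the parent) can be sketched as follows. The arrays are made-up illustrative values, not results from the paper, and the function names are hypothetical.

```python
import numpy as np

def rmse(pred, truth):
    # Root-mean-square error of one model's predictions.
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def pairwise_rmse_differences(parent_preds, child_preds, truth):
    # One difference per parent/child pair: RMSE(parent) - RMSE(child).
    # Positive values mean the weight-predicted child beats its parent.
    return np.array([
        rmse(p, truth) - rmse(c, truth)
        for p, c in zip(parent_preds, child_preds)
    ])

# Illustrative (synthetic) ensemble of two parent/child pairs.
truth = np.array([1.0, 2.0, 3.0])
parent_preds = np.array([[1.0, 2.0, 5.0], [2.0, 3.0, 4.0]])
child_preds = np.array([[1.0, 2.0, 4.0], [2.0, 3.0, 4.0]])

diffs = pairwise_rmse_differences(parent_preds, child_preds, truth)
print(diffs)  # first pair: child improves; second pair: no change
```

Computing the difference per pair, rather than comparing the two RMSE distributions, keeps each child matched to the parent it was derived from.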