Prototype-enhanced prediction in graph neural networks for climate applications

Nawid Keshtmand, Elena Fillola, Jeffrey Nicholas Clark, Raul Santos-Rodriguez, Matthew Rigby

TL;DR

Proposes prototypes as additional inputs to a Graph Neural Network emulator to reduce the computational burden of LPDM-based footprints for greenhouse gas emissions monitoring. The methodology compares human expert, random, and data-driven (k-means) prototype selection and demonstrates that data-driven prototypes can yield notable gains in IoU and MSE, especially with around 20 prototypes. The results show up to about 8% IoU improvement over the baseline and crisper, less-smoothed footprint predictions that better reflect wind direction. The work offers a scalable path toward near-real-time atmospheric transport emulation and suggests extending prototype-based inputs to other physics-based emulators.

Abstract

Data-driven emulators are increasingly being used to learn and emulate physics-based simulations, reducing computational expense and run time. Here, we present a structured way to improve the quality of these high-dimensional emulated outputs, through the use of prototypes: an approximation of the emulator's output passed as an input, which informs the model and leads to better predictions. We demonstrate our approach to emulate atmospheric dispersion, key for greenhouse gas emissions monitoring, by comparing a baseline model to models trained using prototypes as an additional input. The prototype models achieve better performance, even with few prototypes and even if they are chosen at random, but we show that choosing the prototypes through data-driven methods (k-means) can lead to almost 10% increased performance in some metrics.
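
The data-driven prototype selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes footprints are stored as a NumPy array of shape (n_samples, height, width), uses a plain Lloyd's k-means, and assumes that each prototype is the training footprint nearest to a cluster centroid, so that prototypes remain true footprints. The function names `select_prototypes` and `nearest_prototype`, and the use of mean squared error as the closeness measure, are hypothetical choices made for illustration.

```python
import numpy as np


def select_prototypes(footprints, k, n_iter=50, seed=0):
    """Pick k training footprints as prototypes via a plain Lloyd's k-means.

    footprints: array of shape (n_samples, height, width).
    Returns the indices of the chosen training footprints.
    """
    rng = np.random.default_rng(seed)
    flat = footprints.reshape(len(footprints), -1).astype(float)
    # Initialise centroids from k distinct training footprints.
    centroids = flat[rng.choice(len(flat), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every footprint to every centroid.
        dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = flat[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Assumption: a prototype is the real footprint nearest to each centroid.
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=0)


def nearest_prototype(target, prototypes):
    """Return the index of the prototype closest to target (assumed metric: MSE)."""
    errors = ((prototypes - target[None]) ** 2).mean(axis=(1, 2))
    return int(errors.argmin())
```

In this sketch, `footprints[select_prototypes(footprints, k=20)]` would give a prototype set of size 20, and `nearest_prototype` would pick the member of that set to pass to the emulator as an additional input for a given sample.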

Paper Structure

This paper contains 4 sections, 3 figures.

Figures (3)

  • Figure 1: Examples of two prototype sets, with size n = 4. Prototypes are chosen from the true footprints in the training dataset. During training and testing, the footprint from the set closest to the true label is passed as an input to improve prediction. A human expert (a) selects footprints that have distinct wind directions, whereas k-means (b) chooses footprints based on the data distribution.
  • Figure 2: Comparison of different models, showing metrics (a) Intersection over Union and (b) Mean Squared Error. The left panels show training curves for three selected models averaged over three seeds (shaded area shows standard deviation). The right panels show the mean score at epoch 100 for different prototype sets and sizes. Error bars show standard deviation across three seeds.
  • Figure 3: For three random samples from the test set (each row), comparison of the true footprint (first column) with the baseline output not utilising prototypes (second column) and three prototype models. Models are shown ordered by increasing performance, from left to right.
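
The two metrics reported in Figure 2 can be sketched as follows. This is an illustrative version only: the source does not state how footprints are binarised for Intersection over Union, so the `threshold` parameter and the convention for two empty masks are assumptions, as are the function names `footprint_iou` and `footprint_mse`.

```python
import numpy as np


def footprint_iou(pred, true, threshold=0.0):
    """Intersection over Union of two footprints after binarising them.

    Assumption: a grid cell counts as part of the footprint when its value
    exceeds `threshold`.
    """
    pred_mask = pred > threshold
    true_mask = true > threshold
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(pred_mask, true_mask).sum() / union)


def footprint_mse(pred, true):
    """Mean squared error over all grid cells."""
    return float(np.mean((pred - true) ** 2))


true = np.array([[1.0, 1.0], [0.0, 0.0]])
pred = np.array([[1.0, 0.0], [1.0, 0.0]])
iou = footprint_iou(pred, true)   # 1 shared cell out of 3 covered cells
mse = footprint_mse(pred, true)   # 2 of 4 cells differ by 1
```

A higher IoU and a lower MSE both indicate a prediction closer to the true footprint, which is the sense in which the captions above describe models as ordered by increasing performance.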