SmaAT-QMix-UNet: A Parameter-Efficient Vector-Quantized UNet for Precipitation Nowcasting

Nikolas Stavrou, Siamak Mehrkanoon

Abstract

Weather forecasting supports critical socioeconomic activities and complements environmental protection, yet operational Numerical Weather Prediction (NWP) systems remain computationally intensive, which makes them inefficient for certain applications. Meanwhile, recent advances in deep data-driven models have demonstrated promising results in nowcasting tasks. This paper presents SmaAT-QMix-UNet, an enhanced variant of SmaAT-UNet that introduces two key innovations: a vector quantization (VQ) bottleneck at the encoder-decoder bridge, and mixed-kernel depthwise convolutions (MixConv) replacing selected encoder and decoder blocks. These enhancements both reduce the model's size and improve its nowcasting performance. We train and evaluate SmaAT-QMix-UNet on a Dutch radar precipitation dataset (2016-2019), predicting precipitation 30 minutes ahead. Three configurations are benchmarked: using only VQ, only MixConv, and the full SmaAT-QMix-UNet. Grad-CAM saliency maps highlight the regions influencing each nowcast, while a UMAP embedding of the codewords illustrates how the VQ layer clusters encoder outputs. The source code for SmaAT-QMix-UNet is publicly available on GitHub: https://github.com/nstavr04/MasterThesisSnellius.

Paper Structure (19 sections, 3 equations, 5 figures, 1 table)

This paper contains 19 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: (a) SmaAT-QMix-UNet architecture: Rectangles represent feature maps, with height indicating spatial resolution and width the channel dimension. MixConv blocks are used in the last two encoder levels and the first decoder stage, while a VQ layer discretizes the $B \times 18 \times 18 \times 512$ bottleneck tensor. (b) Vector-quantization module: Latent features are flattened, each 512-D vector is assigned to its nearest codebook entry ($K=32$), and reshaped into a quantized feature map. Training optimizes the combined codebook and $\beta$-weighted commitment losses.
  • Figure 2: Mixed depthwise convolution (MixConv). The input tensor is split into two disjoint channel groups. The first group is processed by a $3 \times 3$ depthwise convolution and the second group by a $5 \times 5$ depthwise convolution. The two outputs are then concatenated along the channel dimension.
  • Figure 3: Comparison of predictions generated by different models. The SmaAT-QMix-UNet model shows better alignment with the ground truth.
  • Figure 4: (a) UMAP visualization of encoder feature vectors before and after vector quantization in SmaAT-QMix-UNet, where grey points denote pre-VQ representations and colored points indicate their assigned codewords. (b) Hyperparameter tuning results for the VQ module, showing validation performance across 16 combinations of codebook size and commitment cost, with $K=32$ and $\beta=0.75$ achieving the best performance.
  • Figure 5: Heatmaps generated with Grad-CAM for SmaAT-QMix-UNet, showing activation regions across the five encoder and four decoder levels, including responses from the convolutional blocks (DoubleDSC or MixConv) and the CBAM modules in the encoder.
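
The vector-quantization step described in the Figure 1(b) caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `vector_quantize` and the random inputs are hypothetical, and the loss is assumed to follow the standard VQ-VAE form (codebook term plus $\beta$-weighted commitment term). Only the shapes ($18 \times 18 \times 512$, $K=32$) and $\beta=0.75$ come from the captions.

```python
import numpy as np

def vector_quantize(z_e, codebook, beta=0.75):
    """Nearest-codeword lookup (hypothetical helper, not the authors' code).

    z_e:      encoder output, shape (B, H, W, D)
    codebook: K codewords, shape (K, D)
    Returns the quantized map, the codeword indices, and the loss value.
    """
    B, H, W, D = z_e.shape
    flat = z_e.reshape(-1, D)  # flatten to (B*H*W, D)

    # Squared Euclidean distances via ||z||^2 - 2 z.e + ||e||^2, shape (N, K)
    d2 = (
        (flat ** 2).sum(axis=1, keepdims=True)
        - 2.0 * flat @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    idx = d2.argmin(axis=1)                  # nearest codeword per vector
    z_q = codebook[idx].reshape(B, H, W, D)  # reshape back to a feature map

    # Assumed VQ-VAE-style loss: codebook term + beta * commitment term.
    # The two terms differ only in where the stop-gradient sits, so their
    # forward values coincide; the distinction matters under autodiff.
    mse = np.mean((z_q - z_e) ** 2)
    loss = mse + beta * mse
    return z_q, idx.reshape(B, H, W), loss

# Illustrative random data with the bottleneck shape from Figure 1.
rng = np.random.default_rng(0)
z_e = rng.normal(size=(2, 18, 18, 512))
codebook = rng.normal(size=(32, 512))
z_q, idx, loss = vector_quantize(z_e, codebook)
```

In a trainable model the quantized tensor would typically be passed to the decoder with a straight-through gradient estimator; that part is omitted here because NumPy has no automatic differentiation.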
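
The MixConv operation from the Figure 2 caption can be sketched the same way. This is an illustrative NumPy version under stated assumptions: the helper names `depthwise_conv2d` and `mixconv` are hypothetical, the channel split point is taken from the size of the first kernel bank (the caption only says the groups are disjoint), and zero padding with stride 1 is assumed so that the spatial size is preserved.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise cross-correlation with 'same' zero padding and stride 1.

    x:       input of shape (C, H, W)
    kernels: one odd-sized k x k filter per channel, shape (C, k, k)
    """
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((C, H, W), dtype=float)
    # Accumulate one shifted, per-channel-weighted copy per kernel tap.
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j, None, None] * xp[:, i:i + H, j:j + W]
    return out

def mixconv(x, k3, k5):
    """Split channels, apply 3x3 and 5x5 depthwise filters, concatenate."""
    c1 = k3.shape[0]                       # channels in the first group
    y3 = depthwise_conv2d(x[:c1], k3)      # first group: 3x3 kernels
    y5 = depthwise_conv2d(x[c1:], k5)      # second group: 5x5 kernels
    return np.concatenate([y3, y5], axis=0)

# Illustrative input: 4 channels of ones, split 2 + 2 (an assumption).
x = np.ones((4, 8, 8))
k3 = np.ones((2, 3, 3))
k5 = np.ones((2, 5, 5))
y = mixconv(x, k3, k5)
```

Because every filter acts on a single channel, the parameter count here is $9$ per channel in the first group and $25$ per channel in the second, which is the source of the efficiency that depthwise designs aim for.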