ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal Prediction

Wenxuan Zhang, Xuechao Zou, Li Wu, Xiaoying Wang, Jianqiang Huang, Junliang Xing

TL;DR

ARFA addresses spatiotemporal prediction with an asymmetric receptive field autoencoder that separates global context capture in the encoder from local detail reconstruction in the decoder. It introduces a Large Kernel Module (LKM) and a Small Kernel Module (SKM) to realize this design, with $F_{global}$ and $F_{local}$ fused as $F_{out} = \sigma(\phi(F_{global} + F_{local}))$. To support meteorological forecasting, the authors construct RainBench, a large radar echo dataset. Experiments on Moving-MNIST, KTH, and RainBench show that ARFA achieves state-of-the-art performance on all three datasets, supporting the asymmetric receptive field strategy and the utility of RainBench.
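
The fusion formula above can be illustrated with a minimal sketch. The summary does not define $\sigma$ or $\phi$, so this example assumes, purely for illustration, that $\phi$ is an elementwise affine map and $\sigma$ is the logistic sigmoid; the function names `sigmoid` and `fuse` and the parameters `weight` and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic sigmoid; an assumed stand-in for sigma in the formula."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(f_global, f_local, weight=1.0, bias=0.0):
    """Compute F_out = sigma(phi(F_global + F_local)) elementwise.

    Features are flat lists of floats with equal length.
    phi is ASSUMED to be the affine map x -> weight * x + bias;
    the paper may define phi and sigma differently.
    """
    if len(f_global) != len(f_local):
        raise ValueError("feature lists must have the same length")
    return [sigmoid(weight * (g + l) + bias) for g, l in zip(f_global, f_local)]


# Toy features: each pair sums to zero, so sigmoid(0) = 0.5 for every element.
print(fuse([1.0, -2.0], [-1.0, 2.0]))  # [0.5, 0.5]
```

The point of the sketch is only the order of operations: the two feature sets are added first, then projected, then passed through the outer nonlinearity.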

Abstract

Spatiotemporal prediction aims to generate future sequences from patterns learned from historical context. It is essential in numerous domains, such as traffic flow prediction and weather forecasting. Recently, research in this field has been predominantly driven by deep neural networks based on autoencoder architectures. However, existing methods commonly adopt autoencoder architectures with identical receptive field sizes in the encoder and decoder. To address this issue, we propose an Asymmetric Receptive Field Autoencoder (ARFA) model, which introduces receptive field modules whose sizes are tailored to the distinct functions of the encoder and decoder. In the encoder, we present a large kernel module for global spatiotemporal feature extraction. In the decoder, we develop a small kernel module for local spatiotemporal information reconstruction. Experimental results demonstrate that ARFA consistently achieves state-of-the-art performance on popular datasets. Additionally, we construct RainBench, a large-scale radar echo dataset for precipitation prediction, to address the scarcity of meteorological data in the domain.
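
To make the link between kernel size and receptive field concrete, the following sketch applies the standard receptive field recurrence for a stack of convolutions along one spatial axis. The kernel sizes (7 and 3) and the depth of four layers are hypothetical values chosen for illustration; the abstract does not state the sizes used in ARFA.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels along one axis, of a conv stack.

    `layers` is a list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # a single output unit initially sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent units of the current layer
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks: four stride-1 layers each, differing only in kernel size.
large_kernel_stack = [(7, 1, 1)] * 4
small_kernel_stack = [(3, 1, 1)] * 4

print(receptive_field(large_kernel_stack))  # 25
print(receptive_field(small_kernel_stack))  # 9
```

Under these assumed settings, the large-kernel stack sees a 25-pixel span while the small-kernel stack sees only 9 pixels, which is the kind of global versus local asymmetry the abstract describes.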

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Overall pipeline for spatiotemporal prediction using the encoder and decoder with shared weights.
  • Figure 2: Overall architecture of our proposed ARFA. ARFA is an autoencoder consisting of carefully designed Large Kernel Modules (LKM) and Small Kernel Modules (SKM), serving as the encoder and decoder, respectively. The LKMs offer a large receptive field for global feature extraction in the encoder, while the decoder utilizes SKMs for local information reconstruction.
  • Figure 3: Visual results of our ARFA and existing methods on the Moving-MNIST and KTH datasets.
  • Figure 4: A visual comparison on the RainBench dataset.