ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal Prediction
Wenxuan Zhang, Xuechao Zou, Li Wu, Xiaoying Wang, Jianqiang Huang, Junliang Xing
TL;DR
ARFA addresses spatiotemporal prediction with an asymmetric receptive field autoencoder that assigns global context capture to the encoder and local detail reconstruction to the decoder. It introduces a Large Kernel Module ($LKM$) and a Small Kernel Module ($SKM$) to realize this design, with $F_{global}$ and $F_{local}$ fused as $F_{out} = \sigma(\phi(F_{global} + F_{local}))$. To support meteorological forecasting, the authors construct RainBench, a large radar echo dataset. Experiments on Moving-MNIST, KTH, and RainBench show that ARFA achieves state-of-the-art performance on all three datasets, validating the asymmetric receptive field strategy and the utility of RainBench.
Abstract
Spatiotemporal prediction aims to generate future sequences from patterns learned from historical context. It is essential in numerous domains, such as traffic flow prediction and weather forecasting. Recently, research in this field has been driven predominantly by deep neural networks based on autoencoder architectures. However, existing methods commonly use the same receptive field size in the encoder and the decoder, even though the two serve distinct functions. To address this issue, we propose an Asymmetric Receptive Field Autoencoder (ARFA) model, which matches the receptive field size of each module to the function of the encoder or decoder it belongs to. In the encoder, we present a large kernel module for global spatiotemporal feature extraction. In the decoder, we develop a small kernel module for local spatiotemporal information reconstruction. Experimental results demonstrate that ARFA consistently achieves state-of-the-art performance on popular datasets. Additionally, we construct RainBench, a large-scale radar echo dataset for precipitation prediction, to address the scarcity of meteorological data in the domain.
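
To make the asymmetry concrete, the following minimal sketch computes the receptive field of a stack of convolutions along one spatial axis, using the standard recurrence for stacked convolutions. The kernel sizes (7 for the encoder, 3 for the decoder), the depth of four layers, and the function name `receptive_field` are illustrative assumptions, not values or names taken from ARFA.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field along one axis of a stack of convolution layers.

    Uses the standard recurrence: each layer widens the receptive field by
    (kernel_size - 1) times the product of the strides of all earlier layers.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)  # default: stride-1 convolutions
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


# Hypothetical configuration: four large-kernel layers in the encoder and
# four small-kernel layers in the decoder (assumed sizes, not from the paper).
encoder_rf = receptive_field([7, 7, 7, 7])
decoder_rf = receptive_field([3, 3, 3, 3])
print(encoder_rf, decoder_rf)
```

Under these assumed settings, each encoder output position covers a much wider input window than each decoder output position at the same depth, which is the sense in which large kernels favor global context and small kernels favor local detail.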
