PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning

Xinyong Cai, Changbin Sun, Yong Wang, Hongyu Yang, Yuankai Wu

Abstract

Spatiotemporal predictive learning (STPL) aims to forecast future frames from past observations and is essential across a wide range of applications. Compared with recurrent or hybrid architectures, pure convolutional models offer superior efficiency and full parallelism, yet their fixed receptive fields limit their ability to adaptively capture spatially varying motion patterns. Inspired by biological center-surround organization and frequency-selective signal processing, we propose PFGNet, a fully convolutional framework that dynamically modulates receptive fields through pixel-wise frequency-guided gating. The core Peripheral Frequency Gating (PFG) block extracts localized spectral cues and adaptively fuses multi-scale large-kernel peripheral responses with learnable center suppression, effectively forming spatially adaptive band-pass filters. To maintain efficiency, all large kernels are decomposed into separable 1D convolutions ($1 \times k$ followed by $k \times 1$), reducing per-channel computational cost from $O(k^2)$ to $O(2k)$. PFGNet enables structure-aware spatiotemporal modeling without recurrence or attention. Experiments on Moving MNIST, TaxiBJ, Human3.6M, and KTH show that PFGNet delivers SOTA or near-SOTA forecasting performance with substantially fewer parameters and FLOPs. Our code is available at https://github.com/fhjdqaq/PFGNet.
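The abstract's cost claim can be checked on a toy example. The sketch below is not taken from the PFGNet repository. The helper names `conv2d_same` and `separable_same`, the zero "same" padding and the example kernels are illustrative assumptions. It applies a $1 \times k$ pass followed by a $k \times 1$ pass, compares the result with a dense $k \times k$ kernel built as the outer product of the two 1D kernels, and counts one multiply per kernel tap.

```python
def conv2d_same(img, ker):
    """Zero-padded 'same' 2D cross-correlation; returns (output, multiply count)."""
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * W for _ in range(H)]
    mults = 0
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < H and 0 <= x < W:  # zero padding outside the image
                        acc += img[y][x] * ker[u][v]
                    mults += 1  # count every kernel tap, as a dense implementation would
            out[i][j] = acc
    return out, mults


def separable_same(img, row_k, col_k):
    """Hypothetical helper: a 1 x k pass along the width, then a k x 1 pass along the height."""
    tmp, m_row = conv2d_same(img, [list(row_k)])
    out, m_col = conv2d_same(tmp, [[c] for c in col_k])
    return out, m_row + m_col


# Example kernels (assumptions, k = 7); the dense kernel is their outer product.
row_k = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
col_k = [1.0, 0.0, -1.0, 2.0, -1.0, 0.0, 1.0]
dense_k = [[c * r for r in row_k] for c in col_k]
img = [[float((i * 8 + j) % 5) for j in range(8)] for i in range(8)]

dense_out, dense_mults = conv2d_same(img, dense_k)
sep_out, sep_mults = separable_same(img, row_k, col_k)
max_err = max(abs(a - b) for ra, rb in zip(dense_out, sep_out) for a, b in zip(ra, rb))
print(max_err, dense_mults // 64, sep_mults // 64)  # error, multiplies per output pixel
```

With $k = 7$ the dense kernel costs 49 multiplies per output pixel and the separable pair costs 14, matching the $k^2$ versus $2k$ count. A $1 \times k$ then $k \times 1$ pair with no nonlinearity in between can only express rank-1 (outer-product) $k \times k$ kernels, so the decomposition trades some expressiveness for cost.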

Paper Structure (34 sections, 3 theorems, 38 equations, 18 figures, 10 tables)

Key Result

Theorem 1

Let $H_1, H_2: [0, \pi] \to \mathbb{R}$ be continuous and radially symmetric. Define $f(r) = H_1(r) - \beta H_2(r)$ for $\beta \in (-1, 1)$. Assume there exist $0 < c < a < b \le \pi$ such that $\ldots$. Then there exist $0 < r_1 < r_2 \le b$ such that $\ldots$. Thus $H_\beta(\omega) = H_1(\|\omega\|) - \beta H_2(\|\omega\|)$ has a non-degenerate ring-shaped pass band $\{\omega : r_1 < \|\omega\| < r_2\}$.
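The conclusion of Theorem 1 can be illustrated numerically: sample $f(r) = H_1(r) - \beta H_2(r)$ on $[0, \pi]$ and look for an interval $(r_1, r_2)$ with $r_1 > 0$ on which $f > 0$. The sketch below makes no use of the hypotheses on $c$, $a$, $b$; it only checks the conclusion on a grid for one hypothetical pair of radial responses (a wide Gaussian with DC gain 0.5 and a narrow unit-gain Gaussian), which are assumptions and not the responses used in the paper. The function name `ring_pass_band` is likewise hypothetical.

```python
import math


def ring_pass_band(H1, H2, beta, n=2001, r_max=math.pi):
    """Sample f(r) = H1(r) - beta * H2(r) on [0, r_max] and return (r1, r2) bounding
    the first run of f > 0 that follows a run of f <= 0, or None if there is none.
    r1 is the last sample with f <= 0; r2 is the first sample after the run with
    f <= 0, or r_max if f stays positive up to r_max (grid approximation)."""
    rs = [r_max * i / (n - 1) for i in range(n)]
    f = [H1(r) - beta * H2(r) for r in rs]
    i = 0
    while i < n and f[i] > 0:   # a positive run touching r = 0 is low-pass, not a ring
        i += 1
    while i < n and f[i] <= 0:  # suppressed central region
        i += 1
    if i == n:
        return None
    start = i
    while i < n and f[i] > 0:   # the candidate ring
        i += 1
    return rs[start - 1], (rs[i] if i < n else r_max)


def H1(r):
    """Hypothetical wide radial response with DC gain 0.5."""
    return 0.5 * math.exp(-r * r / 2.0)


def H2(r):
    """Hypothetical narrow radial response with unit DC gain."""
    return math.exp(-2.0 * r * r)


band = ring_pass_band(H1, H2, beta=0.8)
print(band)
```

For this pair, $f(r) > 0$ exactly when $e^{1.5 r^2} > 1.6$, so the ring starts near $r_1 = \sqrt{\ln 1.6 / 1.5} \approx 0.56$ and extends to $r_2 = \pi$. With $\beta = 0$ the response is positive everywhere (purely low-pass) and the function returns `None`.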

Figures (18)

  • Figure 1: Performance-efficiency trade-off on TaxiBJ. Bubble size denotes FLOPs. PFG achieves SOTA MSE with minimal compute.
  • Figure 2: Overall architecture and core modules of PFGNet. The model follows a SimVP-style encoder–translator–decoder pipeline. The input sequence $\{I_t\}_{t=1}^{T_{\mathrm{in}}}$ is encoded into latent features, temporally packed, and processed by an MSInit followed by $N_t$ PFG blocks.
  • Figure 3: Frequency-Guided Peripheral Gating in the PFG block. Three local spectral cues (gradient magnitude, Laplacian, local variance) are extracted via fixed depthwise filters, channel-averaged, concatenated into a 3-channel frequency map, and passed through a $1\!\times\!1$ conv to produce per-pixel gate logits. Softmax over the $K$ scales yields selection weights $\alpha_k$ (the argmax visualization is only used to illustrate the preferred scale).
  • Figure 4: Qualitative results of PFGNet on Moving MNIST.
  • Figure 5: Qualitative results of PFGNet on TaxiBJ.
  • ...and 13 more figures
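The Figure 3 caption describes the gating path step by step. The pure-Python sketch below follows those steps on a toy feature map; it is not the released PFGNet code. Assumptions: central differences for the gradient, a 4-neighbour Laplacian, a 3 × 3 window for the local variance, replicate padding, arbitrary 1 × 1 conv weights, and the hypothetical helper names `spectral_cues`, `frequency_map`, `gate_weights` and `fuse`. The `fuse` step (a per-pixel weighted sum of $K$ peripheral responses) is one plausible reading of the abstract's "adaptively fuses multi-scale large-kernel peripheral responses", not a confirmed detail.

```python
import math


def _px(x, i, j):
    """Replicate-padded pixel access (border handling is an assumption)."""
    h, w = len(x), len(x[0])
    return x[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]


def spectral_cues(x):
    """Per-pixel gradient magnitude, 4-neighbour Laplacian and 3x3 local variance."""
    h, w = len(x), len(x[0])
    grad = [[0.0] * w for _ in range(h)]
    lap = [[0.0] * w for _ in range(h)]
    var = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = (_px(x, i, j + 1) - _px(x, i, j - 1)) / 2.0  # central differences
            gy = (_px(x, i + 1, j) - _px(x, i - 1, j)) / 2.0
            grad[i][j] = math.hypot(gx, gy)
            lap[i][j] = (_px(x, i - 1, j) + _px(x, i + 1, j) + _px(x, i, j - 1)
                         + _px(x, i, j + 1) - 4.0 * x[i][j])
            win = [_px(x, i + u, j + v) for u in (-1, 0, 1) for v in (-1, 0, 1)]
            mu = sum(win) / 9.0
            var[i][j] = sum((p - mu) ** 2 for p in win) / 9.0
    return grad, lap, var


def frequency_map(feats):
    """Average each cue over the C channels, giving 3 maps of shape H x W."""
    cues = [spectral_cues(ch) for ch in feats]  # C x 3 x H x W
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    return [[[sum(cues[k][q][i][j] for k in range(c)) / c for j in range(w)]
             for i in range(h)] for q in range(3)]


def gate_weights(fmap, weight, bias):
    """1x1 conv (K x 3 weights, K biases) to logits, then softmax over the K scales per pixel."""
    n_scales, h, w = len(weight), len(fmap[0]), len(fmap[0][0])
    alpha = [[[0.0] * w for _ in range(h)] for _ in range(n_scales)]
    for i in range(h):
        for j in range(w):
            logits = [sum(weight[k][q] * fmap[q][i][j] for q in range(3)) + bias[k]
                      for k in range(n_scales)]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            for k in range(n_scales):
                alpha[k][i][j] = exps[k] / total
    return alpha


def fuse(responses, alpha):
    """Hypothetical fusion: per-pixel convex combination of K peripheral responses."""
    h, w = len(responses[0]), len(responses[0][0])
    return [[sum(a[i][j] * r[i][j] for a, r in zip(alpha, responses)) for j in range(w)]
            for i in range(h)]


# Toy input: two identical channels containing a vertical step edge at column 3.
step = [[1.0 if j >= 3 else 0.0 for j in range(6)] for _ in range(6)]
fmap = frequency_map([step, step])
alpha = gate_weights(fmap, weight=[[4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [-4.0, 0.0, 0.0]],
                     bias=[0.0, 0.0, 0.0])
print([round(alpha[0][0][j], 3) for j in range(6)])  # weight on scale 0, first row
```

With these arbitrary weights, the two columns next to the edge (nonzero gradient) shift most of their weight to scale 0, while flat regions fall back to the softmax of the biases, here a uniform 1/3 per scale.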

Theorems & Definitions (6)

  • Theorem 1: Weak Existence of Ring-Shaped Pass Band
  • proof
  • Theorem 2: Existence of an SNR-maximizing $\beta^\star$
  • proof
  • Lemma 1: SNR Advantage of $H_L - \beta H_S$ over $H_L$
  • proof