ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation

Tong Nie; Guoyang Qin; Wei Ma; Yuewen Mei; Jian Sun

TL;DR

ImputeFormer tackles spatiotemporal missing data by injecting a low-rank inductive bias into a Transformer framework. It introduces temporal projected attention and embedded spatial attention, plus a Fourier Imputation Loss that regularizes the spectrum of the imputed data; the resulting model runs in linear time in practice. Across traffic, energy, solar, and air quality datasets, it achieves state-of-the-art accuracy with superior efficiency and robustness, while offering interpretable mechanisms through spectrum and embedding analyses. This approach promises broad applicability to real-world imputation tasks and time-series representation learning, especially under highly sparse or cross-domain conditions.
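To make the two attention mechanisms concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. For temporal projected attention, it assumes c learnable projector rows (`proj`, a hypothetical name) that first summarize the T time steps and then broadcast the summary back, so the cost grows as O(Tc) rather than O(T²) and the implied T×T mixing matrix has rank at most c. For embedded spatial attention, it assumes the N×N map is computed from learnable node embeddings alone (`node_emb`, also hypothetical). The exact projections, heads, and normalizations in the paper may differ.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def projected_attention(x, w_q, w_k, w_v, proj):
    """Sketch of temporal projected attention for one node.

    x:    (T, d) hidden states along time
    w_*:  (d, d) query/key/value weights
    proj: (c, d) hypothetical learnable projector, c << T
    Cost is O(T * c * d); no (T, T) matrix is formed.
    """
    d = x.shape[1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Compress: the c projector rows attend over the T steps -> (c, d) summary
    summary = softmax(proj @ k.T / np.sqrt(d)) @ v
    # Dispatch: each time step attends over the c summary rows -> (T, d)
    return softmax(q @ proj.T / np.sqrt(d)) @ summary


def embedded_attention(h, node_emb):
    """Sketch of spatial attention driven only by node embeddings.

    h:        (N, d) hidden states of N nodes at one time step
    node_emb: (N, e) hypothetical learnable node embeddings, e << N
    """
    e = node_emb.shape[1]
    # Scores have rank <= e before the softmax and do not depend on h
    scores = node_emb @ node_emb.T / np.sqrt(e)
    return softmax(scores) @ h
```

With `x` of shape (T, d), `projected_attention(x, w_q, w_k, w_v, proj)` returns a (T, d) array, and doubling T roughly doubles the work because the two softmax maps are only (c, T) and (T, c).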

Abstract

Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem has motivated many data-driven solutions. Existing imputation solutions mainly include low-rank models and deep learning models. The former assume general structural priors but have limited model capacity. The latter are highly expressive but lack prior knowledge of the underlying spatiotemporal structures. Leveraging the strengths of both paradigms, we present a low rankness-induced Transformer that balances strong inductive bias and high model expressivity. Exploiting the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it generalizable to a variety of imputation problems. We demonstrate its superiority in accuracy, efficiency, and versatility on heterogeneous datasets, including traffic flow, solar energy, smart meters, and air quality. These promising empirical results provide strong evidence that incorporating time series primitives, such as low-rankness, can substantially facilitate the development of a generalizable model for a wide range of spatiotemporal imputation problems.

Paper Structure (27 sections, 1 theorem, 21 equations, 13 figures, 7 tables)

Introduction
Preliminary
Related Work
Low Rankness-Induced Transformer
Architectural Overview
Spatiotemporal Input Embedding
Temporal Projected Attention
Spatial Embedded Attention
Fourier Imputation Loss
Empirical Evaluations
Results on Traffic Benchmarks
Results on Environmental and Energy Data
Ablation Study
Model Efficiency
Robustness and Versatility Analysis
...and 12 more sections

Key Result

Lemma 1

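Given a smooth or periodic time series $\mathbf{x}\in\mathbb{R}^{T}$, its circulant (convolution) matrix $\mathcal{C}(\mathbf{x})\in\mathbb{R}^{T\times T}$ reflects the Tucker low-rankness, depicted by the convolutional nuclear norm. This property can be revealed by using the Discrete Fourier Transform (DFT). As the DFT diagonalizes the circulant matrix by $\mathcal{C}(\mathbf{x})=\mathbf{U}^{\mathsf{H}}\,\text{diag}(\mathcal{F}(\mathbf{x}))\,\mathbf{U}$, where $\mathbf{U}$ is the unitary DFT matrix and $\mathcal{F}(\mathbf{x})$ is the (unnormalized) DFT of $\mathbf{x}$, the convolutional nuclear norm equals the $\ell_1$ norm of the Fourier spectrum: $\|\mathcal{C}(\mathbf{x})\|_{*}=\|\mathcal{F}(\mathbf{x})\|_{1}$.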
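The equivalence can be checked numerically. The sketch below assumes the convention that $\mathcal{C}(\mathbf{x})$ has $\mathbf{x}$ as its first column, `C[i, j] = x[(i - j) mod T]`, and uses NumPy's unnormalized DFT. Under these assumptions the singular values of $\mathcal{C}(\mathbf{x})$ are the magnitudes of `np.fft.fft(x)`, so a series with a sparse spectrum has a circulant matrix with a small nuclear norm. The helper names are illustrative only.

```python
import numpy as np


def circulant(x: np.ndarray) -> np.ndarray:
    """Circulant matrix whose first column is x: C[i, j] = x[(i - j) mod T]."""
    T = len(x)
    idx = (np.arange(T)[:, None] - np.arange(T)[None, :]) % T
    return x[idx]


def conv_nuclear_norm(x: np.ndarray) -> float:
    """Convolutional nuclear norm: sum of the singular values of C(x)."""
    return float(np.linalg.norm(circulant(x), ord="nuc"))


def fourier_l1_norm(x: np.ndarray) -> float:
    """l1 norm of the unnormalized DFT spectrum of x."""
    return float(np.abs(np.fft.fft(x)).sum())


rng = np.random.default_rng(0)
t = np.arange(96)
# A smooth, periodic series (two harmonics) with a little noise
x = np.sin(2 * np.pi * t / 24) + 0.5 * np.cos(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(96)

# The two quantities agree up to floating-point error
print(conv_nuclear_norm(x), fourier_l1_norm(x))
```

Computing the spectrum costs O(T log T), whereas the nuclear norm needs an O(T³) SVD, which is why the Fourier side of the identity is the practical one to optimize.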

Figures (13)

Figure 1: (a) The distribution of singular values in spatiotemporal data is long-tailed. The existence of missing data can increase its rank (or singular values). (b) Low-rank models can filter out informative signals and generate a smooth reconstruction, truncating too much energy from the left part of the spectrum. (c) Deep models can preserve high-frequency noise and generate sharp imputations, keeping too much energy in the right part of the singular spectrum. By combining the generality of low-rank models with the expressivity of deep models, ImputeFormer achieves a signal-noise balance for accurate imputation.
Figure 2: Low-rankness in time series and the induced model. (a) Redundancy in time series: PEMS08 data can be reasonably reconstructed using only five dominant patterns. (b) Low-rank spatial attention map: the singular values of the multivariate attention map follow a long-tailed distribution, and most of them are small. (c) Fourier sparsity in both space and time axes: both the spatial and temporal signals have a sparse Fourier spectrum, with most amplitudes close to zero.
Figure 3: Comparison of computational efficiency.
Figure 4: Inference under different lengths of input sequence with a single trained model (zero-shot).
Figure 5: Impact of $\lambda$ in FIL.
...and 8 more figures
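Figures 1(a) and 2(a) rest on two facts that a small synthetic experiment can illustrate: zero-filling missing entries raises the rank of an otherwise low-rank matrix, and a truncated SVD with a few dominant patterns reconstructs a low-rank matrix. The sketch below uses a synthetic rank-5 sensor-by-time matrix as a stand-in for PEMS08 (an assumption; it does not reproduce the paper's data) and zero-filling as one simple way to encode missing values.

```python
import numpy as np


def truncated_svd_reconstruction(X: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of X (Eckart-Young) via the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def numerical_rank(X: np.ndarray, tol: float = 1e-8) -> int:
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int((s > tol * s[0]).sum())


rng = np.random.default_rng(0)
N, T, r = 50, 288, 5  # sensors x time steps, 5 latent daily patterns
t = np.arange(T)
patterns = np.stack([np.sin(2 * np.pi * (k + 1) * t / T) for k in range(r)])  # (r, T)
X = rng.standard_normal((N, r)) @ patterns  # exactly rank 5

mask = rng.random((N, T)) > 0.3  # keep roughly 70% of the entries
X_obs = np.where(mask, X, 0.0)   # zero-filled missing entries

print(numerical_rank(X), numerical_rank(X_obs))  # the rank jumps after zero-filling
X5 = truncated_svd_reconstruction(X, 5)
print(np.linalg.norm(X - X5) / np.linalg.norm(X))  # close to 0: five patterns suffice
```

Real traffic data is only approximately low-rank, so on PEMS08 the rank-5 error would be small but nonzero, which matches the "reasonably reconstructed" wording of Figure 2(a).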

Theorems & Definitions (3)

Remark: Difference between projected attention and canonical self-attention
Remark: Difference between embedded attention and canonical self-attention
Lemma: Equivalence between convolution nuclear norm and Fourier $\ell_1$ norm [liu2022recovery; chen2022laplacian]
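The lemma is what makes a Fourier $\ell_1$ penalty a tractable surrogate for low rankness, which motivates the Fourier Imputation Loss (FIL). The following is a hedged sketch, not the paper's exact formula. It assumes a 2-D FFT over the space and time axes (following the Fourier sparsity in both axes shown in Figure 2(c)), a mean-magnitude normalization, and that the penalty is applied to the completed matrix with observed values kept. The argument `lam` plays the role of the weight $\lambda$ studied in Figure 5; its default of 0.1 is arbitrary, and all function names are hypothetical. NumPy is used for readability; in training, the same expression would be written in an autodiff framework.

```python
import numpy as np


def masked_mae(x_hat, x_true, eval_mask):
    """Mean absolute error over entries where eval_mask is True."""
    return float(np.abs(x_hat - x_true)[eval_mask].mean())


def fourier_l1_penalty(x_filled):
    """Mean magnitude of the 2-D (space x time) DFT, a convex surrogate for low rank."""
    return float(np.abs(np.fft.fft2(x_filled)).mean())


def imputation_objective(x_hat, x_true, obs_mask, eval_mask, lam=0.1):
    """Reconstruction loss plus lam times a Fourier l1 term (hypothetical weighting).

    x_hat:     (N, T) model output
    x_true:    (N, T) ground truth, trusted only where obs_mask or eval_mask is True
    obs_mask:  entries given to the model as observed
    eval_mask: held-out entries the model must reconstruct
    """
    # Keep observed values and use model predictions everywhere else
    x_filled = np.where(obs_mask, x_true, x_hat)
    return masked_mae(x_hat, x_true, eval_mask) + lam * fourier_l1_penalty(x_filled)
```

A smooth, repetitive signal concentrates its energy in a few Fourier coefficients and gets a small penalty, while white noise spreads energy over all coefficients and gets a large one; raising `lam` therefore pushes imputations toward the low-rank end of the spectrum discussed in Figure 1.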