VISTA: Unsupervised 2D Temporal Dependency Representations for Time Series Anomaly Detection
Sinchee Chin, Fan Zhang, Xiaochen Yang, Jing-Hao Xue, Wenming Yang, Peng Jia, Guijin Wang, Luo Yingqun
TL;DR
VISTA introduces a training-free approach to multivariate time series anomaly detection that integrates STL-based decomposition, 2D temporal correlation matrices computed via Temporal Self-Attention, and memory-efficient multivariate aggregation with a greedy coreset memory bank. The method preserves temporal structure across the trend, seasonal, and residual components and yields a visualizable 2D representation that pretrained CNNs can process for robust feature extraction. Across five public TSAD datasets it achieves state-of-the-art F1 and ROC-AUC, and ablations confirm the contribution of the 2D representation, the full three-component decomposition, and Layer3/Layer4 features. Because it requires no training, the method is practical to deploy, and its interpretable 2D representations connect TSAD with image-based anomaly detection.
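The greedy coreset memory bank can be illustrated with k-center greedy selection: keep adding the feature vector farthest from everything already chosen, so a small subset still covers the feature space. Test vectors are then scored by their distance to the nearest bank entry. The following is a minimal NumPy sketch of that idea, not the authors' code. Several parts are assumptions: random Gaussian vectors stand in for pretrained CNN (Layer3/Layer4) features, the function names `greedy_coreset` and `anomaly_score` are hypothetical, the 10% subsampling ratio is arbitrary, and nearest-neighbor Euclidean scoring is assumed.

```python
import numpy as np


def greedy_coreset(features, n_select, seed=0):
    """k-center greedy: repeatedly add the point farthest from the current selection."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(features)))]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(n_select - 1):
        idx = int(np.argmax(min_dist))  # farthest point from the current coreset
        selected.append(idx)
        new_dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return features[selected]


def anomaly_score(memory_bank, query):
    """Score each query vector by its Euclidean distance to the nearest memory entry."""
    d = np.linalg.norm(query[:, None, :] - memory_bank[None, :, :], axis=2)
    return d.min(axis=1)


rng = np.random.default_rng(1)
# Assumption: placeholder "normal" features; the described method would use pretrained CNN features.
train_feats = rng.standard_normal((500, 64))
bank = greedy_coreset(train_feats, n_select=50)  # keep 10% of the features (arbitrary ratio)
normal_query = rng.standard_normal((5, 64))
shifted_query = normal_query + 5.0  # simulated out-of-distribution features
print(anomaly_score(bank, normal_query).mean(), anomaly_score(bank, shifted_query).mean())
```

The memory saving comes from storing only `n_select` vectors instead of every training feature. Scoring then costs one distance computation per bank entry rather than per training sample.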
Abstract
Time Series Anomaly Detection (TSAD) is essential for uncovering rare and potentially harmful events in unlabeled time series data. Existing methods are highly dependent on clean, high-quality inputs, making them susceptible to noise and real-world imperfections. Additionally, intricate temporal relationships in time series data are often inadequately captured in traditional 1D representations, leading to suboptimal modeling of dependencies. We introduce VISTA, a training-free, unsupervised TSAD algorithm designed to overcome these challenges. VISTA features three core modules: 1) Time Series Decomposition using Seasonal and Trend Decomposition via Loess (STL) to decompose noisy time series into trend, seasonal, and residual components; 2) Temporal Self-Attention, which transforms 1D time series into 2D temporal correlation matrices for richer dependency modeling and anomaly detection; and 3) Multivariate Temporal Aggregation, which uses a pretrained feature extractor to integrate cross-variable information into a unified, memory-efficient representation. VISTA's training-free approach enables rapid deployment and easy hyperparameter tuning, making it suitable for industrial applications. It achieves state-of-the-art performance on five multivariate TSAD benchmarks.
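Modules 1 and 2 can be made concrete with a minimal NumPy sketch that turns one univariate window into a stack of 2D matrices. This is illustrative, not the authors' implementation, and it rests on several assumptions. A classical moving-average decomposition stands in for STL. The query and key projections are identity maps, so there are no learned weights, which is consistent with a training-free setting. Each time step is embedded as a short delay vector of length `dim`. The three component maps are stacked as channels. All function names are hypothetical.

```python
import numpy as np


def classical_decompose(x, period):
    """Additive trend/seasonal/residual split (a simple stand-in for STL, not STL itself)."""
    n = len(x)
    # Trend: centered moving average; edge padding keeps the output length equal to n.
    left = period // 2
    padded = np.pad(x, (left, period - 1 - left), mode="edge")
    trend = np.convolve(padded, np.ones(period) / period, mode="valid")
    # Seasonal: mean of the detrended series at each phase of the cycle, centered to zero mean.
    detrended = x - trend
    phase_means = np.array([detrended[p::period].mean() for p in range(period)])
    seasonal = np.resize(phase_means - phase_means.mean(), n)
    # Residual: whatever trend and seasonality do not explain.
    resid = x - trend - seasonal
    return trend, seasonal, resid


def temporal_self_attention(c, dim=4):
    """L x L scaled dot-product attention map with assumed identity projections (Q = K = embedding)."""
    # Assumption: embed each time step as a vector of itself and its dim-1 predecessors.
    padded = np.pad(c, (dim - 1, 0), mode="edge")
    emb = np.stack([padded[i:i + len(c)] for i in range(dim)], axis=1)  # shape (L, dim)
    emb = (emb - emb.mean()) / (emb.std() + 1e-8)  # make scores scale-independent
    scores = emb @ emb.T / np.sqrt(dim)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1


def window_to_image(window, period, dim=4):
    """Assumed layout: one attention map per component, stacked as 3 channels."""
    comps = classical_decompose(window, period)
    return np.stack([temporal_self_attention(c, dim) for c in comps])


rng = np.random.default_rng(0)
t = np.arange(96)
window = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(96)
img = window_to_image(window, period=24)
print(img.shape)  # (3, 96, 96)
```

Stacking the trend, seasonal, and residual maps as three channels mirrors the RGB layout that ImageNet-pretrained CNNs expect. That makes it one plausible way to feed such matrices to a frozen backbone, though the paper's exact channel arrangement may differ.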
