A Spatio-temporal CP decomposition analysis of New England region in the US
Fatoumata Sanogo
TL;DR
This work tackles spatio-temporal climate data analysis by leveraging a spatio-temporal PCA (STPCA) to initialize CANDECOMP/PARAFAC (CP) tensor decomposition, enabling more identifiable and accurate latent factors. CP decomposition is estimated via Alternating Least Squares (ALS), initialized with STPCA components, and its performance is contrasted with HOSVD and random initializations. A subsequent K-means clustering step on CP factors assesses the coherence of the extracted modes, with silhouette scores showing STPCA-based initialization yields superior cluster separation. Applied to NCAR precipitation and temperature data over New England, the approach improves reconstruction accuracy and reveals more distinct spatio-temporal regimes, offering a practical framework for downscaling and forecasting in regional climate analysis.
Abstract
Spatio-temporal data consist of measurements of one or more raster fields, such as weather, traffic volume, crime rates, or disease incidence. Advances in modern technology have increased the amount of information available for this type of data, hence the rise of multidimensional data. In this paper, we take advantage of both the multidimensional structure of the data and its temporal and spatial structure. We use data from the NCAR Climate Data Gateway website, which provides data discovery and access services for global and regional climate model data. The daily values of total precipitation (prec), maximum temperature (tmax), and minimum temperature (tmin) are combined into a multidimensional array called a tensor. We propose a spatio-temporal principal component analysis to initialize the components of the CP decomposition, so that the initialization step takes full advantage of the spatial and temporal structure of the data. The performance of our method is tested by comparison with the most popular initialization methods. We also run a clustering analysis to further demonstrate the performance of our approach.
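To make the role of the initialization step concrete, here is a minimal NumPy sketch of CP decomposition fitted by Alternating Least Squares from user-supplied starting factors. It is an illustration under stated assumptions, not the implementation used in this work: the tensor is synthetic (a hypothetical 8 locations x 12 time steps x 3 variables layout standing in for prec, tmax, and tmin), the rank of 2 is arbitrary, and the helper names `unfold`, `khatri_rao`, `reconstruct`, `hosvd_init`, and `cp_als` are invented for this example. The starting point shown is an HOSVD-style initialization, one of the baselines mentioned above. The proposed STPCA initialization is not reproduced here; it would enter through the same `init` argument.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product, ordered to match `unfold` above."""
    rank = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, rank)
    return out

def reconstruct(factors):
    """Rebuild a 3-way tensor from its CP factor matrices."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def hosvd_init(X, rank):
    """HOSVD-style start: leading left singular vectors of each unfolding."""
    return [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :rank]
            for n in range(X.ndim)]

def cp_als(X, init, n_iter=50):
    """CP via ALS: update one factor at a time, holding the others fixed."""
    factors = [F.copy() for F in init]
    rank = factors[0].shape[1]
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [factors[m] for m in range(X.ndim) if m != n]
            # Gram matrix of the Khatri-Rao product is the Hadamard
            # product of the individual Gram matrices.
            gram = np.ones((rank, rank))
            for F in others:
                gram *= F.T @ F
            factors[n] = unfold(X, n) @ khatri_rao(others) @ np.linalg.pinv(gram)
    return factors

# Synthetic stand-in for a (location x time x variable) climate tensor.
rng = np.random.default_rng(0)
true_factors = [rng.standard_normal((dim, 2)) for dim in (8, 12, 3)]
X = reconstruct(true_factors) + 0.01 * rng.standard_normal((8, 12, 3))

factors = cp_als(X, hosvd_init(X, rank=2), n_iter=50)
rel_err = np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Passing the starting factors in explicitly keeps the ALS loop identical across initializations, so any difference in reconstruction error or in downstream clustering quality can be attributed to the starting point alone. A random baseline would simply replace `hosvd_init(X, rank=2)` with a list of random matrices of the same shapes.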
