Tensor-based compression of the sea temperature data
Ilya Kosolapov, Tatiana Sheloput, Sergey Matveev
TL;DR
The study addresses the compression of a large spatiotemporal sea temperature tensor with extensive land-induced gaps, comparing a partition-then-SVD pipeline against a tensor completion approach. It demonstrates that a greedy spatial partitioning into ocean blocks, followed by Tucker, TT, or QTT decompositions, achieves consistent compression while keeping the maximum absolute temperature error under $0.5^\circ$C, with January data generally more compressible than May data. A key finding is a strong temporal dependency in compressibility: a temporal partition of approximately two days is nearly optimal across formats. Overall, the proposed approach delivers around 4–5× compression on the full dataset and offers a practical, robust option for storing and processing large environmental tensors without the drawbacks of completion-based methods.
Abstract
In this work we investigate efficient data compression for spatiotemporal temperature tensors of the Black, Azov and Marmara Seas, which contain a significant number of missing values. These tensors have a complex structure shaped by coastlines and bathymetry, as well as by temporal temperature changes. While such missing data typically motivates the use of tensor completion algorithms, we demonstrate that standard SVD-based compression approaches (including the Tucker, Tensor-Train (TT) and Quantized-TT (QTT) formats) are remarkably effective and yield comparable results. To enhance their performance, we propose a greedy spatial partitioning algorithm that divides the data into smaller subtensors before compression. Furthermore, our analysis reveals a strong temporal dependency in the data's compressibility that stems from the nature of the data. At a fixed level of precision, we observe significant seasonal variation. Investigating this, we find that temporal partitioning on a scale of approximately two days is nearly optimal for all tested tensor-based formats. The combined application of these spatial and temporal strategies with tensor methods ultimately achieves a robust fivefold compression across the entire dataset.
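To make the accuracy criterion concrete, the following minimal sketch shows how a maximum absolute error bound can drive rank selection in an SVD-based compressor. It is an illustration only, not the authors' pipeline: it applies a plain matrix SVD to a single gap-free (space × time) block, the data are synthetic, and the function name `smallest_rank_svd` and all parameters are hypothetical.

```python
import numpy as np

def smallest_rank_svd(block, max_abs_err=0.5):
    """Return (rank, A, B) with block ~ A @ B and max|block - A @ B| <= max_abs_err.

    Hypothetical helper: tries ranks 1, 2, ... and stops at the first one
    whose truncated SVD meets the elementwise error bound.
    """
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    for r in range(1, len(s) + 1):
        a = u[:, :r] * s[:r]      # singular values absorbed into the left factor
        b = vt[:r]
        if np.max(np.abs(block - a @ b)) <= max_abs_err:
            return r, a, b
    return len(s), u * s, vt      # full rank reproduces the block up to rounding

# Synthetic stand-in for one ocean block: 200 spatial points x 48 time steps.
rng = np.random.default_rng(0)
space = np.linspace(0.0, 1.0, 200)
time = np.linspace(0.0, 1.0, 48)
block = (
    8.0
    + 3.0 * np.outer(np.sin(2 * np.pi * space), np.cos(2 * np.pi * time))
    + 0.05 * rng.standard_normal((200, 48))
)

rank, a, b = smallest_rank_svd(block, max_abs_err=0.5)
ratio = block.size / (a.size + b.size)   # stored entries before / after
print(f"rank={rank}, compression ratio={ratio:.1f}")
```

The error bound is checked elementwise rather than in the Frobenius norm, because a guarantee of the form "no temperature value is off by more than $0.5^\circ$C" is a statement about the largest single deviation. Tucker, TT and QTT formats generalize the same idea to higher-order tensors through sequences of truncated SVDs.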
