Tensor-based compression of the sea temperature data

Ilya Kosolapov, Tatiana Sheloput, Sergey Matveev

TL;DR

The study addresses the compression of a large spatiotemporal sea temperature tensor with extensive land-induced gaps, comparing a partition-then-SVD pipeline with a tensor completion approach. It demonstrates that greedy spatial partitioning into ocean blocks, followed by Tucker, TT, or QTT decomposition, achieves consistent compression while keeping the maximum absolute temperature error under $0.5^\circ$C, with January data generally more compressible than May data. A key finding is a strong temporal dependency in compressibility: a temporal partition of approximately two days is close to optimal for all tested formats. Overall, the proposed approach delivers around 4–5× compression on the full dataset and offers a practical, robust option for storing and processing large environmental tensors without the drawbacks of completion-based methods.
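
The error criterion above can be illustrated with a minimal sketch. It is not the authors' pipeline: a plain truncated matrix SVD of a time-by-space unfolding stands in for the Tucker, TT and QTT formats, the block is synthetic, and the names `compress_block` and `compression_ratio` are hypothetical. The sketch picks the smallest rank whose reconstruction keeps the maximum absolute (Chebyshev) error within $0.5^\circ$C and reports the resulting storage ratio.

```python
import numpy as np


def compress_block(block, max_abs_err=0.5):
    """Smallest-rank truncated SVD whose max absolute error is <= max_abs_err.

    `block` is a gap-free 2D array (time x flattened space); this unfolding
    is an assumption made for illustration.
    """
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    for rank in range(1, len(s) + 1):
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
        if np.max(np.abs(block - approx)) <= max_abs_err:
            break  # the full-rank SVD is exact, so the loop always ends here
    return u[:, :rank] * s[:rank], vt[:rank]


def compression_ratio(block, left, right):
    """Number of original entries divided by the number of stored factor entries."""
    return block.size / (left.size + right.size)


# Synthetic stand-in for one ocean block: smooth low-rank field plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 48)[:, None]    # 48 time steps (assumed)
x = np.linspace(0.0, 1.0, 400)[None, :]   # 400 flattened spatial cells (assumed)
block = (8.0 + 3.0 * np.sin(2 * np.pi * t) * np.cos(np.pi * x)
         + 0.5 * t * x + 0.05 * rng.standard_normal((48, 400)))

left, right = compress_block(block)
err = np.max(np.abs(block - left @ right))
cr = compression_ratio(block, left, right)
print(f"rank={left.shape[1]}, max abs error={err:.3f}, CR={cr:.1f}")
```

Selecting the rank by the Chebyshev norm rather than the Frobenius norm mirrors the pointwise temperature bound quoted above; real blocks would be unfolded or reshaped according to the chosen tensor format.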

Abstract

In this work we investigate efficient data compression for spatiotemporal temperature tensors of the Black, Azov and Marmara Seas that contain a significant number of missing values. These tensors have a complex structure shaped by coastlines and bathymetry, as well as by temporal temperature changes. While such missing data typically motivates the use of tensor completion algorithms, we demonstrate that standard SVD-based compression approaches (including the Tucker, Tensor-Train (TT) and Quantized-TT (QTT) formats) are remarkably effective and yield comparable results. To enhance their performance, we propose a greedy spatial partitioning algorithm that divides the data into smaller subtensors before compression. Furthermore, our analysis reveals a strong temporal dependency in the data's compressibility: at a fixed level of precision, we observe significant seasonal variation. Investigating this, we find that temporal partitioning on a scale of approximately two days is nearly optimal for all tested tensor-based formats. The combined application of these spatial and temporal strategies with tensor methods ultimately achieves a robust compression ratio of 5 across the entire dataset.
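
The abstract does not spell out the greedy partitioning rule, so the following is only one plausible reading, stated as an assumption: repeatedly cut the largest rectangle that contains no land cells out of a 2D ocean mask until every ocean cell belongs to some block. The function names `largest_rectangle` and `greedy_partition`, the `min_area` parameter and the toy mask are all hypothetical.

```python
import numpy as np


def largest_rectangle(mask):
    """Return (area, top, left, height, width) of the largest all-True rectangle."""
    n_rows, n_cols = mask.shape
    heights = np.zeros(n_cols, dtype=int)  # run of True cells ending at the current row
    best = (0, 0, 0, 0, 0)
    for r in range(n_rows):
        heights = np.where(mask[r], heights + 1, 0)
        stack = []  # column indices with strictly increasing heights
        for c in range(n_cols + 1):
            h = heights[c] if c < n_cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                top_h = int(heights[stack.pop()])
                left = stack[-1] + 1 if stack else 0
                area = top_h * (c - left)
                if area > best[0]:
                    best = (area, r - top_h + 1, left, top_h, c - left)
            stack.append(c)
    return best


def greedy_partition(ocean_mask, min_area=1):
    """Greedily split the ocean cells into disjoint land-free rectangular blocks."""
    remaining = ocean_mask.copy()
    blocks = []
    while True:
        area, top, left, height, width = largest_rectangle(remaining)
        if area == 0 or area < min_area:
            break
        blocks.append((top, left, height, width))
        remaining[top:top + height, left:left + width] = False  # mark as used
    return blocks


# Toy mask: True = ocean, False = land (two rectangular "coast" regions).
ocean_mask = np.ones((6, 8), dtype=bool)
ocean_mask[0:2, 0:3] = False
ocean_mask[4:6, 6:8] = False

blocks = greedy_partition(ocean_mask)
covered = np.zeros(ocean_mask.shape, dtype=int)
for top, left, height, width in blocks:
    covered[top:top + height, left:left + width] += 1
print(len(blocks), "blocks")
```

Because each block is gap-free, it can be passed directly to an SVD-based format without any completion step. Raising `min_area` in this sketch would leave small coastal fragments outside the blocks, and they would then need separate storage.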

Paper Structure

This paper contains 12 sections, 25 equations, 8 figures, 12 tables, 5 algorithms.

Figures (8)

  • Figure 1: Schematic distribution of $\sigma$-levels by ocean depth.
  • Figure 2: Results of greedy partitioning of a sea temperature data layer into blocks. In total we have 23 blocks.
  • Figure 3: The largest block $\mathcal{X}_{I_{\max}}$.
  • Figure 4: Convergence of the SVP algorithm on the random mask of the measured elements, assessed by the Frobenius norm (left) and Chebyshev norm (right) for $\mathcal{X}_{I_{\max}}$ of January data. The dependence on the number of iterations is shown for different compression ratios $CR = 2, 4, \dots, 10$.
  • Figure 5: Convergence of the SVP algorithm on the random mask of the measured elements, assessed by the Frobenius norm (left) and Chebyshev norm (right) for $\mathcal{X}_{I_{\max}}$ of May data. The dependence on the number of iterations is shown for different compression ratios $CR = 2, 4, \dots, 10$.
  • ...and 3 more figures