
A Sparse Tensor Generator with Efficient Feature Extraction

Tugba Torun, Ameer Taweel, Didem Unat

TL;DR

The paper tackles the scarcity of large-scale sparse tensor datasets and the need for informative, multi-mode tensor features to guide storage-format and algorithm selection. It introduces FeaTensor, a parallel feature extractor that offers several extraction methods and covers both global and mode-dependent statistics, and GenTensor, a feature-preserving tensor generator that uses size-independent metrics to emulate the sparsity patterns of real tensors. Experiments show that FeaTensor's methods yield identical feature sets with different runtime trade-offs, while GenTensor produces realistic tensors whose sparsity structure and canonical polyadic decomposition (CPD) performance closely resemble those of real data, including for higher-order tensors. Both tools are open source, enabling scalable benchmarking and principled evaluation of tensor algorithms and storage schemes.

Abstract

Sparse tensor operations are increasingly important in diverse applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle in sparse tensor research is the lack of large-scale sparse tensor datasets. Another challenge lies in analyzing sparse tensor features, which are essential not only for understanding the nonzero pattern but also for selecting the most suitable storage format, decomposition algorithm, and reordering method. Moreover, because real-world tensors are large, even extracting these features can be computationally expensive without careful optimization. To address these limitations, we have developed a smart sparse tensor generator that replicates key characteristics of real sparse tensors. Additionally, we propose efficient methods for extracting a comprehensive set of sparse tensor features. The effectiveness of our generator is validated through the quality of the extracted features and the performance of decomposition on the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source, with all artifacts available at https://github.com/sparcityeu/FeaTensor and https://github.com/sparcityeu/GenTensor, respectively.

Paper Structure (27 sections, 3 figures, 7 tables, 3 algorithms)

Figures (3)

  • Figure 1: Sample slice and fibers of a 3-mode tensor.
  • Figure 2: The workflow of feature extraction for a 3-mode tensor using the mode-order approach.
  • Figure 3: Runtime comparison for different feature extraction methods.
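
Figure 1 refers to slices and fibers, the units over which many nonzero-pattern features are computed. The Python below is a minimal sketch, not FeaTensor's implementation. It groups the nonzeros of a tensor stored in coordinate (COO) form by slice or by fiber, then summarizes the per-group counts with size-independent statistics such as the coefficient of variation. The function names `slice_nnz`, `fiber_nnz`, and `summarize` are hypothetical. The sketch assumes 0-based integer coordinates and takes a mode-n slice to mean the set of nonzeros sharing the same mode-n index.

```python
from collections import Counter
from statistics import mean, pstdev


def slice_nnz(coords, mode):
    """Count nonzeros in each nonempty slice obtained by fixing the index of `mode`.

    `coords` is a list of index tuples (COO format); works for any tensor order.
    """
    return Counter(idx[mode] for idx in coords)


def fiber_nnz(coords, mode):
    """Count nonzeros in each nonempty mode-`mode` fiber.

    A mode-n fiber fixes every index except index n, so nonzeros are
    grouped by the tuple of all remaining indices.
    """
    return Counter(
        tuple(c for k, c in enumerate(idx) if k != mode) for idx in coords
    )


def summarize(counts):
    """Size-independent summary of per-slice or per-fiber nonzero counts.

    Returns the number of nonempty groups, the mean and population standard
    deviation of their nonzero counts, and the coefficient of variation (std/mean).
    """
    values = list(counts.values())
    if not values:  # empty tensor: no nonempty slices or fibers
        return {"count": 0, "mean": 0.0, "std": 0.0, "cv": 0.0}
    mu = mean(values)
    sigma = pstdev(values)
    return {"count": len(values), "mean": mu, "std": sigma, "cv": sigma / mu}


if __name__ == "__main__":
    # A toy 3x3x3 tensor with five nonzeros (values omitted; only the pattern matters).
    coords = [(0, 0, 0), (0, 0, 2), (0, 1, 1), (1, 0, 0), (2, 2, 2)]
    print(slice_nnz(coords, 0))           # nonzeros per mode-0 slice
    print(fiber_nnz(coords, 2))           # nonzeros per mode-2 fiber
    print(summarize(fiber_nnz(coords, 2)))
```

Hash-based grouping keeps the sketch short and works for tensors of any order. The paper instead compares several extraction methods with different runtime trade-offs (Figure 3), so a production extractor would likely sort or reorder the nonzeros by mode rather than hash every index tuple.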