Revisiting the Sparse Matrix Compression Problem

Vincent Jugé; Dominik Köppl; Vincent Limouzy; Andrea Marino; Jannik Olblich; Giulia Punzi; Takeaki Uno

TL;DR

A DP-algorithm is devised that solves the problem for double-logarithmic matrix widths, or for logarithmic widths under further restrictions. The paper also introduces a variant of the problem that minimizes the length of the resulting integer array by trimming the non-zero borders.

Abstract

The sparse matrix compression problem asks for a one-dimensional representation of a binary $n \times \ell$ matrix, formed by an integer array of row indices and a shift function for each row, such that a matrix entry can be accessed in constant time by consulting this representation. The decision problem of finding an integer array of length $\ell+\rho$, or of restricting the shift function to values of at most $\rho$, has been shown to be NP-complete (cf. the textbook of Garey and Johnson). As a practical heuristic, a greedy algorithm has been proposed that shifts the $i$-th row until it forms a solution with its predecessor rows. Although this greedy algorithm is valued for its good approximation in practice, we show that it actually exhibits an approximation ratio of $\Theta(\sqrt{\ell+\rho})$. We give further hardness results for parameterizations such as the number of distinct rows or the maximum number of non-zero entries per row. Finally, we devise a DP-algorithm that solves the problem for double-logarithmic matrix widths, or for logarithmic widths under further restrictions. We also study all these findings from a new perspective by introducing a variant of the problem in which we wish to minimize the length of the resulting integer array by trimming the non-zero borders; this variant has not been studied in the literature before but has practical motivations.
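To make the representation in the abstract concrete, here is a minimal Python sketch of a first-fit greedy placement together with the constant-time lookup. It is an illustration under assumptions, not the paper's exact procedure: rows are processed in input order, shifts are non-negative integers, and the names `greedy_compress` and `lookup` are hypothetical.

```python
from typing import List, Optional, Tuple


def greedy_compress(matrix: List[List[int]]) -> Tuple[List[Optional[int]], List[int]]:
    """First-fit sketch: place each row at the smallest non-colliding shift.

    Assumption: rows are handled in input order and shifts start at 0.
    Returns the integer array of row indices and the shift of every row.
    """
    width = len(matrix[0]) if matrix else 0
    # array[p] holds the index of the row that owns position p, or None if free.
    array: List[Optional[int]] = [None] * width
    shifts: List[int] = []
    for i, row in enumerate(matrix):
        ones = [j for j, bit in enumerate(row) if bit]
        shift = 0
        # Increase the shift until none of this row's ones hits an occupied cell.
        # Positions past the current end of the array count as free.
        while any(shift + j < len(array) and array[shift + j] is not None for j in ones):
            shift += 1
        # Grow the array so that the whole shifted row fits inside it.
        if shift + width > len(array):
            array.extend([None] * (shift + width - len(array)))
        for j in ones:
            array[shift + j] = i
        shifts.append(shift)
    return array, shifts


def lookup(array: List[Optional[int]], shifts: List[int], i: int, j: int) -> int:
    """Return matrix entry (i, j) with one array access and one comparison."""
    return 1 if array[shifts[i] + j] == i else 0


if __name__ == "__main__":
    m = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
    array, shifts = greedy_compress(m)
    print(array, shifts)
```

In this sketch the array always has length $\ell$ plus the largest shift used, which matches the $\ell+\rho$ length bound mentioned in the abstract when $\rho$ is the maximum shift.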

Paper Structure (16 sections, 9 theorems, 6 equations, 1 figure, 3 tables, 2 algorithms)

Introduction
Preliminaries
Approximation Ratio of Greedy Strategies
Proof of Theorem 4: Lower Bound
Proof of Theorem 4: Upper Bound
Dynamic-Programming Algorithms
One Distinct Tile with Logarithmic Length
General DP-algorithm
Parameterized Hardness
One distinct tile (parameter $\mathbf{\kappa = 1}$)
Constant number of numerals per tile (parameter $\mathbf{\zeta}$)
Experiments
Problems' Variants and Relationships
Improved Algorithm for One Distinct Tile for Shorter Lengths
Missing Figures and Examples
...and 1 more section

Key Result

Theorem 4

Both greedy strategies have an approximation ratio of $\mathcal{O}(\sqrt{m})$ for minLength (resp. minMaxShift) if the optimal solution has length $m$ (resp. maximum shift $m$). This ratio is tight in the sense that there is an instance for which both strategies exhibit a r

Figures (1)

Figure 1: Placement lengths for different heuristics studied in the Experiments section. Left: Input of the form $X,Y,X,Y,\ldots$ with two tile types. Right: Input of the form $X,Y,Z,X,Y,Z,\ldots$ with three tile types, and additional helper tiles to fill all gaps in the shortest solution. All tiles $X$, $Y$ and $Z$ have the same number of ones.

Theorems & Definitions (17)

Example 1
Example 3
Theorem 4
Lemma 5
Example 6
Theorem 7
Theorem 8
Theorem 9
Theorem 10
Theorem 11
...and 7 more
