Sample and Expand: Discovering Low-rank Submatrices With Quality Guarantees

Martino Ciaperoni, Aristides Gionis, Heikki Mannila

TL;DR

This paper addresses discovering submatrices that are provably close to a low-rank representation when the entire matrix is not globally low-rank. It introduces Sample-And-Expand, a two-phase method that first seeds a $2 \times 2$ near-rank-$1$ submatrix and then expands it to a larger near-low-rank submatrix while controlling the approximation error, with generalization to near-rank-$k$ patterns. The authors formalize LNROSR, LNROS, and LNR$k$S, prove NP-hardness for the latter two, and derive approximation guarantees linking the expansion process to row/column anchor ratios, along with probabilistic and scalability analyses. They validate the approach against strong baselines on synthetic and real data, showing favorable performance in recovering interpretable, local low-rank structures. Overall, the method provides provable guarantees and practical scalability for identifying local low-rank patterns across diverse domains.

Abstract

The problem of approximating a matrix by a low-rank one has been extensively studied. This problem assumes, however, that the whole matrix has a low-rank structure. This assumption is often false for real-world matrices. We consider the problem of discovering submatrices from the given matrix with bounded deviations from their low-rank approximations. We introduce an effective two-phase method for this task: first, we use sampling to discover small nearly low-rank submatrices, and then they are expanded while preserving proximity to a low-rank approximation. An extensive experimental evaluation confirms that the method we introduce compares favorably to existing approaches.
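The two-phase idea can be illustrated with a minimal Python sketch for the rank-$1$ case. This is not the authors' algorithm: the function names `sample_seed` and `expand`, the normalized-determinant test for a $2 \times 2$ seed, and the ratio-spread test used during expansion are all illustrative assumptions, and the sketch assumes a matrix with no zero entries so that ratios are well defined.

```python
import numpy as np

def sample_seed(D, eps, rng, max_tries=10_000):
    """Phase 1 (assumed criterion): sample 2x2 submatrices until one is
    nearly rank 1, i.e. its determinant is tiny relative to its scale."""
    n, m = D.shape
    for _ in range(max_tries):
        rows = rng.choice(n, size=2, replace=False)
        cols = rng.choice(m, size=2, replace=False)
        block = D[np.ix_(rows, cols)]
        scale = np.abs(block).max() ** 2
        if scale > 0 and abs(np.linalg.det(block)) <= eps * scale:
            return rows, cols
    return None  # no near-rank-1 seed found within the budget

def expand(D, rows, cols, delta):
    """Phase 2 (assumed criterion): keep rows, then columns, whose ratios
    to an anchor row / anchor column are constant up to a spread of delta."""
    anchor_row, anchor_col = rows[0], cols[0]
    # A row joins if its ratio to the anchor row is almost the same
    # on every seed column.
    row_ratios = D[:, cols] / D[anchor_row, cols]
    new_rows = np.flatnonzero(np.ptp(row_ratios, axis=1) <= delta)
    # A column joins if its ratio to the anchor column is almost the same
    # on every selected row.
    sub = D[new_rows]
    col_ratios = sub / sub[:, [anchor_col]]
    new_cols = np.flatnonzero(np.ptp(col_ratios, axis=0) <= delta)
    return new_rows, new_cols

# Toy usage: a 10 x 8 exactly rank-1 block planted in a 30 x 30 random matrix.
rng = np.random.default_rng(0)
D = rng.uniform(1.0, 2.0, size=(30, 30))
D[:10, :8] = np.outer(rng.uniform(1.0, 2.0, 10), rng.uniform(1.0, 2.0, 8))
seed = sample_seed(D, eps=1e-9, rng=rng)
if seed is not None:
    rows, cols = expand(D, seed[0], seed[1], delta=1e-6)
    print(rows, cols)
```

In this toy setting the tolerances `eps` and `delta` are set very tight because the planted block is exactly rank $1$; on noisy data they would have to be loosened, and the choice of anchors would matter more than this sketch suggests.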

Paper Structure

This paper contains 4 sections, 2 figures.

Figures (2)

  • Figure 1: Example. A subset of data points (in orange) in the $3$-dimensional space are close to their projection (in red) onto a line in the $xy$-plane (a) or to a plane in the $3$-dimensional space (b), while other points (in blue) can be further away.
  • Figure 2: Hyperspectral dataset. On the left, we show the values of the rows in a $50 \times 50$ matrix and the nearly-proportional values of the rows in a near-rank-$1$ $11 \times 44$ sub-matrix discovered by our method. On the right we show the matrix and, next to it, the discovered sub-matrix (top) and its accurate approximation expressing each row as collinear with the row highlighted in red (bottom).
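
The approximation described in the Figure 2 caption, where every row is expressed as a multiple of one highlighted row, can be sketched as follows. The function name `anchor_row_approximation` and the least-squares choice of coefficients are assumptions made for illustration; the paper may select the anchor and the coefficients differently.

```python
import numpy as np

def anchor_row_approximation(X, anchor):
    """Approximate each row of X by a scalar multiple of the anchor row.

    The coefficient for row i is the least-squares projection
    <x_i, a> / <a, a>, so the result has rank at most 1.
    Assumes the anchor row is not all zeros.
    """
    a = X[anchor]
    coeffs = X @ a / (a @ a)
    return np.outer(coeffs, a)

# Toy usage: a rank-1 matrix plus small noise is reproduced almost exactly.
rng = np.random.default_rng(1)
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]) + 1e-3 * rng.standard_normal((3, 3))
X_hat = anchor_row_approximation(X, anchor=0)
print(np.abs(X - X_hat).max())
```

The maximum entry-wise deviation printed at the end is one simple way to quantify how close a discovered sub-matrix is to its rank-$1$ approximation.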