Inhomogeneous Submatrix Detection

Mor Oren-Loberman, Dvir Jerbi, Tamir Bendory, Wasim Huleihel

TL;DR

The statistical limits of detection are analyzed by proving information-theoretic lower bounds and by designing algorithms that match these bounds up to logarithmic factors, for a wide family of templates.

Abstract

In this paper, we study the problem of detecting multiple hidden submatrices in a large Gaussian random matrix when the planted signal is inhomogeneous across entries. Under the null hypothesis, the observed matrix has independent and identically distributed standard normal entries. Under the alternative, there exist several planted submatrices whose entries deviate from the background in one of two ways: in the mean-shift model, planted entries (templates) have nonzero and possibly varying means; in the variance-shift model, planted entries have inflated and possibly varying variances. We consider two placement regimes for the planted submatrices. In the first, the row and column index sets are arbitrary. Motivated by scientific applications, in the second regime the row and column indices are restricted to be consecutive. For both alternatives and both placement regimes, we analyze the statistical limits of detection by proving information-theoretic lower bounds and by designing algorithms that match these bounds up to logarithmic factors, for a wide family of templates.
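
To make the two hypotheses concrete, the following is a minimal simulation sketch of the models described in the abstract, not the authors' code. The function names, the `blocks` representation (a list of row/column index pairs), and the assumption that planted blocks are disjoint are all illustrative choices that do not come from the paper.

```python
import numpy as np

def sample_null(n, rng):
    """Null hypothesis: an n x n matrix of i.i.d. standard normal entries."""
    return rng.standard_normal((n, n))

def sample_mean_shift(n, blocks, template, rng):
    """Mean-shift alternative (illustrative): add a k x k template of
    possibly varying means on each planted block S x T.
    Assumes the blocks are disjoint and each has the template's shape."""
    X = sample_null(n, rng)
    for rows, cols in blocks:
        X[np.ix_(rows, cols)] += template
    return X

def sample_variance_shift(n, blocks, var_template, rng):
    """Variance-shift alternative (illustrative): planted entries are
    N(0, var_template[u, v]) with var_template >= 1 (inflated variance)."""
    X = sample_null(n, rng)
    for rows, cols in blocks:
        X[np.ix_(rows, cols)] *= np.sqrt(var_template)
    return X

rng = np.random.default_rng(0)
n, k = 16, 4
template = np.linspace(0.5, 2.0, k * k).reshape(k, k)  # inhomogeneous means

# Arbitrary placement: any row and column subsets of size k.
arbitrary = [(np.array([1, 5, 9, 13]), np.array([0, 2, 4, 6]))]
# Consecutive placement: row and column sets are intervals of length k.
consecutive = [(np.arange(4, 8), np.arange(10, 14))]

X_arbitrary = sample_mean_shift(n, arbitrary, template, rng)
X_consecutive = sample_mean_shift(n, consecutive, template, rng)
X_variance = sample_variance_shift(n, consecutive, 1.0 + template, rng)
```

The only difference between the two placement regimes in this sketch is the index sets passed in `blocks`; the template itself is the same in both cases.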

Paper Structure (39 sections, 10 theorems, 142 equations, 2 figures, 1 table)

Key Result

Theorem 1

Consider the finite-template mean-shift model introduced in Section sec:prob_form. Let $M_{\max}$ be defined as in eq:M_max. The following statements hold.

Figures (2)

  • Figure 1: Schematic illustration of the placement families (shown for $n=16$, $k=4$). In the non-consecutive model, arbitrary row and column subsets are selected, yielding blocks of the form $\mathsf{S}\times\mathsf{T}$. In the consecutive model, the row and column sets are intervals of length $k$.
  • Figure 2: Illustration of the coordinate map $\varphi_{\mathsf{B}}$ in eq:induced_coord_map. For a block $\mathsf{B}=\mathsf{S}\times\mathsf{T}$, the map $\varphi_{\mathsf{B}}(i,j)=(u,v)$ records the relative row and column indices of $(i,j)$ within $\mathsf{B}$, thereby aligning entries of $\mathsf{B}$ with template coordinates. The figure shows both a consecutive block $\mathsf{B}_1$ and a non-consecutive block $\mathsf{B}_2$ mapped to their respective templates.
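
The coordinate map in the Figure 2 caption can be sketched in a few lines. This is an illustrative reading of the caption only: the helper name `coord_map` is hypothetical, and the sketch assumes 0-based indices and that relative position means rank within the sorted index set, which may differ from the paper's exact convention.

```python
def coord_map(S, T):
    """Return phi_B for the block B = S x T (illustrative sketch).

    phi_B(i, j) = (u, v), where u is the rank of row i within S and
    v is the rank of column j within T (0-based, sorted order assumed).
    """
    row_rank = {i: u for u, i in enumerate(sorted(S))}
    col_rank = {j: v for v, j in enumerate(sorted(T))}

    def phi(i, j):
        # Raises KeyError if (i, j) lies outside the block.
        return row_rank[i], col_rank[j]

    return phi

# Consecutive block: rows 4..7, columns 10..13.
phi_1 = coord_map(range(4, 8), range(10, 14))
# Non-consecutive block: arbitrary row and column subsets.
phi_2 = coord_map([3, 7, 10, 12], [1, 2, 8, 15])

u1, v1 = phi_1(5, 12)  # second row, third column of the template
u2, v2 = phi_2(7, 8)   # same template coordinates, different block
```

In both cases the matrix entry is aligned with the same template coordinate, which is what lets a single template describe consecutive and non-consecutive blocks alike.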

Theorems & Definitions (22)

  • Theorem 1: Mean-shift upper bounds
  • Theorem 2: Variance-shift upper bounds
  • Remark 1
  • Remark 2
  • Theorem 3: Information-theoretic lower bounds
  • Corollary 4: Impossibility for standard consecutive placements
  • Proof sketch of Corollary 4
  • Definition 1: Smooth-signal regime
  • Corollary 5: Smooth-signal upper bounds
  • Corollary 6: Smooth-signal lower bounds
  • ...and 12 more