Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
TL;DR
This work addresses the problem of verifying the Kruskal rank of matrices that arise in sparse linear regression, tensor decomposition, and latent-variable models. It introduces a unified framework that combines randomized hashing with dynamic programming to achieve high-probability correctness over binary fields, general finite fields, and integer matrices, with runtimes close to known lower bounds. There are three main contributions: (i) a collision-based hashing algorithm for binary fields with runtime $O(dk \cdot n^{\lceil k/2 \rceil})$; (ii) an extension to general finite fields $GF(q)$ with runtime $O(dk \cdot (n(q-1))^{\lceil k/2 \rceil})$; and (iii) an integer-matrix approach that uses Cramer’s rule and the Leibniz formula to obtain runtime $O(dk \cdot (nM)^{\lceil k/2 \rceil})$, together with a deterministic dimensionality-reduction method. Together, these methods provide a toolkit for certifying identifiability conditions in tensor decompositions and for diagnosing noise-transition matrices in deep learning, offering near-optimal performance guarantees that do not rely on empirical experiments. The framework thus supports both the theoretical study and the practical verification of linear-dependence structure across these settings.
Abstract
We present algorithmic techniques for efficiently verifying the Kruskal rank of matrices that arise in sparse linear regression, tensor decomposition, and latent-variable models. Our unified framework combines randomized hashing with dynamic programming and applies to binary fields, general finite fields, and integer matrices. In particular, in the integer-matrix setting our algorithm achieves a runtime of $\mathcal{O}\left(dk \cdot \left(nM\right)^{\lceil k / 2 \rceil}\right)$ while ensuring high-probability correctness. Our contributions are: (i) a unified framework for verifying Kruskal rank across different algebraic settings; (ii) rigorous runtime and high-probability guarantees that nearly match known lower bounds; and (iii) practical implications for identifiability in tensor decompositions and in deep learning, particularly for the estimation of noise-transition matrices.
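To make the collision idea behind the binary-field case concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard definition (a matrix has Kruskal rank at least $k$ when every set of $k$ columns is linearly independent) and uses a deterministic meet-in-the-middle search with an exact Python dictionary in place of randomized hashing. The function names and the encoding of each column as an integer bitmask over $GF(2)$ are illustrative assumptions.

```python
from itertools import combinations


def xor_of(columns, subset):
    """XOR (sum over GF(2)) of the columns indexed by `subset`."""
    acc = 0
    for j in subset:
        acc ^= columns[j]
    return acc


def kruskal_rank_at_least_gf2(columns, k):
    """Return True if every set of k columns is linearly independent over GF(2).

    `columns` is a list of ints; bit i of columns[j] is entry (i, j).
    Hypothetical helper for illustration only (deterministic variant).
    """
    n = len(columns)
    if k > n:
        return False  # convention here: Kruskal rank cannot exceed n
    lo, hi = k // 2, (k + 1) // 2

    # Table of XOR values of all subsets of size <= floor(k/2).
    # The empty subset has XOR 0, so a small subset summing to 0 is caught too.
    seen = {0: ()}
    for size in range(1, lo + 1):
        for subset in combinations(range(n), size):
            x = xor_of(columns, subset)
            if x in seen:
                # Two distinct subsets S, T with equal XOR: their symmetric
                # difference is a nonempty dependent set of size <= 2*lo <= k.
                return False
            seen[x] = subset

    # For odd k, also probe subsets of size ceil(k/2) against the table.
    if hi > lo:
        for subset in combinations(range(n), hi):
            if xor_of(columns, subset) in seen:
                return False  # dependent set of size <= hi + lo = k
    return True


cols = [0b01, 0b10, 0b11]  # third column is the sum of the first two
print(kruskal_rank_at_least_gf2(cols, 2))  # True
print(kruskal_rank_at_least_gf2(cols, 3))  # False
```

The sketch enumerates only subsets of size at most $\lceil k/2 \rceil$, which is where the $n^{\lceil k/2 \rceil}$ factor in the stated runtime comes from. Any dependent set of at most $k$ columns splits into two halves with equal XOR, so a table collision detects it.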
