A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials
Zhangsong Li
TL;DR
The paper analyzes multivariate shuffled linear regression under a Bayesian setup in which the row correspondence between predictors and responses is hidden by a latent permutation. It uses the low-degree polynomial framework to study the information-computation trade-off in the detection problem and identifies a phase transition across regimes defined by the dimensions $d$, $m$ and the noise level $\sigma$. Three regimes are established: (i) hardness for $m=o(d)$, even when $\sigma=0$, for all polynomials of degree $D$ with $D^4=o(d/m)$; (ii) hardness for $m=d$ and $\sigma=\omega(1)$ for all polynomials of degree $D=o(\sigma)$; and (iii) tractability for $m=d$ and $\sigma=o(1)$ via a constant-degree polynomial that strongly distinguishes the two models. The results provide evidence for an information-computation gap in this model, connect to lattice-based methods in the noiseless case, and suggest directions for stronger lower bounds (e.g., SOS or SQ) in broader parameter ranges. Overall, the work gives a rigorous, phase-transition view of the computational limits of detection in shuffled multivariate regression and clarifies how the problem dimensions and the noise level shape algorithmic feasibility.
Abstract
In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n \times d$ standard Gaussian design matrix, $Z$ is an $n \times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $Q_*$ is an unknown $d \times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb I_m$. We consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n \times d$ and $n \times m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.
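To make the two hypotheses concrete, the following is a minimal simulation sketch in Python with NumPy. It is an illustration only, not code from the paper. The function names `sample_planted` and `sample_null` are hypothetical, and two priors are assumed here because the abstract does not specify them: $\Pi_*$ is drawn uniformly at random, and $Q_*$ is obtained from the QR factorization of a Gaussian matrix, which yields orthonormal columns when $d \ge m$.

```python
import numpy as np

def sample_planted(n, d, m, sigma, rng):
    """Draw (X, Y) from Y = (Pi X Q + sigma Z) / sqrt(1 + sigma^2).

    Assumptions (not stated in the abstract): Pi is a uniformly random
    permutation, and Q comes from the QR factorization of a d x m
    Gaussian matrix, which requires d >= m.
    """
    X = rng.standard_normal((n, d))   # n x d standard Gaussian design
    Z = rng.standard_normal((n, m))   # n x m Gaussian noise
    perm = rng.permutation(n)         # row i of Pi X is row perm[i] of X
    G = rng.standard_normal((d, m))
    Q, _ = np.linalg.qr(G)            # reduced QR: Q is d x m, Q^T Q = I_m
    Y = (X[perm] @ Q + sigma * Z) / np.sqrt(1.0 + sigma**2)
    return X, Y, perm, Q

def sample_null(n, d, m, rng):
    """Draw independent standard Gaussian X (n x d) and Y (n x m)."""
    return rng.standard_normal((n, d)), rng.standard_normal((n, m))
```

In the noiseless square case ($m=d$, $\sigma=0$) the rows of $Y Q_*^{\top}$ are exactly the permuted rows of $X$, which gives a quick sanity check on the sampler: `np.allclose(Y @ Q.T, X[perm])` should hold. The normalization by $\sqrt{1+\sigma^2}$ keeps the entries of $Y$ at unit variance under both hypotheses, so a test cannot separate the models by the scale of $Y$ alone.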
