Sharp Information-Theoretic Thresholds for Shuffled Linear Regression

Leon Lufkin, Yihong Wu, Jiaming Xu

TL;DR

This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation, and determines the sharp threshold of almost exact recovery to be $\mathsf{SNR} = n^2$, where all but a vanishing fraction of the permutation is reconstructed.

Abstract

This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model $y = \Pi_* X \beta_* + w$, where $X$ is an $n \times d$ standard Gaussian design matrix, $w$ is Gaussian noise with entrywise variance $\sigma^2$, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $\beta_*$ is the regression coefficient, also unknown. Previous work has shown that, in the large $n$-limit, the minimal signal-to-noise ratio ($\mathsf{SNR}$), $\lVert \beta_* \rVert^2/\sigma^2$, for recovering the unknown permutation exactly with high probability is between $n^2$ and $n^C$ for some absolute constant $C$ and the sharp threshold is unknown even for $d=1$. We show that this threshold is precisely $\mathsf{SNR} = n^4$ for exact recovery throughout the sublinear regime $d=o(n)$. As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be $\mathsf{SNR} = n^2$, where all but a vanishing fraction of the permutation is reconstructed.
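
To make the model concrete, the following is a minimal simulation sketch of $y = \Pi_* X \beta_* + w$ in Python with NumPy. It is an illustration under stated assumptions, not the paper's method: the function names are hypothetical, fixing $\lVert \beta_* \rVert = 1$ is just one convenient way to set the SNR, and the sort-based matcher is an oracle baseline that is handed $\beta_*$, which the actual problem treats as unknown. The `overlap` function measures the fraction of the permutation that is reconstructed, so a value of 1 corresponds to exact recovery and a value tending to 1 corresponds to almost exact recovery.

```python
import numpy as np


def sample_shuffled_regression(n, d, snr, rng):
    """Draw (X, y, perm, beta) from y = Pi X beta + w with ||beta||^2 / sigma^2 = snr."""
    X = rng.standard_normal((n, d))      # n x d standard Gaussian design
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)         # assumption: normalize so ||beta|| = 1
    sigma = 1.0 / np.sqrt(snr)           # then ||beta||^2 / sigma^2 = snr
    perm = rng.permutation(n)            # latent permutation: (Pi X)[i] = X[perm[i]]
    w = sigma * rng.standard_normal(n)   # Gaussian noise with entrywise variance sigma^2
    y = X[perm] @ beta + w
    return X, y, perm, beta


def overlap(perm_hat, perm):
    """Fraction of positions where the estimated permutation matches the truth."""
    return float(np.mean(np.asarray(perm_hat) == np.asarray(perm)))


def oracle_sort_match(X, y, beta):
    """Oracle baseline (assumes beta is known): match sorted y to sorted X @ beta.

    For a known beta, pairing the k-th smallest response with the k-th smallest
    value of X @ beta minimizes ||y - Pi X beta||^2 over all permutations Pi.
    """
    signal = X @ beta
    perm_hat = np.empty(len(y), dtype=int)
    perm_hat[np.argsort(y)] = np.argsort(signal)
    return perm_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 5
    for exponent in (1, 2, 4, 6):
        X, y, perm, beta = sample_shuffled_regression(n, d, float(n) ** exponent, rng)
        frac = overlap(oracle_sort_match(X, y, beta), perm)
        print(f"SNR = n^{exponent}: fraction correctly matched = {frac:.3f}")
```

Running the script prints the matched fraction at a few SNR levels for a single random draw. Because the baseline knows $\beta_*$, its behavior should be read only as a rough illustration of the two recovery notions, not as evidence for the thresholds stated in the abstract.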

Paper Structure

This paper contains 16 sections, 9 theorems, and 87 equations.

Key Result

Theorem 1

Fix an arbitrary $\epsilon>0$. Assume that $d=o(n)$.

Theorems & Definitions (11)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • proof
  • Lemma 8
  • ...and 1 more