Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel

Jinhang Chai, Jianqing Fan

TL;DR

This work tackles structured matrix learning in which the signal is the sum of a low-rank part and a sparse part, observed under noise with arbitrary dependence across entries. It develops an incoherent-constrained least-squares estimator and proves deterministic and minimax-optimal guarantees by leveraging a novel separation lemma: the difference of two low-rank incoherent matrices must spread its energy across many entries, so it cannot be too sparse. The framework is instantiated to Markov transition kernel estimation, where it attains minimax rates, and is extended to the conditional mean operator in reinforcement learning (RL), multitask regression, and structured covariance estimation. A practical alternating-minimization algorithm is proposed and validated empirically; it converges in a few iterations and performs well on both simulated and real data. The results offer a broadly applicable blueprint for accurate recovery under heavy noise dependence, with implications for scalable RL and high-dimensional statistical learning.
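
The intuition behind the separation lemma can be seen in a simpler, single-matrix setting. The derivation below assumes the standard incoherence condition (orthonormal singular vectors whose rows have norms at most $\sqrt{\mu r/p_1}$ and $\sqrt{\mu r/p_2}$); it is an elementary bound, not the paper's lemma, which concerns the difference of two incoherent low-rank matrices. It shows that every entry of an incoherent rank-$r$ matrix $M \in \mathbb{R}^{p_1 \times p_2}$ is small relative to its largest singular value, so any set $\Omega$ of entries carries only a limited share of the energy.

```latex
% Assumed incoherence: M = U \Sigma V^\top, U^\top U = V^\top V = I_r,
%   \max_i \|e_i^\top U\|_2 \le \sqrt{\mu r / p_1}, \quad \max_j \|e_j^\top V\|_2 \le \sqrt{\mu r / p_2}.
\begin{aligned}
|M_{ij}| &= |e_i^\top U \Sigma V^\top e_j|
  \le \|e_i^\top U\|_2 \, \sigma_1(M) \, \|V^\top e_j\|_2
  \le \frac{\mu r}{\sqrt{p_1 p_2}} \, \sigma_1(M), \\
\sum_{(i,j) \in \Omega} M_{ij}^2
  &\le |\Omega| \, \frac{\mu^2 r^2}{p_1 p_2} \, \sigma_1(M)^2
  \le \frac{|\Omega| \, \mu^2 r^2}{p_1 p_2} \, \|M\|_F^2 .
\end{aligned}
```

Consequently, if all the energy of a nonzero $M$ sits on $\Omega$, then $|\Omega| \ge p_1 p_2/(\mu^2 r^2)$: an incoherent low-rank matrix cannot be very sparse.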

Abstract

The problem of structured matrix estimation has been studied mostly under strong assumptions on the dependence of the noise. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-squares estimator and prove that it is tight, both in the sense of a deterministic lower bound and in the sense of matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread its energy across its entries; in other words, it cannot be too sparse. This result sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase the applications of our framework to several important statistical machine learning problems. In the problem of estimating a structured Markov transition kernel, the proposed method is minimax optimal, and the result can be extended to estimating the conditional mean operator, a crucial component in reinforcement learning. Applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method, which typically converges in a few steps.
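
The alternating minimization idea can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: it assumes the simplest squared-loss objective $\|Y-L-S\|_F^2$ with a rank constraint on $L$ and an entry-count constraint on $S$, and it omits the incoherence constraint. Each step is an exact block minimization (truncated SVD for $L$, keeping the $s$ largest-magnitude entries for $S$). The names `rank_r_approx`, `keep_top_s`, and `alt_min` are hypothetical.

```python
import numpy as np


def rank_r_approx(A, r):
    """Best rank-r approximation of A in Frobenius norm (truncated SVD)."""
    U, sv, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * sv[:r]) @ Vt[:r, :]


def keep_top_s(A, s):
    """Best approximation of A with at most s nonzeros: keep the s largest |A_ij|."""
    S = np.zeros_like(A)
    if s > 0:
        idx = np.argpartition(np.abs(A), -s, axis=None)[-s:]  # flat indices
        S.flat[idx] = A.flat[idx]
    return S


def alt_min(Y, r, s, n_iter=100, tol=1e-12):
    """Alternately minimize ||Y - L - S||_F^2 over L (rank <= r) and S (<= s nonzeros).

    Incoherence of L is NOT enforced in this sketch.
    """
    Y = np.asarray(Y, dtype=float)
    S = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        L = rank_r_approx(Y - S, r)  # L-step with S held fixed
        S = keep_top_s(Y - L, s)     # S-step with L held fixed
        history.append(np.linalg.norm(Y - L - S) ** 2)
        # Stop once the objective no longer decreases meaningfully.
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return L, S, history
```

Because each update exactly minimizes over one block with the other held fixed, the recorded objective is non-increasing. Enforcing the incoherence constraint of the paper's estimator would require an additional step on $L$ that this sketch does not include.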

Paper Structure (44 sections, 26 theorems, 180 equations, 6 figures, 2 tables, 2 algorithms)

Key Result

Theorem 2.5

Let $\Delta_L=\widehat{L}-L^{\star}$ and $\Delta_S=\widehat{S}-S^{\star}$. Under the restricted strong convexity (RSC) assumption, the assumption on the RSC parameters, and the assumption on $L^{\star}$ and $S^{\star}$, we have

Figures (6)

  • Figure 1: Trajectories of the optimization error of Algorithm 1 on synthetic data under three noise scenarios. The left panel corresponds to the noiseless case ($W=0$). The middle panel corresponds to i.i.d. Gaussian noise with standard deviation $10^{-3}$ for each entry. The right panel corresponds to noise sampled from an empirical probability error distribution, with the standard error of each entry approximately $10^{-3}$. In each panel, the blue curve corresponds to Init1, the red curve to Init2, and the green curve to Random Init. Init1 and Random Init perform similarly and better than Init2. All three initialization methods converge to the same optimal value.
  • Figure 2: Alternating minimization method for estimating the Markov transition matrix. The upper-left panel shows how the Frobenius-norm error of the estimated frequency matrix decays with the sample size. As expected, the decay rate is $1/\sqrt{n}$, as justified by the main theorem, and the upper-right panel shows this clearly. The lower-left panel depicts how the $L_1$-norm error of the estimated transition matrix decays with the sample size. The curve is slightly wiggly, reflecting the dependence on $\pi_{\min}$ of the bound in the main theorem. The lower-right panel verifies that the estimated matrix indeed has low coherence, consistent with our theory.
  • Figure 3: Alternating minimization method for estimating the Markov transition matrix (continued). The left panel plots how the Frobenius-norm error of the estimated frequency matrix scales with the dimension $p$ for two methods. The right panel shows that the error scales like $\sqrt{p}$, as justified by the main theorem.
  • Figure 4: Comparison of the spectral estimator and the alternating minimization method on synthetic data. The left panel presents the Frobenius-norm error of the frequency matrix, while the right panel depicts the $L_1$-norm error of the transition matrix. The estimation error of the alternating minimization method decreases monotonically as the sample size increases, whereas that of the spectral method does not, owing to an irreducible bias.
  • Figure 5: Singular value plot of $P^\star$ for the real data
  • ...and 1 more figure
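
The frequency-matrix pipeline behind Figures 2 to 4 can be sketched as follows, under stated assumptions: we take the frequency matrix to be $F_{ij} = \pi_i P_{ij}$, estimate it by the fraction of observed transitions from state $i$ to state $j$, and recover the transition matrix by normalizing rows. The "spectral" step here (rank-$r$ truncated SVD of the empirical frequency matrix, then clipping negative entries and renormalizing) is a hypothetical stand-in and may differ from the spectral estimator compared in Figure 4. All function names are hypothetical.

```python
import numpy as np


def sample_chain(P, n, rng, x0=0):
    """Simulate n transitions of a Markov chain with transition matrix P, starting at x0."""
    n_states = P.shape[0]
    traj = np.empty(n + 1, dtype=int)
    traj[0] = x0
    for t in range(n):
        traj[t + 1] = rng.choice(n_states, p=P[traj[t]])
    return traj


def empirical_frequency_matrix(traj, n_states):
    """F_hat[i, j] = fraction of observed transitions that go from state i to state j."""
    traj = np.asarray(traj)
    F = np.zeros((n_states, n_states))
    np.add.at(F, (traj[:-1], traj[1:]), 1.0)  # counts repeated (i, j) pairs correctly
    return F / (len(traj) - 1)


def rows_to_transition(F):
    """Normalize rows to sum to 1; rows with no mass become uniform (an arbitrary choice)."""
    row = F.sum(axis=1, keepdims=True)
    safe = np.where(row > 0, row, 1.0)
    return np.where(row > 0, F / safe, 1.0 / F.shape[1])


def spectral_transition_estimate(F_hat, r):
    """Rank-r truncated SVD of F_hat, clip negatives to 0, then renormalize rows."""
    U, sv, Vt = np.linalg.svd(F_hat)
    F_r = (U[:, :r] * sv[:r]) @ Vt[:r, :]
    return rows_to_transition(np.clip(F_r, 0.0, None))
```

For any estimate $\widehat{F}$ with positive row sums $\widehat{\pi}_i = \sum_j \widehat{F}_{ij}$, writing $\widehat{P}_{i\cdot} = \widehat{F}_{i\cdot}/\widehat{\pi}_i$ and using $\pi_i = \sum_j F_{ij}$, the triangle inequality gives $\|\widehat{P}_{i\cdot} - P_{i\cdot}\|_1 \le 2\|\widehat{F}_{i\cdot} - F_{i\cdot}\|_1/\widehat{\pi}_i$. This elementary bound is one way to see why a small $\pi_{\min}$ can inflate the $L_1$ error of the transition matrix.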

Theorems & Definitions (69)

  • Remark 1
  • Remark 2
  • Definition 2.3
  • Theorem 2.5
  • Remark 3
  • Lemma 2.6
  • Theorem 2.7
  • Proof
  • Theorem 2.8
  • Remark 4
  • ...and 59 more