
Precise analysis of ridge interpolators under heavy correlations -- a Random Duality Theory view

Mihailo Stojnic

TL;DR

Closed-form results show how the risk explicitly depends on all key model parameters, including the problem dimensions and covariance matrices.

Abstract

We consider fully row/column-correlated linear regression models and study several classical estimators (including minimum norm interpolators (GLS), ordinary least squares (LS), and ridge regressors). We show that \emph{Random Duality Theory} (RDT) can be utilized to obtain precise closed-form characterizations of all estimator-related optimizing quantities of interest, including the \emph{prediction risk} (testing or generalization error). On a qualitative level, our results recover the risk's well-known non-monotonic (so-called double-descent) behavior as the ratio of the number of features to the sample size increases. On a quantitative level, our closed-form results show how the risk explicitly depends on all key model parameters, including the problem dimensions and covariance matrices. Moreover, a special case of our results, obtained when intra-sample (or time-series) correlations are not present, precisely matches the corresponding results obtained via spectral methods in [6,16,17,24].
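As a rough illustration of the objects the abstract refers to, the NumPy sketch below simulates a row/column-correlated linear model and evaluates the three estimators empirically. It assumes (for illustration, not necessarily the paper's exact setup) that $X = R^{1/2} Z C^{1/2}$ with i.i.d. standard Gaussian $Z$, sample (row) covariance $R$, feature (column) covariance $C$, and $y = X\beta + \sigma\varepsilon$ with i.i.d. noise $\varepsilon$. The minimum norm estimator is computed with the pseudoinverse, which interpolates the data when $p > n$ and coincides with LS when $p < n$ and $X$ has full column rank. Risk is measured as the excess prediction risk $(\hat\beta-\beta)^T C (\hat\beta-\beta)$ on a fresh test point $x \sim N(0, C)$. All function names are hypothetical, and this is a Monte Carlo check, not the paper's closed-form RDT characterization.

```python
import numpy as np


def sqrtm_psd(M):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    # Scale column j of V by sqrt(w[j]), then multiply by V^T.
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def sample_data(R, C, beta, sigma, rng):
    """Draw X = R^{1/2} Z C^{1/2} and y = X beta + sigma * noise (assumed model)."""
    n, p = R.shape[0], C.shape[0]
    Z = rng.standard_normal((n, p))
    X = sqrtm_psd(R) @ Z @ sqrtm_psd(C)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y


def ridge(X, y, lam):
    """Ridge regressor: solve (X^T X + lam I) b = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def min_norm(X, y):
    """Minimum-norm least-squares solution pinv(X) y.

    Interpolates y when p > n (full row rank); equals LS when p < n
    (full column rank).
    """
    return np.linalg.pinv(X) @ y


def excess_risk(beta_hat, beta, C):
    """(beta_hat - beta)^T C (beta_hat - beta) for a fresh test point x ~ N(0, C)."""
    d = beta_hat - beta
    return float(d @ C @ d)
```

Averaging `excess_risk` over repeated draws for a grid of $p$ at fixed $n$ gives an empirical risk curve that can be set against closed-form predictions. As $\lambda \to 0$ the ridge solution approaches the minimum norm one, which is one way to see the interpolator as the limiting case of ridge.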

Paper Structure

This paper contains 18 sections, 11 theorems, 191 equations, and 3 figures.

Key Result

Lemma 1

(Algebraic optimization representation) Let $V\in{\mathbb R}^{n\times n}$ and $\overline{U}\in{\mathbb R}^{m\times m}$ be two given unitary (orthogonal) matrices and let $\Sigma\in{\mathbb R}^{n\times n}$ and $\overline{\Sigma}\in{\mathbb R}^{m\times m}$ be two given diagonal positive definite matrices. Then …
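The sketch below illustrates only the hypotheses of Lemma 1. Assuming (for illustration) that the pairs $(V,\Sigma)$ and $(\overline{U},\overline{\Sigma})$ arise as eigendecompositions of symmetric positive definite covariance matrices, e.g. $A = V\Sigma V^T$, such factors can be obtained numerically with a symmetric eigensolver. The function name `spectral_factors` is hypothetical.

```python
import numpy as np


def spectral_factors(M):
    """Factor a symmetric positive definite M as V @ diag(s) @ V.T with V orthogonal."""
    s, V = np.linalg.eigh(M)  # eigenvalues ascending, orthonormal eigenvectors
    if np.any(s <= 0):
        raise ValueError("matrix is not positive definite")
    return V, np.diag(s)
```

Because `eigh` returns orthonormal eigenvectors for a symmetric input, `V` is orthogonal and `diag(s)` is diagonal positive definite, matching the form assumed for $V$ and $\Sigma$.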

Figures (3)

  • Figure 1: Prediction risk of all three estimators (GLS, ridge, and LS); row-correlated features $X$; covariance matrices $A={\mathcal{A}}(q)$ and $\overline{A}={\mathcal{A}}(q_v)$; $q=0.5$, $q_v=0.4$.
  • Figure 2: Prediction risk of all three estimators (GLS, ridge, and LS); row-correlated features $X$; covariance matrices $\overline{\overline{A}}={\mathcal{A}}(q_y)$, $A={\mathcal{A}}(q)$, and $\overline{A}={\mathcal{A}}(q_v)$; $q_y=0.7$, $q=0.5$, $q_v=0.4$.
  • Figure 3: Prediction risk of GLS as a function of intra-sample correlation; covariance matrices $\overline{\overline{A}}=\frac{1}{2}{\mathcal{A}}(q_y)$, $A=\frac{1}{2}{\mathcal{A}}(q)$, and $\overline{A}=\frac{1}{2}{\mathcal{A}}(q_v)$; $q=0.5$, $q_v=0.4$; $q_y\in[0,1]$.
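The figures use covariance families ${\mathcal{A}}(q)$ whose exact definition is not reproduced here. The sketch below assumes, purely for illustration, a Toeplitz (AR(1)-type) form $[{\mathcal{A}}(q)]_{ij} = q^{|i-j|}$, uses i.i.d. noise instead of a correlated noise covariance, and estimates the minimum norm estimator's excess risk by Monte Carlo as the number of features $p$ varies with $n$ fixed. Under these assumptions the risk typically peaks near $p = n$, the double-descent behavior mentioned in the abstract. Function names and parameter values are hypothetical.

```python
import numpy as np


def toeplitz_cov(k, q):
    """Hypothetical stand-in for A(q): the k x k matrix with entries q**|i-j|."""
    idx = np.arange(k)
    return q ** np.abs(idx[:, None] - idx[None, :])


def sqrtm_psd(M):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def min_norm_risk_curve(n, ps, q_row, q_col, sigma=0.5, trials=20, seed=0):
    """Monte Carlo excess risk of the minimum-norm (pinv) estimator for each p in ps."""
    rng = np.random.default_rng(seed)
    R_half = sqrtm_psd(toeplitz_cov(n, q_row))  # sample (row) correlations
    risks = []
    for p in ps:
        C = toeplitz_cov(p, q_col)  # feature (column) correlations
        C_half = sqrtm_psd(C)
        total = 0.0
        for _ in range(trials):
            beta = rng.standard_normal(p)
            beta /= np.linalg.norm(beta)  # unit-norm signal
            X = R_half @ rng.standard_normal((n, p)) @ C_half
            y = X @ beta + sigma * rng.standard_normal(n)
            d = np.linalg.pinv(X) @ y - beta
            total += d @ C @ d  # excess risk for x ~ N(0, C)
        risks.append(total / trials)
    return np.array(risks)
```

For example, `min_norm_risk_curve(40, [20, 40, 80], q_row=0.4, q_col=0.5)` evaluates the under-, critically, and over-parameterized regimes; the middle entry is expected to be much larger than the other two, since $X$ is square and often badly conditioned at $p = n$.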

Theorems & Definitions (22)

  • Lemma 1
  • Proof
  • Theorem 1
  • Proof
  • Lemma 2
  • Proof
  • Theorem 2
  • Proof
  • Lemma 3
  • Proof
  • ...and 12 more