Universality in block dependent linear models with applications to nonparametric regression
Samriddha Lahiry, Pragya Sur
TL;DR
This work extends high-dimensional universality results to block-dependent covariate designs in the proportional regime by showing that the optimal empirical risk and the estimation risk for convex penalties (Lasso and ridge) converge to the corresponding Gaussian-design limits, even when covariates exhibit a block structure. The authors formalize a block-dependent sub-Gaussian design class and derive fixed-point characterizations that govern the exact asymptotic risks in nonparametric regression settings obtained via basis expansions (e.g., penalized additive models and scalar-on-function regression). A novel leave-d-out technique combined with Stein/CGMT-inspired arguments establishes compactness and universality of both the optimum and the optimizer under these dependencies. The results enable precise risk predictions for high-dimensional nonparametric regressions in applications like GWAS and functional data analysis, with experiments showing universality emerges at moderate sample sizes.
Abstract
Over the past decade, characterizing the exact asymptotic risk of regularized estimators in high-dimensional regression has emerged as a popular line of work. This literature considers the proportional asymptotics framework, where the number of features and the number of samples both diverge at rates proportional to each other. Substantial work in this area relies on Gaussianity assumptions on the observed covariates. Further, these studies often assume the design entries to be independent and identically distributed. Parallel research investigates the universality of these findings, revealing that results based on the i.i.d. Gaussian assumption extend to a broad class of designs, such as i.i.d. sub-Gaussians. However, universality results examining dependent covariates have so far focused on correlation-based dependence or a highly structured form of dependence, as permitted by right rotationally invariant designs. In this paper, we break this barrier and study a dependence structure that in general falls outside the purview of these established classes. We seek to pin down the extent to which results based on i.i.d. Gaussian assumptions persist. We identify a class of designs characterized by a block dependence structure that ensures the universality of i.i.d. Gaussian-based results. We establish that the optimal values of the regularized empirical risk and the risk associated with convex regularized estimators, such as the Lasso and ridge, converge to the same limit under block dependent designs as they do for i.i.d. Gaussian entry designs. Our dependence structure differs significantly from correlation-based dependence, and enables, for the first time, asymptotically exact risk characterization in prevalent nonparametric regression problems in high dimensions. Finally, we illustrate through experiments that this universality becomes evident quite early, even for relatively moderate sample sizes.
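To make the universality claim concrete, the following is a minimal simulation sketch, not the authors' experiment. It compares the ridge estimation risk under an i.i.d. Gaussian design with the risk under one illustrative block dependent design. The block design used here is an assumption made for illustration: each block holds the first d cosine basis functions sqrt(2) cos(pi k U) of a single uniform covariate U, so entries within a block are dependent but uncorrelated with unit variance. The function names, the sample sizes, the 1/sqrt(n) scaling of the design, and the choice lam = 1 are likewise hypothetical.

```python
import numpy as np


def gaussian_design(n, p, rng):
    """i.i.d. standard Gaussian design (the reference case)."""
    return rng.standard_normal((n, p))


def block_design(n, p, d, rng):
    """Illustrative block dependent design (an assumption, not the paper's exact class).

    Each of the p // d blocks expands one Uniform(0, 1) covariate U into the
    features sqrt(2) * cos(pi * k * U), k = 1..d. Within a block the entries are
    dependent but mean zero, unit variance, and uncorrelated; blocks are independent.
    """
    assert p % d == 0, "p must be a multiple of the block size d"
    u = rng.uniform(size=(n, p // d))
    k = np.arange(1, d + 1)
    feats = np.sqrt(2.0) * np.cos(np.pi * u[:, :, None] * k[None, None, :])
    return feats.reshape(n, p)


def ridge_risk(X, beta, lam, sigma, rng):
    """Per-coordinate squared estimation error of ridge on one simulated data set."""
    n, p = X.shape
    A = X / np.sqrt(n)  # scaling so that A.T @ A has O(1) eigenvalues
    y = A @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
    return np.mean((beta_hat - beta) ** 2)


def compare(n=400, p=200, d=4, lam=1.0, sigma=1.0, reps=20, seed=0):
    """Average ridge risk over `reps` draws for both designs."""
    rng = np.random.default_rng(seed)
    risks = {"gaussian": [], "block": []}
    for _ in range(reps):
        beta = rng.standard_normal(p)  # random signal, redrawn each repetition
        risks["gaussian"].append(
            ridge_risk(gaussian_design(n, p, rng), beta, lam, sigma, rng)
        )
        risks["block"].append(
            ridge_risk(block_design(n, p, d, rng), beta, lam, sigma, rng)
        )
    return {name: float(np.mean(vals)) for name, vals in risks.items()}


result = compare()
print(result)
```

If universality holds for this design, the two averaged risks should be close to each other at these moderate sizes; a Lasso comparison would follow the same pattern with the closed-form solve replaced by a numerical solver.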
