
The Conjugate Domain Dichotomy: Exact Risk of M-Estimators under Infinite-Variance Noise in High Dimensions

Charalampos Agiropoulos

Abstract

This paper studies high-dimensional M-estimation in the proportional asymptotic regime (p/n -> gamma > 0) when the noise distribution has infinite variance. For noise with regularly-varying tails of index alpha in (1,2), we establish that the asymptotic behavior of a regularized M-estimator is governed by a single geometric property of the loss function: the boundedness of the domain of its Fenchel conjugate. When this conjugate domain is bounded -- as is the case for the Huber, absolute-value, and quantile loss functions -- the dual variable in the min-max formulation of the estimator is confined, the effective noise reduces to the finite first absolute moment of the noise distribution, and the estimator achieves bounded risk without recourse to external information. When the conjugate domain is unbounded -- as for the squared loss -- the dual variable scales with the noise, the effective noise involves the diverging second moment, and bounded risk can be achieved only through transfer regularization toward an external prior. For the squared-loss class specifically, we derive the exact asymptotic risk via the Convex Gaussian Minimax Theorem under a noise-adapted regularization scaling. The resulting risk converges to a universal floor that is independent of the regularizer, yielding a loss-risk trichotomy: squared-loss estimators without transfer diverge; Huber-loss estimators achieve bounded but non-vanishing risk; transfer-regularized estimators attain the floor.
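To make the conjugate-domain dichotomy concrete, the following is a small numerical sketch (an illustration, not code from the paper). It grid-approximates the Fenchel conjugate ℓ*(v) = sup_u (v·u − ℓ(u)) for the squared and Huber losses: the squared-loss conjugate is finite for every v (unbounded domain), while the Huber conjugate with threshold δ = 1 is finite only on [−1, 1] and the supremum diverges outside it (bounded domain), which is what confines the dual variable in the min-max formulation.

```python
import numpy as np

def conjugate(loss, v, u_max=100.0, n=200_001):
    """Grid approximation of the Fenchel conjugate l*(v) = sup_u (v*u - l(u)).
    If the value keeps growing as u_max increases, v lies outside the
    effective domain of l*."""
    u = np.linspace(-u_max, u_max, n)
    return float(np.max(v * u - loss(u)))

def huber(u, delta=1.0):
    # Huber loss: quadratic near zero, linear in the tails
    return np.where(np.abs(u) <= delta,
                    0.5 * u**2,
                    delta * np.abs(u) - 0.5 * delta**2)

squared = lambda u: 0.5 * u**2

# Squared loss: l*(v) = v^2/2 for every v -- conjugate domain is all of R
print(conjugate(squared, 2.0))   # ~2.0, finite for any v

# Huber with delta = 1: l*(v) = v^2/2 only on [-1, 1]; outside, the sup diverges
print(conjugate(huber, 0.5))     # ~0.125, inside the bounded domain
print(conjugate(huber, 2.0))     # grows linearly with u_max: outside the domain
```

Increasing `u_max` leaves the first two values unchanged but makes the last one grow without bound, which is the numerical signature of a bounded conjugate domain.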

Paper Structure

This paper contains 29 sections, 10 theorems, 12 equations, and 4 figures.

Key Result

Theorem 2.5

Under the paper's assumptions (from the design assumption through the regularly-varying-tail assumption), the following hold for the regularized M-estimator:

Figures (4)

  • Figure 1: Risk as a function of noise scale for three squared-loss estimators. OLS and fixed-$\lambda$ ridge exhibit power-law divergence; transfer ridge is constant at $q_\Sigma \approx 0.001$.
  • Figure 2: Transfer ridge and transfer Lasso estimators converge to the same universal risk floor $q_\Sigma$ as the noise scale increases.
  • Figure 3: Finite-$\sigma_n$ risk of transfer ridge: theoretical prediction (solid) versus Monte Carlo simulation (dots, 500 trials each). Median relative error: $0.03\%$.
  • Figure 4: The loss--risk trichotomy. OLS and fixed ridge diverge; the Huber estimator plateaus at $\mathcal{R}_H \approx 0.19$; transfer ridge achieves $q_\Sigma \approx 0.001$.

Theorems & Definitions (22)

  • Definition 2.3: Winsorization
  • Definition 2.4: Conjugate domain classes
  • Theorem 2.5: Conjugate Domain Dichotomy
  • Corollary 2.6: The moment mechanism
  • Remark 2.7: The moment hierarchy
  • Theorem 3.1: Truncation universality
  • Proof of (i)
  • Proof of (ii)
  • Theorem 3.2: Fisher information diverges
  • Proof
  • ...and 12 more