On the Effect of Regularization on Nonparametric Mean-Variance Regression

Eliot Wong-Toi, Alex Boyd, Vincent Fortuin, Stephan Mandt

TL;DR

The paper investigates why overparameterized mean-variance regression (MVR) exhibits sharp phase transitions as regularization changes, which hinders reliable uncertainty quantification. It introduces a field-theoretic (FT) framework that casts learning as a variational problem over a mean field and an input-dependent precision field. From this it derives coupled Euler–Lagrange equations that show how data fidelity and smoothing compete across input space. A Bayesian reformulation (BFT) places smoothness priors directly on the predictor fields, linking the deterministic FT to Gaussian-process-like priors and enabling ensemble-based uncertainty estimation. Experiments on synthetic data, UCI benchmarks, and ClimSim show consistent phase diagrams with stable, underfitting, and overfitting regimes. They also demonstrate that restricting regularization to the one-dimensional diagonal $\rho = 1-\gamma$ reduces hyperparameter search cost while maintaining calibration performance. This work provides both theoretical insight into MVR instabilities and a practical tuning strategy for more robust uncertainty quantification in large-scale, heterogeneous data contexts.
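The TL;DR mentions ensemble-based uncertainty estimation. Below is a minimal sketch of one common way to combine runs: moment-match the resulting Gaussian mixture using the law of total variance. It assumes each ensemble member outputs a mean and a variance at a query input and that all members are weighted equally. The summary specifies neither assumption, so this is not presented as the paper's aggregation rule. The function name `combine_ensemble` and the numbers are hypothetical.

```python
import math
from statistics import fmean


def combine_ensemble(means, variances):
    """Moment-match an equally weighted Gaussian mixture at a single input x.

    means[k] and variances[k] are member k's predicted mean and variance
    (variance = 1 / precision). Returns (mixture mean, mixture variance,
    standard deviation of the member means across runs).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    mu = fmean(means)
    # Law of total variance: average predicted noise plus spread of the means.
    noise = fmean(variances)
    spread = fmean((m - mu) ** 2 for m in means)
    return mu, noise + spread, math.sqrt(spread)


# Six hypothetical runs at one input point (illustrative numbers only).
mu, var, sd_means = combine_ensemble([1.0, 1.2, 0.8, 1.1, 0.9, 1.0], [0.04] * 6)
```

The third output, the spread of member means across runs, measures the kind of run-to-run variability that the ±1 s.d. bands in Figure 2 appear to show. The listing does not say whether the paper's bands also include the predicted noise.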

Abstract

Uncertainty quantification is vital for decision-making and risk assessment in machine learning. Mean-variance regression models, which predict both a mean and residual noise for each data point, provide a simple approach to uncertainty quantification. However, overparameterized mean-variance models struggle with signal-to-noise ambiguity: deciding whether prediction targets should be attributed to signal (mean) or noise (variance). At one extreme, models fit all training targets perfectly with zero residual noise, while at the other, they provide constant, uninformative predictions and explain the targets as noise. We observe a sharp phase transition between these extremes, driven by model regularization. Empirical studies with varying regularization levels illustrate this transition, revealing substantial variability across repeated runs. To explain this behavior, we develop a statistical field theory framework, which captures the observed phase transition in alignment with experimental results. This analysis reduces the regularization hyperparameter search space from two dimensions to one, significantly lowering computational costs. Experiments on UCI datasets and the large-scale ClimSim dataset demonstrate robust calibration performance, effectively quantifying predictive uncertainty.
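The two extremes described in the abstract appear directly in the per-point Gaussian negative log-likelihood (NLL) that mean-variance regression typically minimizes, written here in terms of a precision (inverse variance). This sketch assumes an unregularized Gaussian NLL and a small hypothetical set of targets `ys`. It is illustrative only and is not the paper's training objective.

- Interpolating every target and then increasing the precision drives the loss down without bound.
- A constant mean, paired with the best constant precision (the reciprocal of the mean squared residual), explains all variation as noise.

```python
import math


def gaussian_nll(y, mean, precision):
    """Negative log-likelihood of one target y under N(mean, 1 / precision)."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return 0.5 * (precision * (y - mean) ** 2 - math.log(precision) + math.log(2 * math.pi))


def total_nll(ys, means, precisions):
    """Sum of per-point NLLs over a dataset."""
    return sum(gaussian_nll(y, m, p) for y, m, p in zip(ys, means, precisions))


ys = [0.3, -1.2, 0.8, 2.1]  # hypothetical training targets

# Extreme 1: interpolate every target (zero residuals), then grow the precision.
# Only the -log(precision) term remains, so the loss decreases without bound.
overfit = [total_nll(ys, ys, [p] * len(ys)) for p in (1e1, 1e3, 1e6)]

# Extreme 2: constant mean; the NLL-optimal constant precision is 1 / (mean squared residual).
ybar = sum(ys) / len(ys)
mse = sum((y - ybar) ** 2 for y in ys) / len(ys)
underfit = total_nll(ys, [ybar] * len(ys), [1.0 / mse] * len(ys))
```

Regularization is what keeps a real model away from both degenerate solutions. The abstract's point is that the transition between them can be sharp.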

Paper Structure

This paper contains 78 sections, 7 theorems, 99 equations, 7 figures, 4 tables.

Key Result

Proposition 1

Assume that $\mathcal{X} \subset \mathbb{R}^d$ is a bounded, connected Lipschitz domain, that $p \in C^1(\overline{\mathcal{X}})$ is strictly positive on $\overline{\mathcal{X}}$, and that the data field satisfies $y \in H^1(\mathcal{X})$. Let $(\hat{\mu},\hat{\Lambda}) \in H^1(\mathcal{X}) \times H^{\ldots}$ […] Hence, well-posed formulations require $\rho \in (0,1)$ and $\gamma \in (0,1)$, or equivalently […]
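Proposition 1 requires $\rho$ and $\gamma$ to lie strictly inside $(0,1)$, and the Figure 3 heatmaps use a logit parameterization of both axes. The sketch below builds such a grid. It assumes evenly spaced points in logit space over a hypothetical range $[-6, 6]$; the paper's actual grid resolution and range are not given here. Points are generated on the real line and mapped back with the sigmoid, so every $(\rho,\gamma)$ pair is interior and the regions near 0 and 1 are sampled densely. The helper names `logit`, `sigmoid`, and `logit_grid` are illustrative.

```python
import math


def logit(p):
    """Map p in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly inside (0, 1)")
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit; stays strictly inside (0, 1) for moderate |z|."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_grid(n, z_max=6.0):
    """n points evenly spaced in logit space on [-z_max, z_max], mapped back to (0, 1)."""
    if n < 2:
        raise ValueError("need at least two grid points")
    step = 2.0 * z_max / (n - 1)
    return [sigmoid(-z_max + i * step) for i in range(n)]


# Full 2-D (rho, gamma) grid: n * n model fits.
axis = logit_grid(9)
grid = [(rho, gamma) for rho in axis for gamma in axis]
```

With an odd number of points, the central tick lands exactly on $\rho = \gamma = 0.5$, the value marked in Figure 3. The full grid here needs 81 fits.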

Figures (7)

  • Figure 1: Phase-diagram sketch of mean–variance regression in the $(\rho,\gamma)$ plane (left). Here $\rho$ controls the data-fit vs. smoothness trade-off, and $\gamma$ allocates smoothness between the mean and precision functions. Labeled regions indicate mean collapse ($O_\mu$), variance collapse ($O_\Lambda$), underfitting ($U_\mu$, $U_\Lambda$), and the stable regime $S$. Solid and dotted curves mark sharp and smooth transitions. Representative FT mean fits (red, with pointwise $\pm$ s.d. in orange) illustrate each regime (middle, right).
  • Figure 2: Ensemble fits from two modeling approaches. Training data are shown in orange; the ensemble mean (blue) and its pointwise $\pm 1$ s.d. band (shaded) are overlaid for six independent runs. Panels (a) and (b) show a neural implementation and its FT counterpart, respectively. Panels (c)–(f) illustrate representative neural-network fits in different overfitting and underfitting regimes, with panel (f) displaying phase coexistence in $(\rho,\gamma)$ space.
  • Figure 3: Array plot of evaluation metrics (rows) across datasets or fitting methods (columns) on the $(\rho,\gamma)$ regularization grid. The leftmost column shows FT solutions; remaining columns show neural-network fits on held-out data. Each heatmap averages six runs. Ticks mark $\rho=0.5$ and $\gamma=0.5$ in the lower-left panel. Both axes use a logit parameterization of $\rho,\gamma\in(0,1)$ to highlight limiting behaviors near $0$ and $1$. The FT captures the same transition structure observed in the empirical diagrams across datasets.
  • Figure 4: We compute the standard deviation over six runs for each metric in Figure 3, illustrating how variability changes across the regularization space. The shapes of the instability regions remain consistent across datasets and between the neural networks and the FT, as reflected in the Dirichlet energies and geometric complexities. These quantities show the largest disagreement in overfitting regimes, though this does not always correspond to high variability in the MSEs.
  • Figure 5: Test metrics across six runs along the $\rho = 1-\gamma$ diagonal. Stars denote the minimum MSE for each dataset. All metrics are plotted on a $\log_{10}$ scale, and $\rho$ is shown on a logit scale to highlight behavior near the boundaries. Errors drop sharply near the transition into the $S$ phase and then increase again as $\rho$ moves past this region, consistent with the qualitative structure in Figure 1.
  • ...and 2 more figures
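The abstract states that the analysis reduces the regularization search from two dimensions to one, and Figure 5 reports metrics along the diagonal $\rho = 1-\gamma$. The sketch below runs that one-dimensional sweep and keeps the lowest-error point, analogous to the stars in Figure 5. It assumes a hypothetical callback `evaluate(rho, gamma)` that trains a model and returns a held-out error such as MSE. The toy bowl used here is a stand-in, not data from the paper, and the logit-spaced sampling and default range are illustrative choices.

```python
import math
from typing import Callable, Tuple


def sigmoid(z: float) -> float:
    """Map a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_search(
    evaluate: Callable[[float, float], float], n: int = 15, z_max: float = 6.0
) -> Tuple[float, float, float]:
    """Sweep rho on a logit-spaced grid with gamma = 1 - rho; return (rho, gamma, score).

    `evaluate` is a hypothetical callback: it fits a mean-variance model at
    (rho, gamma) and returns a held-out error (lower is better).
    The sweep costs n fits, versus n * n for the full (rho, gamma) grid.
    """
    if n < 2:
        raise ValueError("need at least two sweep points")
    best = None
    for i in range(n):
        rho = sigmoid(-z_max + i * 2.0 * z_max / (n - 1))
        gamma = 1.0 - rho
        score = evaluate(rho, gamma)
        if best is None or score < best[2]:
            best = (rho, gamma, score)
    return best


calls = []


def toy_evaluate(rho, gamma):
    # Toy stand-in, NOT a real training run: a bowl in logit(rho) with its minimum at rho = 0.5.
    calls.append((rho, gamma))
    return math.log(rho / gamma) ** 2


best_rho, best_gamma, best_score = diagonal_search(toy_evaluate)
```

With 15 points the sweep needs 15 fits, compared with 225 for the matching full grid. In practice the callback would wrap a training run at each setting.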

Theorems & Definitions (15)

  • Proposition 1: Combined analytical structure of MVR
  • Lemma 1: Weighted Green’s identity
  • proof
  • Remark 1: On the weighted Laplacian
  • Corollary 1: Natural (Neumann) boundary conditions
  • Proposition 2: General Field Theory
  • proof
  • Remark 2: Uniform-density case
  • Proposition 3: Extreme Settings in the General FT
  • proof
  • ...and 5 more
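Lemma 1, Remark 1, and Corollary 1 are listed by title only. For orientation, the standard forms of these statements are written out below. The block assumes that the weight is the positive density $p \in C^1(\overline{\mathcal{X}})$ from Proposition 1 and that $n$ is the outward unit normal on $\partial\mathcal{X}$. The paper's own statements may differ in normalization or regularity assumptions.

```latex
% Weighted Green's identity (standard form, assumed), for sufficiently regular u, v
% and a weight p in C^1(\overline{\mathcal{X}}); follows from the divergence theorem
% applied to v p \nabla u.
\int_{\mathcal{X}} v\,\nabla\cdot\big(p\,\nabla u\big)\,dx
  = \int_{\partial\mathcal{X}} v\,p\,\frac{\partial u}{\partial n}\,dS
  - \int_{\mathcal{X}} p\,\nabla u\cdot\nabla v\,dx .

% Weighted Laplacian (one common normalization):
\Delta_p u := \frac{1}{p}\,\nabla\cdot\big(p\,\nabla u\big)
  = \Delta u + \nabla\log p\cdot\nabla u .

% Natural (Neumann) boundary condition: if the boundary term must vanish for every
% test function v, then p\,\partial u/\partial n = 0 on \partial\mathcal{X};
% since p > 0, this gives \partial u/\partial n = 0.
```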