Understanding Pathologies of Deep Heteroskedastic Regression

Eliot Wong-Toi; Alex Boyd; Vincent Fortuin; Stephan Mandt

TL;DR

This work analyzes deep, overparameterized heteroskedastic regression through a field-theoretic lens, revealing phase-transition-like behavior in a bounded regularization space $(\rho, \gamma)$. By deriving a nonparametric two-field free energy $\mathcal{L}_{\rho,\gamma}$ and its stationary PDEs, the authors explain how the mean and noise-variance networks can overfit in distinct regimes, and they predict a well-calibrated region $S$. Numerical solutions of the field theory (FT) qualitatively match experiments with neural networks across diverse datasets. This enables a practical one-dimensional hyperparameter search along $\rho \approx 1-\gamma$ that replaces a two-dimensional sweep and improves calibration. The approach offers a physics-inspired, architecture-agnostic explanation for calibration pathologies and yields tangible gains in climate modeling and regression tasks, while outlining avenues for fully Bayesian extensions.

Abstract

Deep, overparameterized regression models are notorious for their tendency to overfit. This problem is exacerbated in heteroskedastic models, which predict both a mean and a residual noise level for each data point. At one extreme, these models fit all training data perfectly, eliminating residual noise entirely; at the other, they overfit the residual noise while predicting a constant, uninformative mean. We observe a lack of middle ground, suggesting a phase transition that depends on the strength of model regularization. We verify this conjecture empirically by fitting numerous models with varying mean and variance regularization. To explain the transition, we develop a theoretical framework based on a statistical field theory, which yields qualitative agreement with experiments. As a practical consequence, our analysis reduces hyperparameter tuning from a two-dimensional to a one-dimensional search, substantially lowering the computational burden. Experiments on diverse datasets, including UCI datasets and the large-scale ClimSim climate dataset, demonstrate significantly improved performance in various calibration tasks.
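To make "varying mean and variance regularization" concrete, here is a minimal NumPy sketch of a regularized heteroskedastic Gaussian objective. It assumes (this is not stated in the text above) that $\rho \in [0,1]$ weights the data term against a total L2 penalty and that $\gamma \in [0,1]$ splits that penalty between the mean-network and variance-network parameters. This reading is consistent with Proposition 1's description of $\rho = 1$ as "no regularization" and $\rho = 0$ as "no data", but the exact form used in the paper may differ. The function names are hypothetical.

```python
import numpy as np

def gaussian_nll(y, mean, log_var):
    """Average heteroskedastic Gaussian negative log-likelihood.

    Each point has its own predicted mean and log-variance; the constant
    0.5*log(2*pi) is included so the values are exact NLLs.
    """
    var = np.exp(log_var)
    return np.mean(0.5 * (np.log(2 * np.pi) + log_var + (y - mean) ** 2 / var))

def regularized_objective(y, mean, log_var, theta_mean, theta_var, rho, gamma):
    """Hypothetical rho-gamma objective (an assumption, not quoted from the paper).

    rho in [0, 1] trades off data fit against total L2 regularization;
    gamma in [0, 1] splits the regularization between the mean-network
    parameters (theta_mean) and the variance-network parameters (theta_var).
    """
    penalty = gamma * np.sum(theta_mean ** 2) + (1 - gamma) * np.sum(theta_var ** 2)
    return rho * gaussian_nll(y, mean, log_var) + (1 - rho) * penalty

# Toy usage: a perfect mean fit with unit variance and small parameter vectors.
y = np.array([0.0, 1.0])
mean = y.copy()
log_var = np.zeros(2)
theta_mean = np.array([1.0, 2.0])
theta_var = np.array([3.0])
print(regularized_objective(y, mean, log_var, theta_mean, theta_var, rho=0.5, gamma=0.5))
```

With $\rho = 1$ the penalty vanishes and only the likelihood remains; with $\rho = 0$ only the penalty remains, and $\gamma$ then decides whether the mean or the variance parameters are shrunk.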

Paper Structure (44 sections, 2 theorems, 15 equations, 8 figures, 5 tables)

Introduction
Pitfalls of Overparameterized Heteroskedastic Regression
Heteroskedastic Regression
Overparameterized Neural Networks
Pitfalls of MLE
Regularization
Reparameterized Regularization
Qualitative Description of Phases
Theoretic Considerations
Field Theory
Numerically Solving the FT
FT Insights
Experiments
Modeling Choices
Datasets
...and 29 more sections

Key Result

Proposition 1

Under the assumptions of the paper's field theory (FT), the following properties hold: (i) in the absence of regularization ($\rho = 1$), there are no solutions to the FT; (ii) in the absence of data ($\rho = 0$), there is no unique solution to the FT; and (iii) there are no valid solutions to the FT if $\rho$ …
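Part (i) is a statement about the FT, but the underlying degeneracy is easy to see in the ordinary Gaussian likelihood: if the mean fits a training point exactly, the negative log-likelihood keeps decreasing as the predicted variance shrinks, so an unregularized objective has no minimizer. The sketch below is a standard numerical illustration of that fact, not the paper's proof; `point_nll` is a hypothetical helper.

```python
import numpy as np

def point_nll(residual, var):
    """Gaussian negative log-likelihood of a single residual under variance var."""
    return 0.5 * (np.log(2 * np.pi) + np.log(var) + residual ** 2 / var)

# A model that interpolates a training point (residual = 0) can keep
# shrinking its predicted variance; the NLL then decreases without bound.
for var in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(f"var={var:.0e}  nll={point_nll(0.0, var):.3f}")
```

For a nonzero residual $r$, the same function is minimized at $\sigma^2 = r^2$, so the collapse only occurs when the mean interpolates the data, which is exactly the "fit all training data perfectly" extreme described in the abstract.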

Figures (8)

Figure 1: Visualization of a typical phase diagram in the $(\rho, \gamma)$ regularization space for a heteroskedastic regression model (left). Solid and dotted lines indicate sharp and smooth transitions in model behavior, respectively. Example mean fits from the FT for each key phase are shown in red, with a pointwise $\pm$ one standard deviation band in orange (middle and right).
Figure 2: Array plot of metrics (rows) across different datasets or fitting techniques (columns). Leftmost column: results from our field theory (FT); remaining columns: results from fitting neural networks to data (datasets refer to test splits). Results are averaged over six runs. Intermediate ticks mark $\gamma=0.5$ and $\rho=0.5$ on the lower-left plot. Our FT aligns qualitatively well with the empirical phase diagrams, and the phase transitions are consistent across models and datasets.
Figure 3: Test metrics for six runs along the minor diagonal $\rho=1-\gamma$. Stars indicate minimum MSE values. All metrics are reported on a $\log_{10}$ scale. $\rho$ values are shown on a logit scale, with $\overline{10^k}:=1-10^k$. Moving from left to right, test metrics first decrease sharply, especially for the neural network models, and then typically increase more smoothly. This empirically supports the existence of the well-calibrated $S$ phase shown in Figure 1.
Figure 4: Visualization of heteroskedastic and homoskedastic versions of the simulated datasets. Their functional forms are given in the table describing the simulated data.
Figure 5: Standard deviation over the six runs of each metric shown in Figure 2.
...and 3 more figures
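The one-dimensional search along the minor diagonal $\rho = 1-\gamma$ (Figure 3) can be sketched as follows. `evaluate` is a hypothetical callback that fits a model at $(\rho, \gamma)$ and returns a validation metric such as MSE; the grid size and range are illustrative assumptions. Spacing the points evenly in $\mathrm{logit}(\rho)$ mirrors the logit-scaled axis of Figure 3, which resolves values near both $0$ and $1$.

```python
import numpy as np

def diagonal_search(evaluate, num_points=9, max_exponent=4):
    """Search along the minor diagonal rho = 1 - gamma.

    evaluate: hypothetical callback (rho, gamma) -> validation metric, lower is better.
    Grid points are spaced evenly in logit(rho), from rho = 10**-max_exponent
    to rho = 1 - 10**-max_exponent (an assumed range, chosen for illustration).
    """
    eps = 10.0 ** (-max_exponent)
    bound = np.log((1.0 - eps) / eps)          # logit(1 - eps)
    logits = np.linspace(-bound, bound, num_points)
    rhos = 1.0 / (1.0 + np.exp(-logits))       # inverse logit (sigmoid)
    scores = [evaluate(rho, 1.0 - rho) for rho in rhos]
    best = int(np.argmin(scores))
    return rhos[best], 1.0 - rhos[best], scores

# Toy stand-in for "train a model and return validation MSE"; its minimum
# sits at logit(rho) = 1, so the nearest grid point (rho = 0.5) is selected.
toy_metric = lambda rho, gamma: (np.log(rho / (1.0 - rho)) - 1.0) ** 2
best_rho, best_gamma, _ = diagonal_search(toy_metric)
print(best_rho, best_gamma)
```

With nine grid points this costs nine model fits, compared with 81 for a 9 × 9 sweep over $(\rho, \gamma)$; in practice one might refine the grid around the selected point.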

Theorems & Definitions (3)

Proposition 1
Proposition 1
proof
