
Geometric Properties of Neural Multivariate Regression

George Andriopoulos, Zixuan Dong, Bimarsha Adhikari, Keith Ross

TL;DR

The paper asks why representation collapse hurts generalization in neural regression, contrasting neural collapse in classification with Neural Regression Collapse (NRC). It proposes intrinsic dimension, specifically $ID_H$ for last-layer features and $ID_Y$ for targets, as a finer geometric lens, and measures it with the 2-NN intrinsic-dimension estimator alongside the NRC1 collapse metric. The key findings are that collapsed models have $ID_H < ID_Y$, which leads to over-compression and poor generalization, while non-collapsed models typically satisfy $ID_H > ID_Y$ and show regime-dependent generalization behavior. The authors give a geometric argument based on Sard's theorem and identify two practical regimes, over-compressed and under-compressed, that determine when adjusting feature dimensionality improves performance. These insights yield actionable guidelines for improving generalization in applied neural regression and establish intrinsic dimension as a principled diagnostic for regression representations.
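The 2-NN estimator mentioned above needs only the distances from each point to its first and second nearest neighbours. Below is a minimal, dependency-free sketch, assuming Euclidean distance and brute-force neighbour search. The function name `two_nn_intrinsic_dimension` is hypothetical, and the sketch uses the closed-form maximum-likelihood version of the 2-NN estimator rather than a line fit to the empirical distribution of ratios, so its numbers may differ slightly from the paper's. Applied separately to last-layer features and to targets, it would give estimates of $ID_H$ and $ID_Y$.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Hypothetical helper: 2-NN intrinsic-dimension estimate (MLE form).

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbours. If the data are locally uniform on a
    d-dimensional manifold, mu follows a Pareto law with exponent d, whose
    maximum-likelihood estimate is N / sum(log mu).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # nearest and second-nearest distances so far
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1
            elif dist < r2:
                r2 = dist
        # Skip duplicated points, for which the ratio is undefined.
        if r1 > 0.0:
            log_ratios.append(math.log(r2 / r1))
    total = sum(log_ratios)
    if total == 0.0:
        raise ValueError("degenerate point cloud: all neighbour ratios equal 1")
    return len(log_ratios) / total


# Usage: 400 points on a 2-D square embedded in 5-D (last three coordinates 0).
rng = random.Random(0)
plane = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(400)]
print(f"estimated intrinsic dimension: {two_nn_intrinsic_dimension(plane):.2f}")
```

The ambient dimension (5 here) does not enter the estimate, which should land near 2. The brute-force search is $O(N^2)$; for large feature sets a k-d tree or similar neighbour index would be the usual choice.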

Abstract

Neural multivariate regression underpins a wide range of domains such as control, robotics, and finance, yet the geometry of its learned representations remains poorly characterized. While neural collapse has been shown to benefit generalization in classification, we find that analogous collapse in regression consistently degrades performance. To explain this contrast, we analyze models through the lens of intrinsic dimension. Across control tasks and synthetic datasets, we estimate the intrinsic dimension of last-layer features ($ID_H$) and compare it with that of the regression targets ($ID_Y$). Collapsed models exhibit $ID_H < ID_Y$, leading to over-compression and poor generalization, whereas non-collapsed models typically maintain $ID_H > ID_Y$. For the non-collapsed models, performance with respect to $ID_H$ depends on the data quantity and noise levels. From these observations, we identify two regimes (over-compressed and under-compressed) that determine when expanding or reducing feature dimensionality improves performance. Our results provide new geometric insights into neural regression and suggest practical strategies for enhancing generalization.

Paper Structure

This paper contains 20 sections, 1 theorem, 6 equations, 12 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{M}$ be a smooth $m$-dimensional manifold and $\mathcal{N}$ be a smooth $n$-dimensional manifold, with $m < n$. A smooth map $g: \mathcal{M} \to \mathcal{N}$ cannot be surjective, i.e., $g(\mathcal{M}) \neq \mathcal{N}$.
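The TL;DR states that the geometric argument rests on Sard's theorem. A standard derivation of Theorem 1 along those lines is sketched below; the paper's own proof may be phrased differently. The last line is an interpretive assumption, not a statement from the source: it models features and targets as smooth manifolds $\mathcal{H}$ and $\mathcal{Y}$ of dimensions $ID_H$ and $ID_Y$, with a smooth prediction head $f$ whose outputs land in $\mathcal{Y}$.

```latex
\begin{aligned}
&\text{For every } p \in \mathcal{M}:\quad
  \operatorname{rank} dg_p \le \dim T_p\mathcal{M} = m < n = \dim T_{g(p)}\mathcal{N},\\
&\text{so every point of } \mathcal{M} \text{ is critical and } g(\mathcal{M})
  \text{ consists only of critical values.}\\
&\text{Sard's theorem: the critical values of } g \text{ have measure zero in } \mathcal{N},
  \text{ while } \mathcal{N} \text{ itself does not;}\\
&\text{hence } g(\mathcal{M}) \neq \mathcal{N}.\\[4pt]
&\text{Assumed reading: } g = f|_{\mathcal{H}} : \mathcal{H} \to \mathcal{Y},\quad
  m = ID_H,\ n = ID_Y,\ ID_H < ID_Y
  \;\Rightarrow\; f(\mathcal{H}) \neq \mathcal{Y}.
\end{aligned}
```

Under that reading, a collapsed model with $ID_H < ID_Y$ cannot produce predictions covering the whole target manifold, which matches the over-compression explanation given in the abstract.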

Figures (12)

  • Figure 1: Neural Regression Collapse typically correlates with high Test MSE. The smaller the NRC value, the closer the features lie to the $n$-dimensional subspace.
  • Figure 2: When the target dimension is $n=2$, the collapsed features (blue points) lie close to a subspace (yellow plane) spanned by the first 2 principal components (red arrows) of the last-layer features. Moreover, the collapsed features lie on a non-linear manifold of dimension smaller than $n$.
  • Figure 3: NRC1 decreases with stronger weight decay, leading to model collapse.
  • Figure 4: Relationship between NRC1 and intrinsic dimension of the last-layer features. Dots correspond to models trained with different architectures and weight decay parameters, with the colors denoting the degree of weight decay. The horizontal red dashed line is drawn at $ID_Y$.
  • Figure 5: Intrinsic dimension of the input, output, and hidden layers over training epochs for a collapsed model (left) and a non-collapsed model (right) on the Reacher dataset. Each subfigure shows the evolution of intrinsic dimension across layers, with blue, dashed orange, and pink lines denoting the intrinsic dimension of the inputs, targets, and predicted outputs, respectively.
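Figures 1 and 2 describe NRC1 as measuring how close the last-layer features lie to the $n$-dimensional subspace spanned by their first $n$ principal components. The sketch below computes a score in that spirit. It is an assumption, not the paper's definition: the name `subspace_residual_fraction`, the centering, and the normalisation by mean squared norm are illustrative choices, and the paper's NRC1 may normalise differently. It uses power iteration with deflation to stay dependency-free; with NumPy one would normally call `numpy.linalg.eigh` on the covariance instead.

```python
import math
import random


def _matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[k] * vec[k] for k in range(len(vec))) for row in matrix]


def subspace_residual_fraction(features, n, iters=500, seed=0):
    """Hypothetical NRC1-style score for a list of feature vectors.

    Returns the mean squared distance of the centered features to the span
    of their top-n principal components, divided by their mean squared norm.
    0 means the centered features lie exactly in an n-dimensional subspace.
    """
    m, d = len(features), len(features[0])
    mean = [sum(h[k] for h in features) / m for k in range(d)]
    centered = [[h[k] - mean[k] for k in range(d)] for h in features]
    # Sample covariance C = (1/m) * sum_i h_i h_i^T of the centered features.
    cov = [[sum(h[a] * h[b] for h in centered) / m for b in range(d)]
           for a in range(d)]
    total = sum(cov[k][k] for k in range(d))  # trace = mean squared norm
    if total == 0.0:
        return 0.0  # all features identical: trivially collapsed
    rng = random.Random(seed)
    top = 0.0
    for _ in range(n):
        # Power iteration for the leading eigenpair of the (deflated) covariance.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(cov, v)))
        top += lam
        # Deflation: subtract the found component before the next iteration.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    # Energy outside the top-n subspace = trace minus the top-n eigenvalues.
    return (total - top) / total


# Usage: 4-D features that vary mainly along 2 axes, mimicking collapse onto n = 2.
rng = random.Random(1)
collapsed = [[3.0 * rng.gauss(0, 1), rng.gauss(0, 1),
              0.01 * rng.gauss(0, 1), 0.01 * rng.gauss(0, 1)] for _ in range(500)]
print(subspace_residual_fraction(collapsed, n=2))
```

For these features the printed score should be close to 0, since almost all centered energy lies in the top-2 principal subspace. The identity used is that the mean squared residual after projecting onto the top-$n$ principal components equals the trace of the covariance minus its $n$ largest eigenvalues.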

Theorems & Definitions (2)

  • Theorem 1
  • proof