On the Equivalence of Regression and Classification

Jayadeva, Naman Dwivedi, Hari Krishnan, N. M. Anoop Krishnan

TL;DR

This work establishes a formal regression-classification equivalence by showing that an $M$-sample regression task with all points on a hyperplane corresponds to a linearly separable classification task with $2M$ samples, where the regression can be recovered from the classifier. It reframes regression through an equivalent SVC problem and introduces a regressability measure that estimates regression difficulty without fitting a model. The authors further propose learning a linearizing map $\phi(x)$ via a neural network using the $J_4$ loss, so that $z = w^T \phi(x)$, enabling a two-step regression process that avoids extensive hyperparameter tuning. Experimental results on large, challenging datasets show that enforcing linearity in the learned representation improves predictive performance (higher $R^2$) relative to a strong neural baseline, highlighting both theoretical unification and practical benefits for regression tasks.

Abstract

A formal link between regression and classification has been tenuous. Even though the margin maximization term $\|w\|$ is used in support vector regression, it has at best been justified as a regularizer. We show that a regression problem with $M$ samples lying on a hyperplane has a one-to-one equivalence with a linearly separable classification task with $2M$ samples. We show that margin maximization on the equivalent classification task leads to a different regression formulation than the one traditionally used. Using the equivalence, we demonstrate a "regressability" measure that can be used to estimate the difficulty of regressing a dataset without first learning a model for it. We use the equivalence to train neural networks to learn a linearizing map that transforms input variables into a space where a linear regressor is adequate.
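
The $M$-to-$2M$ construction described in the abstract can be illustrated with a minimal sketch. It assumes the duplication scheme from the Figure 1 caption: each regression sample $(x, z)$ is copied twice, with the target shifted by $+\epsilon$ for one class and $-\epsilon$ for the other. The function names, the value of `eps`, and the use of an ordinary least-squares fit to find a separating hyperplane are assumptions made for illustration only; the paper works with a margin-maximizing SVC formulation, which this sketch does not reproduce.

```python
import numpy as np

def regression_to_classification(X, z, eps=0.5):
    """Build 2M labelled points in (x, z) space from M regression samples.

    Hypothetical helper: each sample is duplicated and its target shifted
    by +eps (label +1) and -eps (label -1).
    """
    up = np.column_stack([X, z + eps])     # class +1, above the hyperplane
    down = np.column_stack([X, z - eps])   # class -1, below the hyperplane
    points = np.vstack([up, down])
    labels = np.concatenate([np.ones(len(X)), -np.ones(len(X))])
    return points, labels

def recover_regressor(points, labels):
    """Fit a separating hyperplane u^T x + v z + c = 0 and solve it for z.

    Least squares is a stand-in for the classifier here (an assumption,
    not the paper's method). When the regression samples lie exactly on a
    hyperplane, the fit reproduces the labels with zero residual.
    """
    A = np.column_stack([points, np.ones(len(points))])
    coef, *_ = np.linalg.lstsq(A, labels, rcond=None)
    u, v, c = coef[:-2], coef[-2], coef[-1]
    # On the separator, z = -(u^T x + c) / v, which is the regressor.
    return -u / v, -c / v

# Synthetic data lying exactly on a hyperplane z = w^T x + b.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
b_true = 0.3
z = X @ w_true + b_true

points, labels = regression_to_classification(X, z)
w_hat, b_hat = recover_regressor(points, labels)

# Signed score of each of the 2M points under the recovered hyperplane.
scores = points[:, -1] - (points[:, :-1] @ w_hat + b_hat)
```

In this setting the two classes are linearly separable by construction, and reading the separating hyperplane back as $z = w^T x + b$ returns the original regressor, which is the direction of the equivalence the abstract refers to as recovering the regression from the classifier.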

Paper Structure

This paper contains 7 sections, 29 equations, 13 figures, 3 tables, 1 algorithm.

Figures (13)

  • Figure 1: Left: $M$ regression samples on a line. Right: samples for the classification task, obtained by duplicating the regression samples and shifting them by $\pm \epsilon$.
  • Figure 2: Left: Regression samples taken from the line $z = mx$. Right: Samples for the equivalent classification task lie at $\pm \frac{1}{m}$. Note that the margin for the classification task is $\frac{2}{m}$, which tends to zero as the line becomes more vertical.
  • Figure 3: The separating hyperplane is given by $w^Tx = 0$. Hyperplanes $w^Tx = 1$ and $w^Tx = -1$ are proximal to samples of the two classes. A sample $x^i$ not lying on either of these hyperplanes will be at a non-zero distance from its proximal plane, given by $w^Tx^i = z_i - (q_i^+ - q_i^-)$
  • Figure 4: (a) $z=2x$, (b) $z=x^2$, (c) $z=x^3$, (d) $z=((x+3)^3)((x-2)(x+1)(x+2))^2$; (e)--(h) equivalent binary classification problems corresponding to (a)--(d), with samples of the two classes distinguished by colour.
  • Figure 5: (a) Samples of a classification problem and (b) the same samples with the class label indicated by the surface height.
  • ...and 8 more figures