Pathological Regularization Regimes in Classification Tasks

Maximilian Wiesmann, Paul Larsen

TL;DR

We demonstrate that, in binary classification tasks, the trend in a classification score obtained from a trained model can be the reverse of the trend in the dataset. We also draw connections to datasets exhibiting Simpson's paradox, which provide a natural source of pathological datasets.

Abstract

In this paper we demonstrate the possibility of a trend reversal in binary classification tasks between the dataset and a classification score obtained from a trained model. This trend reversal occurs for certain choices of the regularization parameter for model training, namely, if the parameter is contained in what we call the pathological regularization regime. For ridge regression, we give necessary and sufficient algebraic conditions on the dataset for the existence of a pathological regularization regime. Moreover, our results provide a data science practitioner with a hands-on tool to avoid hyperparameter choices suffering from trend reversal. We furthermore present numerical results on pathological regularization regimes for logistic regression. Finally, we draw connections to datasets exhibiting Simpson's paradox, providing a natural source of pathological datasets.

Paper Structure

This paper contains 21 sections, 7 theorems, 33 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Proposition 2

The ridge regression estimator is given by where $\mathds{1}_p$ denotes the $p\times p$ identity matrix.
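
As a rough illustration, the textbook closed form of the ridge estimator, $(X^\top X + \lambda \mathds{1}_p)^{-1} X^\top y$, can be sketched in a few lines of NumPy. The names `X`, `y`, `lam`, `ridge_estimator` and `regularization_path` are assumptions made for this sketch and are not taken from the paper, whose parametrization of the regularization parameter may differ. The synthetic data is purely illustrative, and no intercept is fitted.

```python
import numpy as np


def ridge_estimator(X, y, lam):
    """Textbook ridge solution (X^T X + lam * I_p)^{-1} X^T y (no intercept)."""
    p = X.shape[1]
    # Solve the regularized normal equations instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def regularization_path(X, y, lams):
    """Stack the ridge estimates for each penalty value; one row per penalty."""
    return np.array([ridge_estimator(X, y, lam) for lam in lams])


# Synthetic binary labels, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50) > 0).astype(float)

lams = np.logspace(-3, 3, 13)
path = regularization_path(X, y, lams)
```

Each column of `path` traces one coefficient as the penalty grows. The Euclidean norm of the whole coefficient vector shrinks with increasing penalty, but an individual coefficient need not change monotonically, which is the behavior the regularization paths in Figure 2 refer to.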

Figures (7)

  • Figure 1: The trend indicator from the credit score example with respect to the regularization parameter $c$. The interval marked in red is the pathological regularization regime.
  • Figure 2: Two typical examples of regularization paths. In general, regularization paths need not be monotone; this leads to pathological regularization regimes.
  • Figure 3: The shape of a regularization path satisfying inequality \ref{equ:InequPosRoot} according to Lemmas \ref{lem:limitPaths} and \ref{lem:zeroRegPath}. The regime marked in red contributes to pathological behavior.
  • Figure 4: The average value of $\gamma$ among $10^4$ datasets uniformly drawn from $\mathcal{D}_N$ having a pathological regularization regime.
  • Figure 5: Proportion of datasets having pathological regularization regimes for different sample sizes if uniformly sampled from arbitrary contingency tables (orange) or from those exhibiting Simpson's paradox (blue).
  • ...and 2 more figures

Theorems & Definitions (21)

  • Definition 1 (informal)
  • Proposition 2
  • Definition 3
  • Lemma 4
  • Definition 5
  • Remark 6
  • Remark 7
  • Definition 8
  • Lemma 9
  • proof
  • ...and 11 more