Pathological Regularization Regimes in Classification Tasks

Maximilian Wiesmann; Paul Larsen

TL;DR

We demonstrate that, in binary classification tasks, the trend shown by a classification score obtained from a trained model can be reversed relative to the trend in the dataset. We also draw connections to datasets exhibiting Simpson's paradox, which provide a natural source of pathological datasets.

Abstract

In this paper we demonstrate the possibility of a trend reversal in binary classification tasks between the dataset and a classification score obtained from a trained model. This trend reversal occurs for certain choices of the regularization parameter for model training, namely, if the parameter is contained in what we call the pathological regularization regime. For ridge regression, we give necessary and sufficient algebraic conditions on the dataset for the existence of a pathological regularization regime. Moreover, our results provide a data science practitioner with a hands-on tool to avoid hyperparameter choices suffering from trend reversal. We furthermore present numerical results on pathological regularization regimes for logistic regression. Finally, we draw connections to datasets exhibiting Simpson's paradox, providing a natural source of pathological datasets.
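To make the connection to Simpson's paradox concrete, the following is a minimal sketch of how one might test a $2\times 2\times 2$ contingency table for the paradox. The function names, the index convention `table[z, x, y]`, and the strict-inequality criterion are assumptions made for illustration, not the paper's definitions. The counts follow the widely cited kidney-stone illustration of Simpson's paradox and are not taken from the paper.

```python
import numpy as np

def success_rates(table):
    """Rate of outcome y=1 along the last axis; table[..., y] holds counts."""
    return table[..., 1] / table.sum(axis=-1)

def exhibits_simpson(table):
    """Check a 2x2x2 table indexed as table[z, x, y] (assumed convention).

    Returns True if the difference rate(x=1) - rate(x=0) has the same strict
    sign in every stratum z but the opposite strict sign after summing over z.
    """
    stratum_diff = np.diff(success_rates(table), axis=-1).ravel()
    aggregated = table.sum(axis=0)              # collapse the strata
    agg_diff = np.diff(success_rates(aggregated))[0]
    same_sign = np.all(stratum_diff > 0) or np.all(stratum_diff < 0)
    return bool(same_sign and np.sign(agg_diff) == -np.sign(stratum_diff[0]))

# z: stone size (0 = small, 1 = large); x: treatment (0 = B, 1 = A);
# y: outcome (0 = failure, 1 = success). Illustrative counts only.
table = np.array([
    [[36, 234], [6, 81]],     # small stones: B, then A
    [[25, 55], [71, 192]],    # large stones: B, then A
])

print(exhibits_simpson(table))
```

In this table, treatment A has the higher success rate within each stratum, yet the lower success rate once the strata are pooled, which is the kind of trend reversal the abstract refers to.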

Paper Structure

This paper contains 21 sections, 7 theorems, 33 equations, 7 figures, 1 table, 2 algorithms.

Introduction
Illustrating Example
Prior Work
Preliminaries
Data
Ridge Regression
Logistic Regression
Trend Indicator
Pathological Regularization Regimes
Main Results
Existence and Description of Pathological Regularization Regimes
Ridge Regression with Intercept
Sample Size and Simpson's Paradox
Beyond $2\times 2\times 2$ Contingency Tables
Logistic Regression
...and 6 more sections

Key Result

Proposition 2

The ridge regression estimator is given by $\hat{\beta}_c = (X^\top X + c\,\mathds{1}_p)^{-1} X^\top y$, where $\mathds{1}_p$ denotes the $p\times p$ identity matrix.
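As a rough illustration of the proposition, here is a minimal NumPy sketch that assumes the standard closed form $(X^\top X + c\,\mathds{1}_p)^{-1} X^\top y$ without intercept, with $c$ the regularization parameter. The names `ridge_estimator` and `regularization_path`, the toy data, and the exact scaling of $c$ are assumptions for illustration; the paper's normalization may differ.

```python
import numpy as np

def ridge_estimator(X, y, c):
    """Closed-form ridge estimate (X^T X + c * I_p)^{-1} X^T y (assumed form)."""
    p = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)

def regularization_path(X, y, cs):
    """Stack the ridge estimates for each regularization parameter in cs."""
    return np.array([ridge_estimator(X, y, c) for c in cs])

# Hypothetical toy data: four samples, two binary features, binary labels.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0, 1.0])

cs = np.array([0.0, 0.1, 1.0, 10.0, 100.0])
path = regularization_path(X, y, cs)

for c, beta in zip(cs, path):
    print(f"c = {c:6.1f}  beta = {beta}")
```

Evaluating the estimator on a grid of $c$ values in this way yields a discretized regularization path, which can then be inspected for the non-monotone behavior described in the figure captions.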

Figures (7)

Figure 1: The trend indicator from the credit score example with respect to the regularization parameter $c$. The interval marked in red is the pathological regularization regime.
Figure 2: Two typical examples of regularization paths. In general, regularization paths need not be monotone; this leads to pathological regularization regimes.
Figure 3: The shape of a regularization path satisfying inequality \ref{equ:InequPosRoot} according to Lemmas \ref{lem:limitPaths} and \ref{lem:zeroRegPath}. The regime marked in red contributes to pathological behavior.
Figure 4: The average value of $\gamma$ among $10^4$ datasets uniformly drawn from $\mathcal{D}_N$ having a pathological regularization regime.
Figure 5: Proportion of datasets having pathological regularization regimes for different sample sizes if uniformly sampled from arbitrary contingency tables (orange) or from those exhibiting Simpson's paradox (blue).
...and 2 more figures

Theorems & Definitions (21)

Definition 1: informal
Proposition 2
Definition 3
Lemma 4
Definition 5
Remark 6
Remark 7
Definition 8
Lemma 9
proof
...and 11 more
