Benign Overfitting in Linear Classifiers with a Bias Term

Yuta Kondo

TL;DR

This analysis proves that benign overfitting persists in linear classifiers with a bias term (the inhomogeneous case), and shows that the bias term has a non-trivial impact on the conditions required for good generalization.

Abstract

Modern machine learning models with a large number of parameters often generalize well despite perfectly interpolating noisy training data, a phenomenon known as benign overfitting. A foundational explanation for this in linear classification was recently provided by Hashimoto et al. (2025). However, their analysis was limited to "homogeneous" models, which lack a bias (intercept) term, a standard component in practice. This work directly extends Hashimoto et al.'s results to the more realistic inhomogeneous case, which incorporates a bias term. Our analysis proves that benign overfitting persists in these more complex models. We find that the presence of the bias term introduces new constraints on the data's covariance structure required for generalization, an effect that is particularly pronounced when label noise is present. However, we show that in the isotropic case, these new constraints are dominated by the requirements inherited from the homogeneous model. This work provides a more complete picture of benign overfitting, revealing the non-trivial impact of the bias term on the conditions required for good generalization.
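
As a rough illustration of the setting described in the abstract, the sketch below builds an inhomogeneous linear classifier by appending a constant 1 to each input, so that the last weight plays the role of the bias, and fits it until it interpolates noisy training labels in an overparameterized regime. Everything here is an assumption made for illustration: the toy two-cluster data, the label-flip rate, the dimensions, and the use of the perceptron rule as a simple stand-in for whichever interpolating classifier the paper analyzes. The helper names (`augment`, `fit_perceptron`, `make_data`) are hypothetical.

```python
import random


def augment(z):
    """Map z to (z, 1) so the last weight acts as the bias term."""
    return list(z) + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_perceptron(points, labels, max_epochs=1000):
    """Perceptron on augmented inputs; returns w_tilde = (w, b).

    Stops at the first full pass with zero training mistakes,
    i.e. once the classifier fits every (possibly noisy) label.
    """
    aug = [augment(z) for z in points]
    w = [0.0] * len(aug[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(aug, labels):
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def predict(w_tilde, z):
    return 1 if dot(w_tilde, augment(z)) > 0 else -1


def make_data(n, p, flip_prob, rng):
    """Toy data (an assumption, not the paper's model):
    z = y * mu + standard Gaussian noise, then each label is
    flipped independently with probability flip_prob."""
    mu = [2.0] + [0.0] * (p - 1)
    points, labels = [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        z = [y * m + rng.gauss(0.0, 1.0) for m in mu]
        if rng.random() < flip_prob:
            y = -y  # label noise
        points.append(z)
        labels.append(y)
    return points, labels


rng = random.Random(0)
# Overparameterized: dimension p = 200 is much larger than n = 20.
points, labels = make_data(n=20, p=200, flip_prob=0.1, rng=rng)
w_tilde = fit_perceptron(points, labels)
train_acc = sum(predict(w_tilde, z) == y
                for z, y in zip(points, labels)) / len(labels)
bias = w_tilde[-1]
print("training accuracy:", train_acc, "bias:", bias)
```

With far more dimensions than samples, the training points are linearly separable for essentially any labeling, which is why a linear rule can fit flipped labels as well. Whether such a fit still generalizes is the question the theorems address; this sketch does not test that.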

Paper Structure

This paper contains 49 sections, 9 theorems, 53 equations, 1 table.

Key Result

Theorem 1

Consider the inhomogeneous model with $\tilde{\boldsymbol{z}}=(\boldsymbol{z},1)$, where $\boldsymbol{z}$ is generated according to Model (EM) in the noiseless case ($\eta=0$). Assume $n \ge \left(6C_2(g)\right)^{\frac{k}{k-2}} \delta^{-\frac{2}{k-2}}$. Then, there exists a constant $C$ (depending on ... and that $\operatorname{tr}(\Sigma) \ge C \cdot \max\{T_{\mathrm{Hom}}, T_{\mathrm{Inhom}}\}$, where we ...

Theorems & Definitions (22)

  • Theorem 1: Inhomogeneous, Noiseless, Intermediate Signal
  • Proof: Proof Sketch for Theorem 1
  • Theorem 2: Inhomogeneous, Noiseless, Large Signal
  • Proof: Proof Sketch for Theorem 2
  • Theorem 3: Inhomogeneous, Noisy
  • Proof: Proof Sketch for Theorem 3
  • Corollary 4
  • Proof: Proof of Corollary 4
  • Lemma 5
  • proof
  • ...and 12 more