Convergence and Generalization of Anti-Regularization for Parametric Models

Dongseok Kim, Wonjun Jeong, Gisung Oh

TL;DR

Anti-regularization (AR) adds a reward term with a negative sign to the empirical risk to boost model expressivity in small-sample regimes, with a power-law decay of the strength $|\lambda(n)|$ so that the term vanishes as the sample size $n$ grows. The framework establishes spectral safety conditions and trust-region safeguards (gradient clipping and projection) to guarantee stable training, and extends to linear smoothers, the NTK regime, and shallow MLPs. Theoretical results cover existence, convergence, degrees-of-freedom control, and generalization bounds for regression and classification, including practical calibration considerations. Empirically, AR mitigates underfitting in regression and improves calibration in classification in small-data regimes, while safely fading to baseline performance as data accumulates; ablations confirm that the decay schedule and the safeguards are essential for stability and effectiveness.

Abstract

Anti-regularization introduces a reward term with a reversed sign into the loss function, deliberately amplifying model expressivity in small-sample regimes while ensuring that the intervention gradually vanishes as the sample size grows through a power-law decay schedule. We formalize spectral safety conditions and trust-region constraints, and we design a lightweight safeguard that combines a projection operator with gradient clipping to guarantee stable intervention. Theoretical analysis extends to linear smoothers and the Neural Tangent Kernel regime, providing practical guidance on the choice of decay exponents through the balance between empirical risk and variance. Empirical results show that Anti-regularization mitigates underfitting in both regression and classification while preserving generalization and improving calibration. Ablation studies confirm that the decay schedule and safeguards are essential to avoiding overfitting and instability. As an alternative, we also propose a degrees-of-freedom targeting schedule that maintains constant per-sample complexity. Anti-regularization constitutes a simple and reproducible procedure that integrates seamlessly into standard empirical risk minimization pipelines, enabling robust learning under limited data and resource constraints by intervening only when necessary and vanishing otherwise.
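
The decay schedule and the safeguard described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the schedule form $\lambda(n)=\lambda_0 n^{-\alpha}$, the squared-norm reward $\tfrac{\lambda}{2}\|w\|^2$, the constants, and the function names `ar_strength`, `clip_by_norm`, `project_to_ball`, and `safeguarded_step` are all hypothetical.

```python
import math

def ar_strength(n, lam0=0.5, alpha=0.5):
    """Assumed power-law schedule: lambda(n) = lam0 * n**(-alpha), alpha > 0."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return lam0 * n ** (-alpha)

def _rescale_to_norm(v, max_norm):
    """Shrink v onto the Euclidean ball of radius max_norm if it lies outside."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= max_norm or norm == 0.0:
        return list(v)
    scale = max_norm / norm
    return [scale * x for x in v]

def clip_by_norm(grad, max_norm):
    """Gradient clipping: cap the gradient norm at max_norm."""
    return _rescale_to_norm(grad, max_norm)

def project_to_ball(w, radius):
    """Projection: map parameters back into a trust region ||w|| <= radius."""
    return _rescale_to_norm(w, radius)

def safeguarded_step(w, loss_grad, lam, lr=0.1, max_grad_norm=1.0, radius=5.0):
    """One gradient step on loss(w) - (lam / 2) * ||w||^2 with both safeguards.

    The reward enters with a reversed sign, so its gradient contribution
    is -lam * w (it pushes parameters outward instead of shrinking them).
    """
    grad = [g - lam * wi for g, wi in zip(loss_grad, w)]
    grad = clip_by_norm(grad, max_grad_norm)          # bound the update size
    w_new = [wi - lr * gi for wi, gi in zip(w, grad)]
    return project_to_ball(w_new, radius)             # stay in the trust region

# Usage: the intervention is stronger for small n and fades as n grows.
for n in (10, 100, 1000):
    lam = ar_strength(n)
    w = safeguarded_step([1.0, -2.0], loss_grad=[0.3, -0.1], lam=lam)
    print(n, lam, w)
```

Because the reward term pushes parameters away from the origin, clipping bounds the size of each update and the projection bounds the parameters themselves; with both in place a single step cannot leave the chosen ball regardless of the value of $\lambda$.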

Paper Structure

This paper contains 137 sections, 40 theorems, 100 equations, 17 tables.

Key Result

Theorem 1

Let $\hat{\Sigma}=\tfrac{1}{|S|}X^\top X$ in linear regression. If $\lambda<\sigma_{\min}(\hat{\Sigma})$, then $\hat{F}_\lambda$ is $(\sigma_{\min}(\hat{\Sigma})-\lambda)$-strongly convex and admits a unique global minimizer.
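
A small numerical check can make the safe-region condition concrete. The sketch below assumes the objective $\hat{F}_\lambda(w)=\tfrac{1}{2|S|}\|Xw-y\|^2-\tfrac{\lambda}{2}\|w\|^2$, which is not stated in this summary; under that assumption the Hessian is $\hat{\Sigma}-\lambda I$, whose smallest eigenvalue is $\sigma_{\min}(\hat{\Sigma})-\lambda$. The two-feature toy data and the helper names are hypothetical.

```python
import math

def empirical_covariance(X):
    """Entries (a, b, c) of the symmetric 2x2 matrix (1/|S|) X^T X = [[a, b], [b, c]]."""
    n = len(X)
    a = sum(x0 * x0 for x0, _ in X) / n
    b = sum(x0 * x1 for x0, x1 in X) / n
    c = sum(x1 * x1 for _, x1 in X) / n
    return a, b, c

def eig_sym_2x2(a, b, c):
    """Closed-form eigenvalues (smallest, largest) of [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    return mean - half_gap, mean + half_gap

def ar_minimizer(X, y, lam):
    """Solve (Sigma - lam * I) w = (1/|S|) X^T y inside the safe region.

    Returns the minimizer and the strong-convexity modulus sigma_min - lam.
    Raises ValueError when lam >= sigma_min, where the assumed objective
    is no longer strongly convex.
    """
    n = len(X)
    a, b, c = empirical_covariance(X)
    sigma_min, _ = eig_sym_2x2(a, b, c)
    if lam >= sigma_min:
        raise ValueError("lam must be strictly below sigma_min(Sigma)")
    r0 = sum(x0 * yi for (x0, _), yi in zip(X, y)) / n
    r1 = sum(x1 * yi for (_, x1), yi in zip(X, y)) / n
    a_l, c_l = a - lam, c - lam            # diagonal of the Hessian Sigma - lam * I
    det = a_l * c_l - b * b                # positive because both eigenvalues are
    w = ((c_l * r0 - b * r1) / det, (a_l * r1 - b * r0) / det)
    return w, sigma_min - lam

# Toy data (hypothetical): four samples, two features.
X = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 2.0, 2.0, 3.0]
w_star, modulus = ar_minimizer(X, y, lam=0.25)
print("minimizer:", w_star, "strong-convexity modulus:", modulus)
```

Choosing $\lambda$ at or above $\sigma_{\min}(\hat{\Sigma})$ makes the assumed Hessian singular or indefinite, which is why the helper refuses to return a minimizer in that case.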

Theorems & Definitions (86)

  • Theorem 1: Safe region and strong convexity in regression
  • Proof: Proof sketch
  • Remark 1: Applicability
  • Corollary 2: Closed-form solution and stability of gradient descent
  • Lemma 3: Classification: boundedness from below and potential non-attainment
  • Proof: Proof sketch
  • Lemma 4: Classification: conditions for existence of minimizers
  • Proof: Proof sketch
  • Theorem 5: Condition for correcting underfitting
  • Proof: Proof sketch
  • ...and 76 more