Regularized Meta-Learning for Improved Generalization

Noor Islam S. Mohammad; Md Muntaqim Meherab

TL;DR

This paper tackles unstable and inefficient stacking in high-dimensional ensembles by introducing redundancy-aware regularized meta-learning. It presents a four-stage pipeline (redundancy projection, meta-feature augmentation, cross-validated Ridge/Lasso/ElasticNet, and inverse-RMSE blending) that operates on leakage-free out-of-fold (OOF) predictions to improve conditioning and reduce effective rank. On the 100K-sample Playground Series S6E1 benchmark, the method improves RMSE over simple averaging and standard stacking and runs about four times faster than greedy hill climbing while retaining a larger ensemble; ablations confirm the contributions of de-duplication, meta-features, and blending. The approach stabilizes meta-learning under multicollinearity and offers deployment-friendly scalability and interpretability, making it a strong default for high-dimensional ensemble systems. Future work may extend the framework to calibration, uncertainty quantification, and online adaptation, broadening its applicability to AutoML and multi-task settings.
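The final blending stage can be illustrated with a minimal sketch. It assumes that each meta-model's weight is proportional to the inverse of its OOF RMSE and that the weights are normalized to sum to one; the summary does not give the exact formula, so the function name `inverse_rmse_blend` and the synthetic data are illustrative only.

```python
import numpy as np

def inverse_rmse_blend(oof_preds, y):
    """Blend prediction columns with weights proportional to 1 / RMSE.

    Assumption: weights are normalized to sum to one (not confirmed by the source).
    """
    oof_preds = np.asarray(oof_preds, dtype=float)
    y = np.asarray(y, dtype=float)
    # Per-column RMSE against the target
    rmse = np.sqrt(np.mean((oof_preds - y[:, None]) ** 2, axis=0))
    inv = 1.0 / np.maximum(rmse, 1e-12)  # guard against division by zero
    weights = inv / inv.sum()
    return oof_preds @ weights, weights

# Synthetic stand-ins for the OOF predictions of three meta-models
rng = np.random.default_rng(0)
y = rng.normal(size=500)
oof = np.column_stack(
    [y + rng.normal(scale=s, size=500) for s in (0.5, 1.0, 2.0)]
)
blend, w = inverse_rmse_blend(oof, y)
print("weights:", np.round(w, 3))
```

In this sketch the most accurate column receives the largest weight, but no column is discarded, which is one way a blend can soften the effect of picking a single regularizer.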

Abstract

Deep ensemble methods often improve predictive performance, yet they suffer from three practical limitations: redundancy among base models that inflates computational cost and degrades conditioning, unstable weighting under multicollinearity, and overfitting in meta-learning pipelines. We propose a regularized meta-learning framework that addresses these challenges through a four-stage pipeline combining redundancy-aware projection, statistical meta-feature augmentation, and cross-validated regularized meta-models (Ridge, Lasso, and ElasticNet). Our multi-metric de-duplication strategy removes near-collinear predictors using correlation and MSE thresholds ($\tau_{\text{corr}}=0.95$), reducing the effective condition number of the meta-design matrix while preserving predictive diversity. Engineered ensemble statistics and interaction terms recover higher-order structure unavailable to raw prediction columns. A final inverse-RMSE blending stage mitigates regularizer-selection variance. On the Playground Series S6E1 benchmark (100K samples, 72 base models), the proposed framework achieves an out-of-fold RMSE of 8.582, improving over simple averaging (8.894) and conventional Ridge stacking (8.627), while matching greedy hill climbing (8.603) with substantially lower runtime (4 times faster). Conditioning analysis shows a 53.7\% reduction in effective matrix condition number after redundancy projection. Comprehensive ablations demonstrate consistent contributions from de-duplication, statistical meta-features, and meta-ensemble blending. These results position regularized meta-learning as a stable and deployment-efficient stacking strategy for high-dimensional ensemble systems.
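The first two stages can be sketched as follows, under stated assumptions. The de-duplication here is a greedy filter on absolute Pearson correlation only, visiting columns from lowest to highest individual RMSE; the MSE criterion is omitted because its threshold is not stated in this summary. The row-wise statistics (std, min, max, median) are one plausible reading of "ensemble statistics", and the names `deduplicate` and `add_meta_features` are hypothetical.

```python
import numpy as np

def deduplicate(preds, y, tau_corr=0.95):
    """Greedy filter: keep a column only if its absolute correlation with
    every already-kept column is below tau_corr.

    Assumption: columns are visited in order of increasing RMSE, so the
    most accurate member of each near-duplicate group is retained.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    rmse = np.sqrt(np.mean((preds - y[:, None]) ** 2, axis=0))
    corr = np.abs(np.corrcoef(preds, rowvar=False))
    kept = []
    for j in np.argsort(rmse):
        if all(corr[j, k] < tau_corr for k in kept):
            kept.append(int(j))
    return sorted(kept)

def add_meta_features(preds):
    """Append row-wise statistics of the retained predictions.

    The row mean is left out of this sketch because it is an exact linear
    combination of the retained columns.
    """
    stats = np.column_stack([
        preds.std(axis=1),
        preds.min(axis=1),
        preds.max(axis=1),
        np.median(preds, axis=1),
    ])
    return np.hstack([preds, stats])

# Synthetic example: two distinct base models plus a near-copy of each
rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)
a = y + rng.normal(size=n)
b = y + rng.normal(size=n)
P = np.column_stack([
    a, a + 0.05 * rng.normal(size=n),
    b, b + 0.05 * rng.normal(size=n),
])
kept = deduplicate(P, y)
Z = add_meta_features(P[:, kept])
print("kept columns:", kept, "meta-design shape:", Z.shape)
```

On this toy matrix the filter keeps one column from each near-duplicate pair, and the augmented design has the two retained columns plus four statistics.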

Paper Structure (47 sections, 4 theorems, 27 equations, 7 figures, 6 tables, 4 algorithms)

Introduction
Related Work
Classical Ensemble Methods
Stacking and Meta-Learning
Deep Ensembles and Uncertainty
Regularization and Model Selection
AutoML and Large Model Pools
Methodology
Problem Formulation
Phase 1: Redundancy Projection
Phase 2: Meta-Feature Augmentation
Phase 3: Regularized Meta-Learning
Phase 4: Risk-Aware Blending
Computational Complexity
Experimental Setup
...and 32 more sections

Key Result

Theorem 1

Spectral Preconditioning via Redundancy Projection: Let $\mathbf{P}\in\mathbb{R}^{N\times K}$ contain predictor clusters with intra-cluster correlation $\rho \ge \tau_{\text{corr}}$. Assume each cluster contributes at most one retained representative under $\Pi_\tau$. Then for the projected matrix [...] for some $\Delta_\tau > 0$ depending on cluster redundancy. Consequently, [...] Moreover, the increase i[...]
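The qualitative effect the theorem describes can be checked numerically. The sketch below builds a synthetic prediction matrix with four clusters of five near-collinear columns, keeps one representative per cluster (mirroring the theorem's assumption), and compares condition numbers of the standardized matrices. The cluster sizes, noise level, and the helper `cond` are assumptions chosen for illustration, and the numbers it prints are not the paper's 53.7\% figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_clusters, per_cluster = 2000, 4, 5

# Each cluster shares one latent signal; members differ by small noise,
# giving an intra-cluster correlation of roughly 0.99 (above 0.95).
latents = rng.normal(size=(n, n_clusters))
P = np.column_stack([
    latents[:, c] + 0.1 * rng.normal(size=n)
    for c in range(n_clusters)
    for _ in range(per_cluster)
])

# One retained representative per cluster (the first member of each)
reps = [c * per_cluster for c in range(n_clusters)]
P_proj = P[:, reps]

def cond(M):
    """Condition number of the column-standardized matrix (ratio of the
    largest to the smallest singular value)."""
    M = (M - M.mean(axis=0)) / M.std(axis=0)
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

kappa_before, kappa_after = cond(P), cond(P_proj)
print(f"condition number before: {kappa_before:.2f}, after: {kappa_after:.2f}")
```

Because the smallest singular value of the full matrix is driven by the within-cluster noise, removing the duplicates raises it sharply and the condition number drops.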

Figures (7)

Figure 1: Conceptual comparison between classical stacking and our redundancy-aware regularized meta-learning framework. We explicitly reduce effective rank before applying multi-penalty meta-modeling, improving conditioning and stability.
Figure 2: Pareto efficiency of ensemble strategies. Trade-offs between RMSE, runtime, and retained model count. The proposed method lies on the empirical frontier, achieving lower runtime and competitive accuracy relative to greedy hill climbing and vanilla stacking.
Figure 3: Prediction-space redundancy before and after projection. Highly correlated clusters ($\rho>0.95$) induce ill-conditioning in the meta-design matrix. Redundancy projection $\Pi_{\tau}$ removes near-collinear predictors, enlarges the spectral gap, and reduces the condition number, stabilizing meta-weight estimation.
Figure 4: Regularization paths for Ridge, Lasso, and ElasticNet meta-learners. Each line is the mean RMSE across 10 folds; shaded regions denote $\pm1$ standard deviation. Vertical dashed lines mark selected $\lambda$.
Figure 5: Residual diagnostics. (a) Q--Q plot; (b) residuals vs. fitted values; (c) residual histogram; (d) MAE by target-score range.
...and 2 more figures
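The quantity plotted in Figure 4 (mean RMSE across 10 folds with a one-standard-deviation band, and a selected $\lambda$) can be computed along the following lines. This is a sketch for the Ridge case only, using the closed-form solution with an unpenalized intercept; the $\lambda$ grid, the synthetic data, and the function names are assumptions, and the Lasso and ElasticNet paths would need an iterative solver.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def cv_rmse_path(X, y, lambdas, n_folds=10, seed=0):
    """Return the mean and standard deviation of validation RMSE per lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.empty((len(lambdas), n_folds))
    for f, val_idx in enumerate(folds):
        train = np.ones(len(y), dtype=bool)
        train[val_idx] = False
        for i, lam in enumerate(lambdas):
            w, b = ridge_fit(X[train], y[train], lam)
            resid = X[val_idx] @ w + b - y[val_idx]
            scores[i, f] = np.sqrt(np.mean(resid ** 2))
    return scores.mean(axis=1), scores.std(axis=1)

# Synthetic stand-in for a meta-design matrix of four base predictions
rng = np.random.default_rng(1)
y = rng.normal(size=1000)
X = np.column_stack(
    [y + rng.normal(scale=s, size=1000) for s in (0.5, 0.7, 1.0, 1.5)]
)
lambdas = np.logspace(-3, 3, 13)  # hypothetical grid
mean_rmse, std_rmse = cv_rmse_path(X, y, lambdas)
best_lambda = lambdas[np.argmin(mean_rmse)]
print("selected lambda:", best_lambda)
```

Selecting the minimizer of the mean curve is one common rule; the summary does not say which selection rule the paper uses.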

Theorems & Definitions (4)

Theorem 1
Theorem 2: Stability of the Composite Operator
Theorem 3: Effective Rank Reduction and Generalization
Theorem 4
