Regularized Meta-Learning for Improved Generalization
Noor Islam S. Mohammad, Md Muntaqim Meherab
TL;DR
This paper tackles unstable and inefficient stacking in high-dimensional ensembles by introducing redundancy-aware regularized meta-learning. It presents a four-stage pipeline (redundancy projection, meta-feature augmentation, cross-validated Ridge/Lasso/ElasticNet, and inverse-RMSE blending) that operates on leakage-free out-of-fold (OOF) predictions to improve conditioning and reduce effective rank. Empirical results on the 100K-sample Playground Series S6E1 benchmark show improved RMSE over averaging and standard stacking, and a fourfold runtime reduction relative to greedy hill climbing while retaining a larger ensemble; ablations confirm the contributions of de-duplication, meta-features, and blending. The approach stabilizes meta-learning under multicollinearity and offers deployment-friendly scalability and interpretability, making it a strong default for high-dimensional ensemble systems. Future work may extend the framework to calibration, uncertainty quantification, and online adaptation, broadening its applicability to AutoML and multi-task settings.
Abstract
Deep ensemble methods often improve predictive performance, yet they suffer from three practical limitations: redundancy among base models that inflates computational cost and degrades conditioning, unstable weighting under multicollinearity, and overfitting in meta-learning pipelines. We propose a regularized meta-learning framework that addresses these challenges through a four-stage pipeline combining redundancy-aware projection, statistical meta-feature augmentation, cross-validated regularized meta-models (Ridge, Lasso, and ElasticNet), and inverse-RMSE blending. Our multi-metric de-duplication strategy removes near-collinear predictors using correlation and MSE thresholds ($\tau_{\text{corr}}=0.95$), reducing the effective condition number of the meta-design matrix while preserving predictive diversity. Engineered ensemble statistics and interaction terms recover higher-order structure unavailable to raw prediction columns. The final inverse-RMSE blending stage mitigates regularizer-selection variance. On the Playground Series S6E1 benchmark (100K samples, 72 base models), the proposed framework achieves an out-of-fold RMSE of 8.582, improving over simple averaging (8.894) and conventional Ridge stacking (8.627), while matching greedy hill climbing (8.603) at substantially lower runtime (4 times faster). Conditioning analysis shows a 53.7\% reduction in the effective condition number after redundancy projection. Comprehensive ablations demonstrate consistent contributions from de-duplication, statistical meta-features, and meta-ensemble blending. These results position regularized meta-learning as a stable and deployment-efficient stacking strategy for high-dimensional ensemble systems.
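As a rough illustration of two of the stages named above, the following sketch shows correlation-based de-duplication and inverse-RMSE weighting in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`deduplicate`, `inverse_rmse_weights`), the greedy best-first keep rule, and the toy data are hypothetical. Only the threshold value 0.95 comes from the abstract, and the MSE criterion is omitted because its threshold is not stated here.

```python
import math


def rmse(pred, y):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def deduplicate(oof, y, tau_corr=0.95):
    """Assumed greedy rule: visit models from best to worst OOF RMSE and keep
    a model only if |correlation| with every kept model is below tau_corr."""
    order = sorted(oof, key=lambda name: rmse(oof[name], y))
    kept = []
    for name in order:
        if all(abs(pearson(oof[name], oof[k])) < tau_corr for k in kept):
            kept.append(name)
    return kept


def inverse_rmse_weights(oof, y, names, eps=1e-12):
    """Weights proportional to 1 / RMSE, normalized to sum to 1."""
    inv = {n: 1.0 / max(rmse(oof[n], y), eps) for n in names}
    total = sum(inv.values())
    return {n: v / total for n, v in inv.items()}


def blend(oof, weights):
    """Weighted average of the selected prediction columns, row by row."""
    n_rows = len(next(iter(oof.values())))
    return [sum(w * oof[n][i] for n, w in weights.items()) for i in range(n_rows)]


# Toy out-of-fold predictions (hypothetical data, three candidate models).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
oof = {
    "m_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "m_b": [1.2, 2.0, 3.3, 3.9, 5.2],  # m_a shifted by 0.1: near-collinear
    "m_c": [2.0, 1.0, 4.0, 3.0, 4.0],  # weaker but less correlated
}

kept = deduplicate(oof, y, tau_corr=0.95)
weights = inverse_rmse_weights(oof, y, kept)
blended = blend(oof, weights)
```

On this toy input the near-duplicate `m_b` is dropped and the more accurate `m_a` receives the larger weight. In the pipeline described in the abstract, the inverse-RMSE weighting would be applied to the OOF predictions of the Ridge, Lasso, and ElasticNet meta-models rather than directly to base models; the sketch applies it to the surviving columns only to stay self-contained.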
