Regularized Meta-Learning for Improved Generalization

Noor Islam S. Mohammad, Md Muntaqim Meherab

TL;DR

This paper tackles unstable and inefficient stacking in high-dimensional ensembles by introducing redundancy-aware regularized meta-learning. It presents a four-stage pipeline (redundancy projection, meta-feature augmentation, cross-validated Ridge/Lasso/ElasticNet, and inverse-RMSE blending) that operates on leakage-free out-of-fold (OOF) predictions to improve conditioning and reduce effective rank. Empirical results on the 100K-sample Playground Series S6E1 benchmark show improved RMSE over averaging and standard stacking, and a roughly fourfold runtime reduction relative to greedy hill climbing while retaining a larger ensemble; ablations confirm the contributions of de-duplication, meta-features, and blending. The approach stabilizes meta-learning under multicollinearity and offers deployment-friendly scalability and interpretability, making it a strong default for high-dimensional ensemble systems. Future work may extend calibration, uncertainty quantification, and online adaptation, broadening applicability to AutoML and multi-task settings.

Abstract

Deep ensemble methods often improve predictive performance, yet they suffer from three practical limitations: redundancy among base models that inflates computational cost and degrades conditioning, unstable weighting under multicollinearity, and overfitting in meta-learning pipelines. We propose a regularized meta-learning framework that addresses these challenges through a four-stage pipeline combining redundancy-aware projection, statistical meta-feature augmentation, and cross-validated regularized meta-models (Ridge, Lasso, and ElasticNet). Our multi-metric de-duplication strategy removes near-collinear predictors using correlation and MSE thresholds ($\tau_{\text{corr}}=0.95$), reducing the effective condition number of the meta-design matrix while preserving predictive diversity. Engineered ensemble statistics and interaction terms recover higher-order structure unavailable to raw prediction columns. A final inverse-RMSE blending stage mitigates regularizer-selection variance. On the Playground Series S6E1 benchmark (100K samples, 72 base models), the proposed framework achieves an out-of-fold RMSE of 8.582, improving over simple averaging (8.894) and conventional Ridge stacking (8.627), while matching greedy hill climbing (8.603) with substantially lower runtime (4 times faster). Conditioning analysis shows a 53.7% reduction in effective matrix condition number after redundancy projection. Comprehensive ablations demonstrate consistent contributions from de-duplication, statistical meta-features, and meta-ensemble blending. These results position regularized meta-learning as a stable and deployment-efficient stacking strategy for high-dimensional ensemble systems.
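The final inverse-RMSE blending stage can be illustrated with a minimal sketch. It assumes each meta-model's weight is proportional to the reciprocal of its out-of-fold RMSE, normalized to sum to one; the abstract does not spell out the exact formula, so the function name `inverse_rmse_blend`, the `eps` guard, and the toy data are all hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def inverse_rmse_blend(y_true, meta_preds, eps=1e-12):
    """Blend meta-model predictions with weights proportional to 1 / RMSE.

    meta_preds maps a meta-model name to its out-of-fold predictions.
    eps guards against division by zero (an assumption of this sketch).
    """
    inverse = {name: 1.0 / (rmse(y_true, preds) + eps) for name, preds in meta_preds.items()}
    total = sum(inverse.values())
    weights = {name: value / total for name, value in inverse.items()}
    blended = [
        sum(weights[name] * meta_preds[name][i] for name in meta_preds)
        for i in range(len(y_true))
    ]
    return weights, blended

# Toy data: three hypothetical meta-models with OOF RMSE 0.5, 1.0, and 1.0.
y = [10.0, 12.0, 14.0, 16.0]
oof = {
    "ridge": [10.5, 11.5, 14.5, 15.5],
    "lasso": [11.0, 11.0, 15.0, 15.0],
    "elasticnet": [9.0, 13.0, 13.0, 17.0],
}
weights, blended = inverse_rmse_blend(y, oof)
print(weights)  # ridge receives about half the total weight
```

Because the weights vary smoothly with each model's error, a small change in validation RMSE shifts the blend slightly instead of switching to a different regularizer outright, which is one plausible reading of how blending "mitigates regularizer-selection variance".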

Paper Structure (47 sections, 4 theorems, 27 equations, 7 figures, 6 tables, 4 algorithms)

Key Result

Theorem 1

Spectral Preconditioning via Redundancy Projection: Let $\mathbf{P}\in\mathbb{R}^{N\times K}$ contain predictor clusters with intra-cluster correlation $\rho \ge \tau_{\text{corr}}$. Assume each cluster contributes at most one retained representative under $\Pi_\tau$. Then for the projected matrix … for some $\Delta_\tau > 0$ depending on cluster redundancy. Consequently, … Moreover, the increase i…
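A two-predictor toy case gives some intuition for why removing near-collinear columns improves conditioning. This is an illustration under simplifying assumptions (two standardized predictors with correlation $\rho$), not the statement or proof of Theorem 1; the symbols $\mathbf{R}$, $\lambda_{\pm}$, and $\kappa$ are introduced here for the example only.

```latex
\[
\mathbf{R} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
\lambda_{\pm} = 1 \pm \rho,
\qquad
\kappa(\mathbf{R}) = \frac{\lambda_{+}}{\lambda_{-}} = \frac{1+\rho}{1-\rho}
\quad (0 \le \rho < 1).
\]
\[
\rho = 0.95 \;\Rightarrow\; \kappa(\mathbf{R}) = \frac{1.95}{0.05} = 39,
\qquad
\text{after dropping one of the two columns: } \kappa = 1.
\]
```

The condition number grows without bound as $\rho \to 1$, so a pair of predictors sitting exactly at the threshold $\tau_{\text{corr}} = 0.95$ already yields a 39-fold spread between the largest and smallest eigenvalue.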

Figures (7)

  • Figure 1: Conceptual comparison between classical stacking and our redundancy-aware regularized meta-learning framework. We explicitly reduce effective rank before applying multi-penalty meta-modeling, improving conditioning and stability.
  • Figure 2: Pareto efficiency of ensemble strategies. Trade-offs between RMSE, runtime, and retained model count. The proposed method lies on the empirical frontier, achieving lower runtime and competitive accuracy relative to greedy hill climbing and vanilla stacking.
  • Figure 3: Prediction-space redundancy before and after projection. Highly correlated clusters ($\rho>0.95$) induce ill-conditioning in the meta-design matrix. Redundancy projection $\Pi_{\tau}$ removes near-collinear predictors, enlarges the spectral gap, and reduces the condition number, stabilizing meta-weight estimation.
  • Figure 4: Regularization paths for Ridge, Lasso, and ElasticNet meta-learners. Each line is the mean RMSE across 10 folds; shaded regions denote $\pm1$ standard deviation. Vertical dashed lines mark selected $\lambda$.
  • Figure 5: Residual diagnostics. (a) Q--Q plot; (b) residuals vs. fitted values; (c) residual histogram; (d) MAE by target-score range.
  • ...and 2 more figures
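The redundancy projection described for Figure 3 can be sketched as a greedy filter over out-of-fold prediction columns. This is a minimal sketch under stated assumptions: a column is treated as redundant when its Pearson correlation with an already retained column is at least `tau_corr` and, if `tau_mse` is given, their mean squared difference is at most `tau_mse`. How the paper combines the two thresholds and how it picks each cluster's representative are not stated in this summary, so the AND rule, the keep-first policy, the `tau_mse` parameter, and all names below are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mse(a, b):
    """Mean squared difference between two prediction columns."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def deduplicate(preds, tau_corr=0.95, tau_mse=None):
    """Return the names of retained predictors, scanning in dict order.

    preds maps a base-model name to its out-of-fold predictions.
    The first member of each redundant group is kept (assumed policy).
    """
    kept = []
    for name, column in preds.items():
        redundant = False
        for retained in kept:
            high_corr = pearson(column, preds[retained]) >= tau_corr
            close = tau_mse is None or mse(column, preds[retained]) <= tau_mse
            if high_corr and close:
                redundant = True
                break
        if not redundant:
            kept.append(name)
    return kept

# Toy data: gbm_b is a near-duplicate of gbm_a; linear is weakly correlated.
preds = {
    "gbm_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gbm_b": [1.1, 2.0, 3.1, 4.0, 5.1],
    "linear": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(deduplicate(preds))
```

Ordering the input by individual validation error before filtering would make the keep-first policy retain the strongest member of each cluster; that ordering is a design choice of this sketch, not something the source specifies.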

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2: Stability of the Composite Operator
  • Theorem 3: Effective Rank Reduction and Generalization
  • Theorem 4