On the Degrees of Freedom of some Lasso procedures

Mauro Bernardi, Antonio Canale, Marco Stefanucci

TL;DR

This work derives unbiased estimators of the effective degrees of freedom for the Adaptive Lasso, Group Lasso, and Adaptive Group Lasso within Stein's unbiased risk estimation framework, valid for both orthogonal and non-orthogonal designs. The resulting $\widehat{df}_\gamma$ expressions reveal how adaptive weights and coefficient signs inflate or deflate model complexity and show a piecewise-linear dependence on the regularization parameter $\gamma$ for adaptive methods. Empirical validation on synthetic and Diabetes data demonstrates that using the correct $\widehat{df}_\gamma$ improves model selection and risk estimation, while the naive degrees of freedom based on active-set size can mislead criteria such as BIC. Overall, the paper provides a rigorous framework for complexity-aware inference in adaptive penalized regression, bridging theory and practice and enabling more reliable model comparison in high-dimensional settings.

Abstract

The effective degrees of freedom of penalized regression models quantify the actual amount of information used to generate predictions, playing a pivotal role in model evaluation and selection. Although a closed-form estimator is available for the Lasso penalty, adaptive extensions of widely used penalized approaches, including the Adaptive Lasso and Adaptive Group Lasso, have remained without analogous theoretical characterization. This paper presents the first unbiased estimator of the effective degrees of freedom for these methods, along with their main theoretical properties, for both orthogonal and non-orthogonal designs, derived within Stein's unbiased risk estimation framework. The resulting expressions feature inflation terms influenced by the regularization parameter, coefficient signs, and least-squares estimates. These advances enable more accurate model selection criteria and unbiased prediction error estimates, illustrated through synthetic and real data. These contributions offer a rigorous theoretical foundation for understanding model complexity in adaptive regression, bridging a critical gap between theory and practice.
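
As a point of reference for the closed-form Lasso case mentioned above, the following is a minimal sketch, not taken from the paper, that checks numerically that the covariance definition of degrees of freedom, $df = \sigma^{-2}\sum_i \mathrm{Cov}(\hat{y}_i, y_i)$, agrees with the expected active-set size for the plain Lasso. It assumes the simplest orthonormal design ($X = I$, so the Lasso solution is soft-thresholding); the names `soft_threshold`, `lasso_df_monte_carlo`, and `beta_true`, as well as the chosen coefficient values, are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Lasso solution under an orthonormal design: shrink each coordinate toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_df_monte_carlo(beta, gamma, sigma=1.0, n_rep=20000, seed=0):
    """Estimate the Lasso degrees of freedom in two ways by simulation.

    Assumes X = I (hypothetical simplest orthonormal design), so that
    y = beta + noise and the fitted values equal the Lasso coefficients.
    """
    rng = np.random.default_rng(seed)
    p = beta.size
    y = beta + sigma * rng.standard_normal((n_rep, p))
    y_hat = soft_threshold(y, gamma)

    # Covariance definition: df = sum_i Cov(y_hat_i, y_i) / sigma^2,
    # with each covariance estimated across the Monte Carlo replications.
    y_centered = y - y.mean(axis=0)
    y_hat_centered = y_hat - y_hat.mean(axis=0)
    cov = (y_hat_centered * y_centered).sum(axis=0) / (n_rep - 1)
    df_cov = cov.sum() / sigma**2

    # Closed-form Lasso estimator: the active-set size, averaged over replications.
    df_active = np.count_nonzero(y_hat, axis=1).mean()
    return df_cov, df_active

# Illustrative coefficient vector (not from the paper).
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
df_cov, df_active = lasso_df_monte_carlo(beta_true, gamma=1.0)
print(f"covariance df = {df_cov:.3f}, mean active-set size = {df_active:.3f}")
```

The two numbers should agree up to Monte Carlo error. For the adaptive methods studied in the paper, the active-set size alone is not an unbiased estimator, which is the gap the paper's estimators address.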

Paper Structure

This paper contains 13 sections, 26 theorems, 149 equations, 3 figures, 1 table.

Key Result

Theorem 1

Let $\widehat{\boldsymbol\beta}$ be the solution to the Adaptive Lasso problem in Equation eq:convex_regularized_problem_adalasso with weights $w_j = w_j(|\widehat{\beta}_j^\mathsf{LS}|)$ and $\gamma \in (\gamma_l, \gamma_{l+1})$. Denote with $\mathcal{A}$ the corresponding active set, and with [...] for the orthonormal design and [...] for non-orthonormal designs.

Figures (3)

  • Figure 1: Estimated degrees of freedom using the appropriate theorem (x axis) versus degrees of freedom computed with the general covariance formula (Equation eq:df_general_expression) for Adaptive Lasso, Group Lasso and Adaptive Group Lasso.
  • Figure 2: Results of the Lasso (left) and Adaptive Lasso (right) estimation for the Diabetes (small) data. Estimated degrees of freedom (upper panels, continuous lines) along with the size of the active set (dashed lines). Complete solution path (lower panels) with vertical lines denoting the best $\gamma$ according to correct BIC (continuous line), cross validation (dotted line), BIC with active set size as estimator of the degrees of freedom (dashed).
  • Figure 3: Results of the Group Lasso (left) and Adaptive Group Lasso (right) estimation for the Diabetes data with the covariates discretized and encoded as dummy variables. Estimated degrees of freedom (upper panels, continuous lines) along with the size of the active set (dashed lines). For the Group Lasso, the number of active groups is also reported (dashed gray lines). Complete solution path (lower panels) with vertical lines denoting the best $\gamma$ according to correct BIC (continuous line), cross validation (dotted line), BIC with active set size as estimator of the degrees of freedom (dashed).
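
The comparison in Figures 2 and 3 between BIC computed with the estimated degrees of freedom and BIC computed with the active-set size can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes one common Gaussian form of BIC, $n \log(\mathrm{RSS}/n) + \log(n)\, df$, and every number below (the $\gamma$ grid, the residual sums of squares, and both degrees-of-freedom curves) is a made-up toy value chosen only to show that the selected $\gamma$ can change with the df estimate. The helper names `bic` and `select_gamma` are hypothetical.

```python
import numpy as np

def bic(rss, n, df):
    """One common Gaussian form of BIC: n * log(RSS / n) + log(n) * df."""
    rss = np.asarray(rss, dtype=float)
    df = np.asarray(df, dtype=float)
    return n * np.log(rss / n) + np.log(n) * df

def select_gamma(gammas, rss, n, df):
    """Return the gamma on the grid that minimizes BIC, and its index."""
    idx = int(np.argmin(bic(rss, n, df)))
    return gammas[idx], idx

# Toy values for illustration only (not results from the paper).
n = 100
gammas = [0.1, 0.5, 1.0, 2.0]
rss = [80.0, 81.0, 90.0, 120.0]        # residual sum of squares along the path
df_active_set = [8, 6, 4, 2]           # naive df: number of nonzero coefficients
df_estimated = [11.0, 9.5, 4.5, 2.0]   # hypothetical df that exceeds the active-set size

gamma_naive, idx_naive = select_gamma(gammas, rss, n, df_active_set)
gamma_plugin, idx_plugin = select_gamma(gammas, rss, n, df_estimated)
print("gamma chosen with active-set df:", gamma_naive)
print("gamma chosen with estimated df:", gamma_plugin)
```

Keeping the df estimate as an explicit argument makes it straightforward to swap the active-set size for any other estimator without touching the selection logic.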

Theorems & Definitions (51)

  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Corollary 3
  • Theorem 2
  • Theorem 3
  • Corollary 4
  • Theorem 4
  • Theorem 5
  • Corollary 5
  • ...and 41 more