Adaptive Lasso, Transfer Lasso, and Beyond: An Asymptotic Perspective

Masaaki Takada, Hironori Fujisawa

TL;DR

The paper analyzes the asymptotic properties of Adaptive Lasso and Transfer Lasso in high-dimensional sparse regression, highlighting when each method achieves desirable rates and variable selection behavior. It introduces Adaptive Transfer Lasso, a unified framework that blends adaptive weighting and transfer penalties, and derives its theoretical properties across multiple regimes, including an oracle-like region where both $\sqrt{m}$-rate and correct selection hold. Through simulations and empirical phase diagrams, the authors validate their theory and demonstrate that Adaptive Transfer Lasso can leverage large source data to improve estimation and selection, while remaining effective in moderate data scenarios. The work offers a practical pathway to reconcile the strengths of both methods and provides directions for extending these results to high-dimensional, multi-source settings.

Abstract

This paper presents a comprehensive exploration of the theoretical properties inherent in the Adaptive Lasso and the Transfer Lasso. The Adaptive Lasso, a well-established method, employs regularization divided by initial estimators and is characterized by asymptotic normality and variable selection consistency. In contrast, the recently proposed Transfer Lasso employs regularization subtracted by initial estimators with the demonstrated capacity to curtail non-asymptotic estimation errors. A pivotal question thus emerges: Given the distinct ways the Adaptive Lasso and the Transfer Lasso employ initial estimators, what benefits or drawbacks does this disparity confer upon each method? This paper conducts a theoretical examination of the asymptotic properties of the Transfer Lasso, thereby elucidating its differentiation from the Adaptive Lasso. Informed by the findings of this analysis, we introduce a novel method, one that amalgamates the strengths and compensates for the weaknesses of both methods. The paper concludes with validations of our theory and comparisons of the methods via simulation experiments.
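The contrast drawn in the abstract, a penalty "divided by" versus "subtracted by" the initial estimator, can be made concrete with a minimal sketch. Everything below is illustrative rather than the authors' code: the function names, the argument names (`init` for the initial estimator, `lam` and `eta` for the tuning parameters written $\lambda_n$ and $\eta_n$ elsewhere on this page, `gamma` for a weight exponent), and the exact scaling are assumptions. In particular, `adaptive_transfer_penalty` is only one plausible way to blend the two ideas and is not taken from the text above.

```python
def lasso_penalty(beta, lam):
    """Plain l1 penalty: lam * sum_j |beta_j|."""
    return lam * sum(abs(b) for b in beta)


def adaptive_lasso_penalty(beta, init, lam, gamma=1.0):
    """l1 penalty divided by the initial estimator: lam * sum_j |beta_j| / |init_j|**gamma.

    Assumption: a zero initial estimate acts as an infinite weight, so the
    penalty is infinite unless the matching coefficient is exactly zero.
    """
    total = 0.0
    for b, t in zip(beta, init):
        if t == 0.0:
            if b != 0.0:
                return float("inf")
            continue
        total += abs(b) / abs(t) ** gamma
    return lam * total


def transfer_lasso_penalty(beta, init, lam, eta):
    """l1 penalty plus an l1 penalty on the difference from the initial estimator."""
    shrink_to_zero = sum(abs(b) for b in beta)
    shrink_to_init = sum(abs(b - t) for b, t in zip(beta, init))
    return lam * shrink_to_zero + eta * shrink_to_init


def adaptive_transfer_penalty(beta, init, lam, eta, gamma1=1.0, gamma2=1.0):
    """Hypothetical blend (an assumption of this sketch): adaptive weights on the
    sparsity term, and |init_j|**gamma2 weights on the transfer term."""
    sparsity = adaptive_lasso_penalty(beta, init, lam, gamma1)
    transfer = sum(abs(t) ** gamma2 * abs(b - t) for b, t in zip(beta, init))
    return sparsity + eta * transfer


if __name__ == "__main__":
    beta = [1.0, 0.0, -2.0]
    init = [0.5, 0.0, -2.0]  # e.g. an estimate obtained from source data
    print(lasso_penalty(beta, lam=2.0))
    print(adaptive_lasso_penalty(beta, init, lam=2.0))
    print(transfer_lasso_penalty(beta, init, lam=2.0, eta=3.0))
    print(adaptive_transfer_penalty(beta, init, lam=2.0, eta=3.0))
```

In this sketch the adaptive penalty rescales the shrinkage toward zero coordinate by coordinate, whereas the transfer penalty adds a second pull toward the initial estimate itself, which is how information from source data could enter the fit.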

Paper Structure (50 sections, 27 theorems, 231 equations, 19 figures)

Key Result

Lemma 2.2

If $\lambda_n/n \rightarrow \lambda_0 \geq 0$, then

Figures (19)

  • Figure 1: Phase diagrams with the order of $\lambda_n$ for the Lasso (left) and the Adaptive Lasso (right). The Lasso does not achieve $\sqrt{n}$-consistency and consistent variable selection simultaneously, while the Adaptive Lasso achieves both.
  • Figure 2: Phase diagrams with $\lambda_n$ for the Adaptive Lasso in Lemma \ref{lemma:adapt-lasso-oracle-large}--Theorem \ref{theorem:alasso-convergence-rate} (left) and with $\lambda_n$ and $\eta_n$ for the Transfer Lasso in Theorem \ref{theorem:trlasso-consistency}--Theorem \ref{theorem:trlasso-invariant-selection-consistency} (right). The Adaptive Lasso has $\sqrt{n}$-consistency in (i) and (ii) and active variable selection consistency in (ii), but the convergence rate in (iii) is slower than $\sqrt{n}$. The Transfer Lasso has convergence rates of $\sqrt{m}$, $\sqrt{n}$, and $n/\lambda_n (< \sqrt{n})$ for (i), (ii), and (iii), respectively. It has invariant variable selection consistency in (i) but does not have active variable selection consistency in (i) or (ii).
  • Figure 3: Phase diagrams of convergence rate (top) and active/invariant variable selection (bottom left/right) with $\lambda_n$ and $\eta_n$ for the Adaptive Transfer Lasso in Theorems \ref{theorem:adaptrlasso-consistency}, \ref{theorem:adaptrlasso-active-selection-consistency}, \ref{theorem:adaptrlasso-varying-selection-consistency}, and Corollary \ref{theorem:adaptrlasso-convergence-rate}. The estimators are $\sqrt{m}$-consistent in (i)--(ii), $\sqrt{n}$-consistent in (iii)--(v), and sub-$\sqrt{n}$-consistent in (vi). They yield consistent active variable selection in (ii), (v), and (vi) (left), and consistent invariant variable selection in (i), (ii), and (iv) (right). Estimators in (ii) satisfy both $\sqrt{m}$-consistency and active/invariant variable selection consistency.
  • Figure 4: $\ell_2$ estimation errors for the Lasso (top left), the Adaptive Lasso (top right), the Transfer Lasso (bottom left), and the Adaptive Transfer Lasso (bottom right) with respect to sample size. The convergence rates of the Transfer Lasso in region (i) and of the Adaptive Transfer Lasso in regions (i) and (ii) are $\sqrt{m}$ (the slopes are $-1$), whereas the others are $\sqrt{n}$ or slower (the slopes are $-1/2$ or greater).
  • Figure 5: $\log$--$\log$ phase diagrams of convergence rate (top), active variable selection ratio (middle), and invariant variable selection ratio (bottom) for the Transfer Lasso (left) and the Adaptive Transfer Lasso (right). These empirical results confirm the theoretical results of Figures \ref{fig:trlasso-consistent-region} and \ref{fig:adaptrlasso-consistent-selection-region}.
  • ...and 14 more figures

Theorems & Definitions (43)

  • Lemma 2.2: Theorem 1 in Knight and Fu (2000) and Lemma 1 in Zou (2006)
  • Corollary 2.3: Consistency for Lasso
  • Lemma 2.4: Theorem 2 in Knight and Fu (2000) and Lemma 2 in Zou (2006)
  • Corollary 2.5: $\sqrt{n}$-consistency for Lasso
  • Lemma 2.6: Inconsistent Variable Selection; Proposition 1 in Zou (2006)
  • Lemma 2.7: Lemma 3 in Zou (2006)
  • Corollary 2.8: Slower Rate Consistency for Lasso
  • Lemma 2.9: Oracle Property for Adaptive Lasso; Theorem 2 in Zou (2006)
  • proof
  • Lemma 3.2: Oracle Property for Adaptive Lasso with Different Sample Size
  • ...and 33 more