An Optimal Design Framework for Lasso Sign Recovery

Jonathan W. Stallrich, Kade Young, Maria L. Weese, Byran J. Smucker, David J. Edwards

TL;DR

This work links supersaturated design (SSD) construction to the statistical properties of the lasso by formulating and optimizing the probability of sign recovery rather than traditional orthogonality-based criteria. It shows that orthogonality is not universally optimal: when the signs of the active effects are known, designs with small, positive correlations can improve sign recovery. The authors develop exact local and approximate criteria, extend them to scenarios with uncertain signs, and propose a fast construction algorithm (HILS) that runs a Pareto-front search followed by lasso-based ranking. Together, the framework gives a principled, computationally efficient path to SSDs tailored for penalized-estimation screening, and it performs favorably across varied scenarios. The approach offers practical design guidance for variable screening in high-dimensional, supersaturated settings.

Abstract

Supersaturated designs investigate more factors than there are runs, and are often constructed under a criterion measuring a design's proximity to an unattainable orthogonal design. The most popular analysis identifies active factors by inspecting the solution path of a penalized estimator, such as the lasso. Recent criteria encouraging positive correlations between factors have been shown to produce designs with more definitive solution paths so long as the active factors have positive effects. Two open problems affecting the understanding and practicality of supersaturated designs are: (1) do optimal designs under existing criteria maximize support recovery probability across an estimator's solution path, and (2) why do designs with positively correlated columns produce more definitive solution paths when the active factors have positive effects? To answer these questions, we develop criteria maximizing the lasso's sign recovery probability. We prove that an orthogonal design is an ideal structure when the signs of the active factors are unknown, and a design with constant, small, positive correlations is ideal when the signs are assumed known. A computationally efficient design search algorithm is proposed that first filters through optimal designs under new heuristic criteria to select the one that maximizes the lasso sign recovery probability.
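The quantity the abstract optimizes, the lasso's sign recovery probability for a given design, can be approximated by simulation. The following is a minimal sketch, not the authors' criterion or their search algorithm: it assumes the objective $\frac{1}{2n}\|\textbf{y}-\textbf{X}\textbf{b}\|_2^2+\lambda\|\textbf{b}\|_1$, Gaussian errors, no intercept, and a random $\pm 1$ toy design. The names `toy_design`, `lasso_cd`, and `sign_recovery_prob` are hypothetical helpers introduced only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    The 1/(2n) scaling is an assumption of this sketch.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb, starting from b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual, scaled by 1/n.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:  # stop once a full sweep changes nothing meaningful
            break
    return b


def sign(v, tol=1e-8):
    """Sign of v, treating values within tol of zero as zero."""
    return 0 if abs(v) <= tol else (1 if v > 0 else -1)


def sign_recovery_prob(X, beta, lam, n_sim=100, sigma=1.0, seed=0):
    """Monte Carlo estimate of P(sign(lasso estimate) == sign(beta)) at one lam."""
    rng = random.Random(seed)
    n, p = len(X), len(beta)
    target = [sign(b) for b in beta]
    mean = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    hits = 0
    for _ in range(n_sim):
        y = [m + rng.gauss(0.0, sigma) for m in mean]
        if [sign(b) for b in lasso_cd(X, y, lam)] == target:
            hits += 1
    return hits / n_sim


def toy_design(n, p, seed=0):
    """Random +/-1 design; a placeholder, not a constructed SSD."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]


# Supersaturated toy case: 8 factors in 6 runs, 2 active factors with positive effects.
X = toy_design(6, 8, seed=1)
beta = [3.0, 3.0] + [0.0] * 6
prob = sign_recovery_prob(X, beta, lam=0.5, n_sim=100)
print(f"estimated sign recovery probability: {prob:.2f}")
```

Comparing this estimate across candidate designs at the same `beta` and `lam` is one simple way to rank them; the paper's criteria instead work with the probability itself rather than a simulated estimate.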

Paper Structure (31 sections, 8 theorems, 104 equations, 10 figures)

Key Result

Lemma 1

For a given $\textbf{X}$, events $S_\lambda$ and $I_\lambda$ are independent, and $\phi_\lambda(\textbf{X} \, | \, \boldsymbol{\beta})=\phi_\lambda(\textbf{X} \, | \, -\boldsymbol{\beta})$ for any $\boldsymbol{\beta}$ and its reflection, $-\boldsymbol{\beta}$. Hence an optimal $\textbf{X}^*$ for …
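
The reflection property in Lemma 1 can be checked directly. This is a sketch under two assumptions not stated in this excerpt: that $\phi_\lambda(\textbf{X} \, | \, \boldsymbol{\beta})$ is the probability that the lasso estimate at penalty $\lambda$ has the same sign vector as $\boldsymbol{\beta}$, and that the error vector $\boldsymbol{\varepsilon}$ in $\textbf{y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$ is symmetric about zero (for example, Gaussian). Write the lasso objective as $Q(\textbf{b}; \textbf{y})$, with the $1/2$ scaling chosen here only for illustration:

$$
Q(\textbf{b}; \textbf{y}) = \tfrac{1}{2}\|\textbf{y} - \textbf{X}\textbf{b}\|_2^2 + \lambda\|\textbf{b}\|_1,
\qquad
Q(-\textbf{b}; -\textbf{y}) = \tfrac{1}{2}\|-(\textbf{y} - \textbf{X}\textbf{b})\|_2^2 + \lambda\|-\textbf{b}\|_1 = Q(\textbf{b}; \textbf{y}).
$$

So $\hat{\boldsymbol{\beta}}$ minimizes $Q(\cdot\,; \textbf{y})$ exactly when $-\hat{\boldsymbol{\beta}}$ minimizes $Q(\cdot\,; -\textbf{y})$. Since $-\textbf{y} = \textbf{X}(-\boldsymbol{\beta}) + (-\boldsymbol{\varepsilon})$ and $-\boldsymbol{\varepsilon}$ has the same distribution as $\boldsymbol{\varepsilon}$, recovering the signs of $\boldsymbol{\beta}$ from $\textbf{y}$ and recovering the signs of $-\boldsymbol{\beta}$ from $-\textbf{y}$ are equally likely.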

Figures (10)

  • Figure 1: Probability of sign recovery for $n=16$, $p=20$, and $k=8$ for an orthogonal design and two designs constructed by selecting columns from \ref{eqn:LODbetter} with $k_1=4$ and $8$. The left and right panels correspond to $\phi_\lambda$ and $\phi_\lambda^{\pm}$, respectively.
  • Figure 2: Plot of the logarithm of equation \ref{eqn:PSk2_equal_limit}. The function fails to be concave for $c<-0.4422$.
  • Figure 3: Plot of the logarithm of equation \ref{eqn:PSk2_equal} for two of the 750 maximin scenarios that violate log concavity. The red line corresponds to the $c^*$ value, being $-0.99$ and $-0.94$ for the left and right panels, respectively.
  • Figure 4: Plot of $\log[\phi_\lambda^\pm(\textbf{C}_\mathcal{A} \, | \, \boldsymbol{\beta}_\mathcal{A})]$ for two of the 750 maximin scenarios that violate log concavity. The red line corresponds to the $c^*$ value, being $0$ and $-0.80$ for the left and right panels, respectively.
  • Figure 5: Contour plots of sign recovery probabilities when $\textbf{C}$ is completely symmetric across possible values of the off-diagonal $c$ and $\log(\lambda)$ where $k=4$ factors are active with $\beta=2\sqrt{10}$. The red lines correspond to values of $c=0.14$ for $\psi_\Lambda$ and $c=0$ for $\psi_\Lambda^\pm$.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Proposition 1
  • Lemma 2
  • Lemma 3
  • Theorem 2
  • Theorem 3