Diaconis-Ylvisaker prior penalized likelihood for $p/n \to \kappa \in (0,1)$ logistic regression

Philipp Sterzinger; Ioannis Kosmidis

TL;DR

This work develops a comprehensive high-dimensional inference framework for logistic regression under proportional asymptotics by analysing the maximum Diaconis–Ylvisaker prior penalized likelihood (MDYPL) estimator. By recasting the non-separable DY prior as a logistic model with transformed responses, the authors derive an AMP-based state-evolution description that yields aggregate bias, variance, and asymptotic distributions for the MDYPL estimator and associated test statistics. They establish adjusted Z-statistics and rescaled penalized-likelihood ratio statistics that achieve standard null distributions, extend results to arbitrary covariate covariance, and propose estimation procedures for the unknown constants, with strong empirical support including simulations and a digit-recognition case study. The paper also explores adaptive shrinkage via the DY prior, identifies conditions for aggregate unbiasedness, and provides a conjectured extension to models with intercepts, all implemented in the brglm2 package. Overall, the results offer robust estimation and inference in high-dimensional logistic regression even when ML fails, broadening practical applicability and enabling principled hypothesis testing in complex regimes.

Abstract

We characterise the behavior of the maximum Diaconis--Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We construct a rescaled estimator with zero asymptotic aggregate bias and define adjusted $Z$-statistics and rescaled penalized likelihood ratio statistics that exhibit the typical null asymptotic distributions, when the covariates are independent multivariate normal with an arbitrary covariance matrix and the linear predictor has asymptotic variance $\gamma^2$. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis--Ylvisaker prior penalized likelihood estimate always exists and can be computed directly using standard maximum likelihood routines. Thus, our asymptotic results extend to $(\kappa, \gamma)$ values where the maximum likelihood framework breaks down, with no additional implementation or computational cost. We study the estimator's shrinkage properties, compare the proposed estimation and inference procedures with alternatives that also accommodate proportional asymptotics, and formulate a conjecture -- supported by strong empirical evidence -- that extends our results when the model includes an intercept parameter. Finally, we propose estimation methods for all unknown constants involved in our procedures and demonstrate the theoretical advances through extensive simulation studies and the analysis of digit recognition data.
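The remark that the estimate "can be computed directly using standard maximum likelihood routines" can be illustrated with a minimal sketch. It assumes the usual form of the transformed responses, $y_i^* = \alpha y_i + (1 - \alpha)/2$ for a shrinkage parameter $\alpha \in (0, 1]$, so that the penalized fit is an ordinary logistic maximum likelihood fit to $y^*$. The function names `mdypl_fit` and `loglik`, the Newton solver with step halving, and the toy data are illustrative assumptions, not the authors' implementation (which lives in the R package brglm2).

```python
import numpy as np


def loglik(beta, X, y_star):
    """Logistic log-likelihood evaluated at (possibly fractional) responses."""
    eta = X @ beta
    return np.sum(y_star * eta - np.logaddexp(0.0, eta))


def mdypl_fit(X, y, alpha, max_iter=100, tol=1e-8):
    """Sketch: ML fit of a logistic model to pseudo-responses alpha*y + (1-alpha)/2.

    Assumption: this is the transformed-response form of the DY prior penalty.
    """
    y_star = alpha * y + (1.0 - alpha) / 2.0  # pulls 0/1 responses towards 1/2
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # stable logistic function
        score = X.T @ (y_star - mu)
        if np.max(np.abs(score)) < tol:
            break
        info = (X * (mu * (1.0 - mu))[:, None]).T @ X  # Fisher information
        step = np.linalg.solve(info, score)
        # Step halving: shrink the Newton step until the log-likelihood
        # does not decrease (up to a small numerical slack).
        t, current = 1.0, loglik(beta, X, y_star)
        while loglik(beta + t * step, X, y_star) < current - 1e-10 and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
    return beta


# Toy data with perfect separation: the ML estimate diverges here,
# while the fit to the transformed responses stays finite.
X_sep = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_sep = np.array([0.0, 0.0, 1.0, 1.0])
alpha_demo = 0.5
beta_hat = mdypl_fit(X_sep, y_sep, alpha_demo)
```

With $\alpha = 1$ the pseudo-responses equal the observed responses and the sketch reduces to plain maximum likelihood, which is why no extra implementation is needed beyond a routine that accepts responses in $(0, 1)$.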

Paper Structure (28 sections, 4 theorems, 17 equations, 8 figures, 1 table)

Introduction
Logistic regression
Maximum penalized likelihood estimation
Crossing the phase transition
Approximate message passing for logistic regression
Our contribution
Maximum Diaconis-Ylvisaker prior penalized likelihood
Asymptotic behaviour of the MDYPL estimator
Preamble
Aggregate behaviour
Arbitrary covariate covariance
Inference
Shrinkage towards zero
Adaptive shrinkage
Aggregate unbiasedness
...and 13 more sections

Key Result

Theorem 3.1

Consider the logistic regression model (eq:logistic_y) with independent covariates $\boldsymbol{x}_i \sim \mathrm{N}(\boldsymbol{0}_p,n^{-1}\boldsymbol{I}_p)$, and the empirical distribution of the elements of $\boldsymbol{\beta}_0$ converging weakly to $\bar{\beta} \sim \pi_{\bar{\beta}}$ with [...] where $G \sim \mathrm{N}(0,1)$ is independent of $\bar{\beta}$.

Figures (8)

Figure 1: MDYPL (left) and rescaled MDYPL (right) estimates for various configurations of $(\kappa,\gamma)$ and $\alpha = 1 / (1+\kappa)$ in the simulation setting of Section \ref{sec:phase_transition}. The white and grey areas indicate where the ML estimate does or does not exist asymptotically, respectively. Red points show the average coefficient estimates over $10$ independent replications per $(\kappa, \gamma)$ setting. The cyan segments show the sample mean of the estimates for each value of the truth, and the black segments are the truth.
Figure 2: Q-Q plots comparing $\chi^2_k$ quantiles to empirical quantiles of the DY prior penalized likelihood ratio statistic (light grey) and its rescaled version ${b}_{*} / (\kappa {\sigma}_{*}^2) \Lambda_{I}$ (dark grey), for $\alpha \in \{1, 3/4, 1/2, 1/4, 1/(1 + \kappa)\}$ and $I = \{1, 2, \ldots, 5\}$ ($k = 5$; left) and $I = \{1, 2, \ldots, 50\}$ ($k = 50$; right). The figures are based on $1000$ simulations of $\{\boldsymbol{y}, \boldsymbol{X}\}$ where $\boldsymbol{x}_i \sim \mathrm{N}(\boldsymbol{0}_p,n^{-1}\boldsymbol{I}_p)$, $n = 2000$, $\kappa \in \{0.1, 0.5\}$, $p = n \kappa$, and $\boldsymbol{\beta}_{0}$ has $p/2$ entries of zero and the remainder set to one, appropriately rescaled so that $\gamma^2 = 5$.
Figure 3: Contours of the asymptotic aggregate bias parameter ${\mu}_{*}$ (left) and asymptotic MSE (right) for $\alpha = 1 / (1 + \kappa)$ over a $(\kappa, \gamma)$ grid. The diamonds mark the point $(\kappa, \gamma) = (0.2, \sqrt{0.9})$, where the unscaled MDYPL estimator has been found to perform well in terms of aggregate bias ($\mu_{*} \approx 0.914$) and aggregate MSE (${\sigma}_{*}^2 + (1-{\mu}_{*})^2 {\gamma^2}{\kappa^{-1}} \approx 5.08$) in the experiments of Section \ref{sec:phase_transition}. The grey curve is the phase transition curve of Candès and Sur (2020).
Figure 4: Contours of the values of shrinkage parameter $\alpha$ that achieve asymptotic aggregate unbiasedness ${\mu}_{*} = 1$ (left), and minimal asymptotic aggregate variance ${{\sigma}_{*}^2} / {{\mu}_{*}^2}$ of $\hat{\boldsymbol{\beta}}^{\textrm{\tiny DY}} / {\mu}_{*} - \boldsymbol{\beta}_0$ (right). The grey curve is the phase transition curve of Candès and Sur (2020).
Figure 5: Performance comparison of the oracle and non-oracle versions of the CLS and MDYPL procedures for estimation and inference in the simulation setting of Section \ref{subsec:corrected_ls}. The top left panel shows the estimated root aggregate MSE (root aMSE) of the estimators for $n \in \{400, 800, \ldots, 2000\}$, $\kappa \in \{0.2, 0.5\}$. The grey dashed curve represents the asymptotic root aMSE ${\sigma}_{*} {\mu}_{*}^{-1} \sqrt{2 (n + \kappa^{-1})^{-1}}$ of the rescaled MDYPL estimator. The mid-left and bottom-left panels show the aggregate bias (aBias) for the zero and non-zero elements of the parameter vector $\boldsymbol{\beta}_{0}$, respectively. The right panel shows the estimated finite-sample distributions of the $p$-values from the two-sided test that a single element of the parameter vector $\boldsymbol{\beta}_{0}$ is zero, for the first $25$ zero parameters.
...and 3 more figures
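The simulation design described in the Figure 2 caption can be sketched as follows. This is one reading of the caption rather than the authors' code: it assumes that, for identity covariance, $\gamma^2$ is the variance of the linear predictor, $\|\boldsymbol{\beta}_0\|^2 / n$, so the non-zero half of $\boldsymbol{\beta}_0$ is set to a common constant chosen to match a target $\gamma^2$. The function name `simulate_logistic` and the seed are hypothetical.

```python
import numpy as np


def simulate_logistic(n, kappa, gamma_sq, rng):
    """Draw (X, y, beta0) with x_i ~ N(0, I_p / n) and p = n * kappa.

    Assumption: gamma_sq is matched through ||beta0||^2 / n, the variance
    of the linear predictor x_i' beta0 under identity covariance.
    """
    p = int(round(n * kappa))
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta0 = np.zeros(p)
    n_nonzero = p - p // 2  # first p // 2 entries stay at zero
    beta0[p // 2:] = np.sqrt(gamma_sq * n / n_nonzero)
    prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta0)))  # stable logistic function
    y = rng.binomial(1, prob).astype(float)
    return X, y, beta0


rng = np.random.default_rng(1)
X, y, beta0 = simulate_logistic(n=2000, kappa=0.1, gamma_sq=5.0, rng=rng)
gamma_sq_check = beta0 @ beta0 / X.shape[0]  # equals the target gamma^2
```

For $n = 2000$ and $\kappa = 0.1$ this gives $p = 200$ and a common non-zero coefficient of $\sqrt{2 \gamma^2 / \kappa} = 10$, which is what "appropriately rescaled so that $\gamma^2 = 5$" amounts to under the stated assumption.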

Theorems & Definitions (5)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Theorem 3.4
Conjecture 6.1
