HAL-MLE Log-Splines Density Estimation (Part I: Univariate)

Yilong Hou; Zhengpu Zhao; Yi Li; Mark van der Laan

TL;DR

The paper develops HAL-MLE with a log-spline link for univariate density estimation under a variation penalty defined by a bounded sectional variation norm, linking HAL to classical TV-penalized approaches. It proves that the univariate HAL-MLE is asymptotically linear and pointwise asymptotically normal, and that it converges uniformly at rate $n^{-(k+1)/(2k+3)}$, up to logarithmic factors, for smoothness order $k\ge1$; an extension to higher dimensions is planned. It situates HAL-MLE within NPMLE, provides a delta-method variance estimator for the density parameter, and demonstrates asymptotic efficiency for pathwise-differentiable estimands via plug-in HAL-MLE and HAL-TMLE. The work includes optimization strategies and extensive simulations comparing HAL-MLE to logspline, trend filtering, and KDE, plus a galaxy velocity case study illustrating practical inference and variance estimation. Overall, the framework offers a unified, theoretically grounded approach to TV-penalized density estimation, with inference guarantees for nonparametric densities and related estimands.
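To make the stated rate concrete, the exponent $(k+1)/(2k+3)$ can be evaluated for a few smoothness orders. This is plain arithmetic on the formula above, ignoring logarithmic factors and constants; the sample size $n=10^4$ is chosen here only for illustration.

```latex
\frac{k+1}{2k+3} =
\begin{cases}
2/5 = 0.400, & k = 1,\\
3/7 \approx 0.429, & k = 2,\\
4/9 \approx 0.444, & k = 3,
\end{cases}
\qquad
\lim_{k\to\infty} \frac{k+1}{2k+3} = \frac{1}{2}.
% Illustration at n = 10^4 with k = 1:
%   n^{-2/5} = 10^{-1.6} \approx 0.025,
% compared with the parametric benchmark n^{-1/2} = 10^{-2} = 0.01.
```

The exponent increases with $k$ and approaches, but never reaches, the parametric value $1/2$.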

Abstract

We study nonparametric maximum likelihood estimation of probability densities under a total variation (TV) type penalty, the sectional variation norm (also known as Hardy-Krause variation). TV regularization has a long history in regression and density estimation, including results on $L^2$ and KL divergence convergence rates. Here, we revisit this task using the Highly Adaptive Lasso (HAL) framework. We formulate a HAL-based maximum likelihood estimator (HAL-MLE) using the log-spline link function from \citet{kooperberg1992logspline}, and show that in the univariate setting the bounded sectional variation norm assumption underlying HAL coincides with the classical bounded TV assumption. This equivalence directly connects HAL-MLE to existing TV-penalized approaches such as locally adaptive regression splines \citep{mammen1997locally}. We establish three new theoretical results: (i) the univariate HAL-MLE is asymptotically linear, (ii) it admits pointwise asymptotic normality, and (iii) it achieves uniform convergence at rate $n^{-(k+1)/(2k+3)}$, up to logarithmic factors, for smoothness order $k \geq 1$. These results extend those of \citet{van2017uniform}, which guaranteed only uniform consistency, without rates, for $k=0$. Uniform convergence for general dimension $d$ will be treated in follow-up work. This paper aims to provide a unified framework for TV-penalized density estimation methods and to connect HAL-MLE to existing TV-penalized methods in the univariate case, even though the general HAL-MLE is defined for multivariate settings.
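The idea of a density estimator with a log link and a variation-type penalty can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions that are not taken from the paper: it uses zero-order indicator basis functions $\phi_j(x)=1\{x\ge u_j\}$ on a fixed knot grid in $(0,1)$, approximates the normalizing integral on a midpoint grid, and minimizes the $L^1$-penalized negative log-likelihood by proximal gradient descent (ISTA). The function names, the Beta(2, 5) sample, and the penalty level `lam` are all hypothetical choices for illustration.

```python
import numpy as np

def hal_basis(x, knots):
    """Zero-order indicator basis: phi_j(x) = 1{x >= u_j} (illustrative choice)."""
    return (np.asarray(x)[:, None] >= np.asarray(knots)[None, :]).astype(float)

def log_density(x, beta, knots, grid):
    """log p_beta(x) = phi(x) @ beta - log of the integral of exp(phi(u) @ beta) over [0, 1]."""
    # Midpoint rule on [0, 1]: the integral is approximated by the grid mean.
    log_norm = np.log(np.mean(np.exp(hal_basis(grid, knots) @ beta)))
    return hal_basis(x, knots) @ beta - log_norm

def penalized_nll(x, beta, knots, grid, lam):
    """Mean negative log-likelihood plus an L1 penalty on the coefficients."""
    return -np.mean(log_density(x, beta, knots, grid)) + lam * np.sum(np.abs(beta))

def fit_hal_mle(x, knots, lam, grid, n_iter=2000):
    """Proximal gradient (ISTA) sketch; not the paper's optimization strategy."""
    phi_x, phi_g = hal_basis(x, knots), hal_basis(grid, knots)
    # The smooth part has Hessian Cov_w(phi), whose trace is at most J / 4,
    # so step = 4 / J is a safe (conservative) step size.
    step = 4.0 / len(knots)
    beta = np.zeros(len(knots))
    for _ in range(n_iter):
        w = np.exp(phi_g @ beta)
        w /= w.sum()                                  # fitted density weights on the grid
        grad = -(phi_x.mean(axis=0) - w @ phi_g)      # gradient of the mean NLL
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

# Hypothetical data and tuning values, for illustration only.
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=500)
knots = np.linspace(0.05, 0.95, 19)
grid = (np.arange(2000) + 0.5) / 2000
lam = 0.005
beta = fit_hal_mle(x, knots, lam, grid)
density_on_grid = np.exp(log_density(grid, beta, knots, grid))
```

In this zero-order sketch the $L^1$ norm of `beta` plays the role of the variation norm of the log-density, and a sufficiently large `lam` shrinks every coefficient to zero, which returns the uniform density. In practice the penalty level would be chosen by a data-driven rule such as cross-validation rather than fixed by hand.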

Paper Structure (103 sections, 12 theorems, 140 equations, 37 figures, 8 tables)

Introduction
The Highly Adaptive Lasso (HAL) Assumptions
Basic Concepts.
Relationship with TV
A Review of Locally Adaptive Regression Splines (LAS):
The Missing Càdlàg Perspective:
Conclusion:
HAL Density Estimation
HAL Construction.
HAL-MLE with Link Function
Comparison with NPMLE
Uniform Grid vs Data Adaptive Grid
Theoretical Properties for HAL-MLE
Rate of Convergence of HAL-MLE in Loss-based Dissimilarity
Pointwise Asymptotic Normality of the HAL-MLE
...and 88 more sections

Key Result

Proposition 2.1

Any $f\in D^{(k)}_U([0,1])$ could be represented as follows:

Figures (37)

Figure 1: Optimization algorithm comparison for Truncated Normal (2nd order basis): knot selection per iteration (top left), knot selection per FLOP (top right), loss convergence per iteration (bottom left), and loss convergence per FLOP (bottom right).
Figure 2: Six DGPs: (a) TN, (b) GS3, (c) GA3, (d) GS5, (e) Step, (f) Sine.
Figure 3: Uniform-convergence sup-norm error (log-scale) by DGP. Panels (row-wise, left to right): (a) TN, (b) GS3, (c) GA3, (d) GS5, (e) Step, (f) Sine.
Figure 4: Representative asymptotic-efficiency comparison (DGP: GA3). Columns show four statistical estimands (Mean, Median, Survival at 0.5, Second Moment) and rows show three metrics (MSE, Variance, $|$Bias$|$/SE). Curves compare the asymptotically efficient estimator (green), HAL-MLE (blue), and HAL-TMLE (red).
Figure 5: Bias at $n=800$ across six DGPs (columns) for five estimators (legend).
...and 32 more figures

Theorems & Definitions (37)

Definition 2.1: Càdlàg Functions with Bounded Sectional Variational Norm (BSVN)
Definition 2.2: The Function with k-th order BSVN
Proposition 2.1: Univariate Exact Representation
Remark 2.1
Remark 3.1
Remark 4.1
Remark 4.2
Theorem 4.1: $L^2$ convergence of HAL-MLE
Theorem 4.2: Asymptotic Linearity of HAL-MLE
Remark 4.3: The Necessity for Cross-Validation and Undersmoothing
...and 27 more
