Beyond Maximum Likelihood: Variational Inequality Estimation for Generalized Linear Models

Linglingzhi Zhu, Jonghyeok Lee, Yao Xie

TL;DR

The paper studies a variational inequality (VI) estimator for GLMs as an alternative to MLE. It establishes non-asymptotic estimation error bounds and asymptotic normality for the VI estimator, and provides convergence guarantees for fixed-point and stochastic approximation algorithms. Numerical experiments show that the VI framework preserves the statistical efficiency of MLE while substantially extending its applicability to more challenging GLM settings.

Abstract

Generalized linear models (GLMs) are fundamental tools for statistical modeling, with maximum likelihood estimation (MLE) serving as the classical method for parameter inference. While MLE performs well in canonical GLMs, it can become computationally inefficient near the true parameter value. In more general settings with non-canonical or fully general link functions, the resulting optimization landscape is often non-convex, non-smooth, and numerically unstable. To address these challenges, we investigate an alternative estimator based on solving the variational inequality (VI) formulation of the GLM likelihood equations, originally proposed by Juditsky and Nemirovski as an alternative for solving nonlinear least-squares problems. Unlike their focus on algorithmic convergence in monotone settings, we analyze the VI approach from a statistical perspective, comparing it systematically with the MLE. We also extend the theory of VI estimators to a broader class of link functions, including non-monotone cases satisfying a strong Minty condition, and show that it admits weaker smoothness requirements than MLE, enabling faster, more stable, and less locally trapped optimization. Theoretically, we establish both non-asymptotic estimation error bounds and asymptotic normality for the VI estimator, and further provide convergence guarantees for fixed-point and stochastic approximation algorithms. Numerical experiments show that the VI framework preserves the statistical efficiency of MLE while substantially extending its applicability to more challenging GLM settings.
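The fixed-point algorithm mentioned in the abstract can be illustrated with a minimal sketch. It assumes the empirical vector field takes the form $V_N(\beta)=\frac{1}{N}\sum_{i} x_i\,(g^{-1}(x_i^\top\beta)-y_i)$, in the spirit of the Juditsky and Nemirovski construction. This form, the sigmoid inverse link, the step size, the noiseless toy data, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def vi_field(beta, X, y, inv_link):
    """Assumed empirical VI field: (1/N) * sum_i x_i * (inv_link(x_i . beta) - y_i)."""
    n, d = len(X), len(beta)
    v = [0.0] * d
    for x_i, y_i in zip(X, y):
        resid = inv_link(sum(a * b for a, b in zip(x_i, beta))) - y_i
        for j in range(d):
            v[j] += x_i[j] * resid / n
    return v

def fixed_point(X, y, inv_link, step=1.0, iters=2000):
    """Iterate beta <- beta - step * V_N(beta), starting from zero."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        v = vi_field(beta, X, y, inv_link)
        beta = [b - step * g for b, g in zip(beta, v)]
    return beta

def sigmoid(t):
    """Hypothetical inverse link, monotone increasing."""
    return 1.0 / (1.0 + math.exp(-t))

# Toy design (hypothetical): an intercept plus one feature on a grid in [-1, 1].
N = 50
X = [[1.0, -1.0 + 2.0 * i / (N - 1)] for i in range(N)]
beta_true = [0.3, -0.8]
# Noiseless responses, so beta_true is an exact zero of the field.
y = [sigmoid(x[0] * beta_true[0] + x[1] * beta_true[1]) for x in X]

beta_hat = fixed_point(X, y, sigmoid)
max_err = max(abs(b - t) for b, t in zip(beta_hat, beta_true))
```

Note that the update only evaluates the inverse link itself, never its derivative, which is one way to read the abstract's claim of weaker smoothness requirements than MLE.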

Paper Structure

This paper contains 22 sections, 8 theorems, 123 equations, 4 figures, 4 tables, and 2 algorithms.

Key Result

Lemma 4.1

Suppose that the inverse link function $g^{-1}$ is strongly monotone with modulus $\mu_g \ge 0$, or satisfies the averaged strong Minty condition with $\operatorname{Sol}(V_N)\neq\emptyset$. Then the vector field $V_N$ satisfies the strong Minty condition with modulus $\mu_g\sigma_N^2/N$, where $\sigma_N$ is the minimal singular value of $\tilde{\bm{X}}_N:=[\bm{1}\; \bm{X}_N]$.
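As a rough numerical illustration of the modulus $\mu_g\sigma_N^2/N$ in Lemma 4.1, the sketch below computes $\sigma_N^2$ for a design with an intercept and a single feature, using the fact that the squared minimal singular value of a tall matrix equals the smallest eigenvalue of its Gram matrix. The grid design, the value of `mu_g`, and the helper name are hypothetical. For more than one feature, a numerical SVD routine would replace the closed-form 2 x 2 eigenvalue.

```python
import math

def min_singular_value_sq(X_tilde):
    """Smallest squared singular value of an N x 2 matrix via its 2 x 2 Gram matrix."""
    a = sum(r[0] * r[0] for r in X_tilde)
    b = sum(r[0] * r[1] for r in X_tilde)
    c = sum(r[1] * r[1] for r in X_tilde)
    # Eigenvalues of [[a, b], [b, c]] are mean -/+ radius.
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius

# Hypothetical design: N points on a grid in [-1, 1], with an intercept column.
N = 50
x = [-1.0 + 2.0 * i / (N - 1) for i in range(N)]
X_tilde = [[1.0, xi] for xi in x]

sigma_sq = min_singular_value_sq(X_tilde)
mu_g = 0.1  # hypothetical strong-monotonicity modulus of the inverse link
modulus = mu_g * sigma_sq / N
print(f"sigma_N^2 = {sigma_sq:.4f}, strong Minty modulus = {modulus:.4f}")
```

A nearly collinear design shrinks $\sigma_N$ and hence the modulus, which is how the conditioning of the design enters the bound.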

Figures (4)

  • Figure 1: MLE loss function $\mathcal{L}(\beta)=-\log g^{-1}(\beta)+g^{-1}(\beta)$ and its derivative.
  • Figure 2: VI vector field $V(\beta)=g^{-1}(\beta)-1$.
  • Figure 3: Convergence trajectories of VI and MLE for Poisson regression with different link functions ($d=20, N=400$).
  • Figure 4: Average squared error for VI and MLE against iteration budgets for Poisson regression with different link functions ($d=20, N=400$).
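The scalar functions in the captions of Figures 1 and 2 can be compared directly. The loss $\mathcal{L}(\beta)=-\log g^{-1}(\beta)+g^{-1}(\beta)$ has the shape of a Poisson negative log-likelihood for a single observation $y=1$ with unit covariate (this reading is an interpretation, not stated in the captions). By the chain rule, $\mathcal{L}'(\beta)=V(\beta)\,(g^{-1})'(\beta)/g^{-1}(\beta)$, so the two share the same root but differ by a link-dependent factor unless $g^{-1}=\exp$. The sketch below checks this numerically for a softplus inverse link, which is a hypothetical choice and not necessarily the link used in the figures.

```python
import math

def inv_link(beta):
    """Softplus inverse link (hypothetical non-canonical choice)."""
    return math.log1p(math.exp(beta))

def inv_link_grad(beta):
    """Derivative of softplus, i.e. the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-beta))

def mle_loss(beta):
    """L(beta) = -log g^{-1}(beta) + g^{-1}(beta), as in the Figure 1 caption."""
    m = inv_link(beta)
    return -math.log(m) + m

def mle_grad(beta):
    """Analytic derivative of the loss via the chain rule."""
    m = inv_link(beta)
    return inv_link_grad(beta) * (1.0 - 1.0 / m)

def vi_field(beta):
    """V(beta) = g^{-1}(beta) - 1, as in the Figure 2 caption."""
    return inv_link(beta) - 1.0

def num_grad(f, beta, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(beta + h) - f(beta - h)) / (2.0 * h)

# Common root: softplus(beta) = 1 at beta = log(e - 1).
beta_root = math.log(math.e - 1.0)

for b in (-2.0, 0.0, 1.5):
    scaled_field = vi_field(b) * inv_link_grad(b) / inv_link(b)
    print(f"beta={b:+.1f}  L'={mle_grad(b):+.6f}  V={vi_field(b):+.6f}  "
          f"V*g'/g={scaled_field:+.6f}")
```

In this example the field $V$ needs only values of $g^{-1}$, whereas the loss derivative also requires $(g^{-1})'$ and a division by $g^{-1}$.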

Theorems & Definitions (23)

  • Lemma 4.1: Sufficient Condition for Strong Minty Condition
  • Remark 4.1: Strong Minty condition
  • Lemma 4.2
  • Theorem 4.3: Estimation error
  • Remark 4.2
  • Lemma 4.4
  • Theorem 4.5: Asymptotic normality of the VI estimator
  • Remark 4.3: Comparison of VI with MLE on asymptotic statistical efficiency
  • Theorem 5.1: Linear convergence of fixed-point method
  • Theorem 5.2: Sublinear estimation error of stochastic approximation
  • ...and 13 more