Robust and Sparse Generalized Linear Models for High-Dimensional Data via Maximum Mean Discrepancy

Xiaoning Kang; Lulu Kang

TL;DR

This paper introduces an $\ell_1$-penalized MMD objective for robust estimation and feature selection in GLMs and develops two versions of the estimator: a full $O(n^2)$ version and a computationally efficient $O(n)$ approximation. The approach is particularly strong at handling high-leverage points and heavy-tailed error distributions, where traditional methods often fail.

Abstract

High-dimensional datasets are frequently subject to contamination by outliers and heavy-tailed noise, which can severely bias standard regularized estimators like the Lasso. While Maximum Mean Discrepancy (MMD) has recently been introduced as a "universal" framework for robust regression, its application to high-dimensional Generalized Linear Models (GLMs) remains largely unexplored, particularly regarding variable selection. In this paper, we propose a penalized MMD framework for robust estimation and feature selection in GLMs. We introduce an $\ell_1$-penalized MMD objective and develop two versions of the estimator: a full $O(n^2)$ version and a computationally efficient $O(n)$ approximation. To solve the resulting non-convex optimization problem, we employ an algorithm based on the Alternating Direction Method of Multipliers (ADMM) combined with AdaGrad. Through extensive simulation studies involving Gaussian linear regression and binary logistic regression, we demonstrate that our proposed methods significantly outperform classical penalized GLMs and existing robust benchmarks. Our approach shows particular strength in handling high-leverage points and heavy-tailed error distributions, where traditional methods often fail.
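
To make the $O(n^2)$ versus $O(n)$ trade-off concrete, the following is a minimal sketch of two generic MMD$^2$ estimators between two samples, plus the soft-thresholding operator that commonly serves as the $\ell_1$ proximal step inside ADMM. It is an illustration under stated assumptions, not the estimator defined in the paper: the Gaussian kernel, the `bandwidth` parameter, the pairing scheme in the linear-time version, and all function names are hypothetical choices made here, and the paper's own $O(n)$ approximation may be constructed differently.

```python
import math


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_quadratic(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 using all pairs: O(n^2) kernel calls."""
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def mmd2_linear(xs, ys, bandwidth=1.0):
    """Linear-time estimate of MMD^2 from disjoint consecutive pairs: O(n) kernel calls."""
    n_pairs = min(len(xs), len(ys)) // 2
    if n_pairs == 0:
        raise ValueError("need at least two points in each sample")
    total = 0.0
    for i in range(n_pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += (
            gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth)
        )
    return total / n_pairs


def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||.||_1: shrink each entry toward zero."""
    shrunk = []
    for v in values:
        if v > threshold:
            shrunk.append(v - threshold)
        elif v < -threshold:
            shrunk.append(v + threshold)
        else:
            shrunk.append(0.0)
    return shrunk
```

Both estimators take lists of points, each point being a list of floats, and return a value near zero when the two samples coincide. The linear-time version trades statistical efficiency for cost by evaluating the kernel on only $n/2$ disjoint pairs, which is the usual motivation for an $O(n)$ variant when $n$ is large.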

Paper Structure (11 sections, 33 equations, 1 figure, 5 tables)

Introduction
Preliminaries and MMD-based GLM Estimators
Penalized MMD Estimators for Gaussian and Binary Responses
Optimization Method
  Gaussian Linear Regression
  Binary Logistic Regression
Simulation Studies
  Gaussian Linear Regression
  Binary Logistic Regression
Real Data Applications
Conclusion

Theorems & Definitions (1)

Remark 2: Convexity of the Logistic MMD Loss
