Optimal convex $M$-estimation via score matching
Oliver Y. Feng, Yu-Chun Kao, Min Xu, Richard J. Samworth
TL;DR
The paper develops a data-driven approach to convex $M$-estimation in linear regression by leveraging score matching with the Fisher divergence to obtain the best decreasing score function under a convexity constraint. The key idea is the antitonic score projection, which yields a population-optimal score $\psi_0^*$ and its corresponding convex loss $\ell_0^*$ through a log-concave Fisher divergence projection, even when the error density is not log-concave. Semiparametric estimation is achieved via an alternating procedure that estimates $\beta$ and the projected score from residuals, with three-fold cross-fitting ensuring $\sqrt{n}$-consistency and asymptotic normality that attains an antitonic efficiency lower bound $i^*(p_0)$. In heavy-tailed scenarios like Cauchy errors, the resulting Huber-like loss $\ell_0^*$ provides substantial robustness with minimal loss of efficiency (ARE$^*$ near 0.88), and numerical experiments with the R package asm corroborate both accuracy and computational efficiency. Overall, the framework unites shape-constrained estimation, Fisher-information-inspired projections, and robust convex optimization to deliver practically efficient, statistically near-optimal linear regression under unknown error distributions.
Abstract
In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. At the population level, the negative derivative of the optimal convex loss is the best decreasing approximation of the derivative of the log-density of the noise distribution. This motivates a fitting process via a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. At the sample level, our semiparametric estimator is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex $M$-estimators. As an example of a non-log-concave setting, the optimal convex loss function for Cauchy errors is Huber-like, and our procedure yields asymptotic efficiency greater than $0.87$ relative to the maximum likelihood estimator of the regression coefficients that uses oracle knowledge of this error distribution. In this sense, we provide robustness and facilitate computation without sacrificing much statistical efficiency. Numerical experiments using our accompanying R package 'asm' confirm the practical merits of our proposal.
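To make the Cauchy efficiency figure concrete, the following is a minimal sketch rather than the paper's procedure. It assumes the standard asymptotic variance formula for a location $M$-estimator with score $\psi$, namely $\mathbb{E}\,\psi(\varepsilon)^2 / \{\mathbb{E}\,\psi'(\varepsilon)\}^2$, and it uses the ordinary Huber score $\psi_k(z) = \max(-k, \min(z, k))$ as a stand-in for a convex loss; the optimal loss described above is only Huber-like, not the Huber loss itself. The function name `huber_efficiency_cauchy` and the threshold grid are illustrative choices, not part of the 'asm' package.

```python
import math

# Fisher information for the location parameter of a standard Cauchy density.
CAUCHY_FISHER_INFO = 0.5


def huber_efficiency_cauchy(k: float) -> float:
    """Asymptotic relative efficiency of the Huber M-estimator with threshold k,
    relative to the oracle MLE, under standard Cauchy errors.

    Uses closed-form moments of psi_k(z) = clip(z, -k, k) under the Cauchy law.
    """
    if k <= 0:
        raise ValueError("threshold k must be positive")

    # E[psi_k'(eps)] = P(|eps| < k) for a standard Cauchy variable.
    prob_inside = (2.0 / math.pi) * math.atan(k)

    # E[psi_k(eps)^2] = integral of z^2 p(z) over [-k, k] plus k^2 * P(|eps| > k).
    second_moment = (2.0 / math.pi) * (k - math.atan(k)) + k**2 * (1.0 - prob_inside)

    # Asymptotic variance of the M-estimator (assumed sandwich form).
    asymptotic_variance = second_moment / prob_inside**2

    # Efficiency relative to the MLE, whose asymptotic variance is 1 / Fisher information.
    return 1.0 / (CAUCHY_FISHER_INFO * asymptotic_variance)


if __name__ == "__main__":
    for k in (0.1, 0.4, 1.0, 1.345):
        print(f"k = {k:5.3f}  efficiency = {huber_efficiency_cauchy(k):.4f}")
```

Under these assumptions, the efficiency depends strongly on the threshold $k$, which a fixed Huber loss requires the user to choose in advance. The data-driven loss described in the abstract avoids this tuning by estimating the projected score from residuals.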
