Robust Wasserstein Profile Inference and Applications to Machine Learning
Jose Blanchet, Yang Kang, Karthyek Murthy
TL;DR
The paper addresses regularization in machine learning by casting popular estimators as distributionally robust optimization (DRO) problems with Wasserstein-based uncertainty sets. The authors introduce Robust Wasserstein Profile Inference (RWPI), a novel inference framework that uses transport-cost-based profile functions to calibrate the uncertainty radius, and hence the regularization parameter, without cross-validation. They prove dual representations and asymptotic limit theorems for the Robust Wasserstein Profile (RWP) function, which yield confidence regions and data-driven parameter tuning for square-root LASSO and regularized logistic regression, including in high-dimensional settings. Numerical experiments corroborate the theoretical results, showing performance competitive with cross-validation while avoiding repeated tuning runs, and offer practical guidance for selecting the regularization parameter in DRO-based ML estimators.
Abstract
We show that several machine learning estimators, including square-root LASSO (Least Absolute Shrinkage and Selection Operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization (DRO) problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as the result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (Robust Wasserstein Profile Inference), a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence, we are able to choose regularization parameters for these machine learning estimators without the use of cross-validation. Numerical experiments are also given to validate our theoretical findings.
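
To make the DRO view of regularization concrete, the following is a minimal sketch for square-root LASSO. It is not code from the paper, and the function names `sqrt_lasso_objective` and `dro_worst_case_loss` are hypothetical. It assumes the closed form that the representation result takes, as we understand it, when the adversary may perturb only the predictors and transport cost is measured in the sup-norm: the worst-case expected squared loss over an uncertainty region of size delta equals (sqrt(MSE) + sqrt(delta) * ||beta||_1)^2, so the regularization parameter plays the role of sqrt(delta).

```python
import math


def sqrt_lasso_objective(X, y, beta, lam):
    """Square-root LASSO objective: sqrt(mean squared residual) + lam * ||beta||_1."""
    n = len(y)
    residuals = [
        yi - sum(b * xij for b, xij in zip(beta, xi))
        for xi, yi in zip(X, y)
    ]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    l1_norm = sum(abs(b) for b in beta)
    return rmse + lam * l1_norm


def dro_worst_case_loss(X, y, beta, delta):
    """Worst-case expected squared loss for uncertainty size delta.

    ASSUMPTION: uses the closed form (sqrt(MSE) + sqrt(delta) * ||beta||_1)^2,
    i.e. the squared square-root LASSO objective with lam = sqrt(delta).
    """
    return sqrt_lasso_objective(X, y, beta, math.sqrt(delta)) ** 2


# Toy data (illustrative only): beta = [1, 2] fits y exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(sqrt_lasso_objective(X, y, [1.0, 2.0], lam=0.5))
print(dro_worst_case_loss(X, y, [1.0, 2.0], delta=0.25))
```

Under this assumption, minimizing the worst-case loss over beta gives the same minimizer as square-root LASSO with lam = sqrt(delta), which is why choosing the size of the uncertainty region amounts to choosing the regularization parameter.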
